Best practices
Version and Tag 🥇
The version management for the evolution of the processing chain must comply with semver.org. The version tag must be applied to a development branch containing the algorithm, the Docker image, and all standard rules described in the attached document.
| Digit | Description | Definition / Example |
|---|---|---|
| First Digit | Major Version | There are non-backward compatible changes. A set of minor changes and fixes. |
| Second Digit | Minor Version | There are additions of backward-compatible features. |
| Third Digit | Patch Version | There are backward-compatible anomaly corrections. |
Configuration Management
Readme File 🥇
A Readme written in Markdown at the root of the GIT project must allow for a quick overview of the chain.
- Documentation Link: Link to the documentation hosted on readthedocs.com.
- Title and Badge: Project name and indicators (build status, version, license).
- Short Description (Pitch): One to two sentences maximum on the "what" and the "why".
- Quick Installation: A single command line (e.g.,
npm installorpip install). 🥈
GIT Workflow 🥈
The project should follow the Git Flow workflow: https://git-flow.readthedocs.io/fr/latest/presentation.html
Packaging
Image Format 🥇
Images must be in OCI container format (Docker type).
CWL Description of the Process 🥇
A CWL file containing a minimal description of your process must be provided, including the following sections:
- Input Data
- Processing
- Output Data
Documentation 🥇
Documentation must be managed within the GIT project and made publicly available (e.g., Readthedocs, Jupyterbook, Sphinx). A template is provided to help you initialize your documentation here:
Documentation Content
- Process Description Template (v3.0): Standard template for describing a processing chain.
- Dataflow Document: A document outlining the data flow.
Industrialization
Automated Tests 🥈
- The chain must include at least one E2E (End-to-End) test validating the configuration and the use of peripheral data with the delivered Docker image.
- Pay close attention to the test scope to cover scale-up use cases involving large data volumes.
Data Management 🥈
- Acquisition of processed data and production must be performed using an S3 Bucket.
- Data flows auxiliary to acquisition and production should be as limited as possible and take minimum time.
- Prioritize access via HTTPS calls, including for configurations, using a GIT server or an S3 bucket.
Logging Strategy 🥉
Implement a logging strategy that allows for effective monitoring and troubleshooting of the processing chain.
- Production mode: Log only
INFOlevel unless a real malfunction occurs. - Debug mode: A more verbose mode for troubleshooting.
- Master the logs of the COTs and frameworks used.
- Use different log levels (0, 1, 2, 3, 4).
Optimization and Scalability Recommendations 🥇
- Optimize time spent on I/O during processing.
- Maximize CPU utilization throughout the processing duration.
- Provide a recommendation document regarding the optimal distribution of the process:
- Minimum and maximum number of workers.
- Reserved CPU and RAM per worker.
- Estimated execution time for an input data chunk.
Interoperability and Standards
STAC Compliance 🥉
- Provide a STAC Item file as an output of the chain.
Data Format 🥈
- Provide a QuickLook of the product.
- Adhere to the ARCO (Analysis-Ready Cloud Optimized) format (to be specified/completed).