Best practices

Version and Tag 🥇

The version management for the evolution of the processing chain must comply with semver.org. The version tag must be applied to a development branch containing the algorithm, the Docker image, and all standard rules described in the attached document.

Digit Description Definition / Example
First Digit Major Version There are non-backward compatible changes. A set of minor changes and fixes.
Second Digit Minor Version There are additions of backward-compatible features.
Third Digit Patch Version There are backward-compatible anomaly corrections.

Configuration Management

Readme File 🥇

A Readme written in Markdown at the root of the GIT project must allow for a quick overview of the chain.

  • Documentation Link: Link to the documentation hosted on readthedocs.com.
  • Title and Badge: Project name and indicators (build status, version, license).
  • Short Description (Pitch): One to two sentences maximum on the "what" and the "why".
  • Quick Installation: A single command line (e.g., npm install or pip install). 🥈

GIT Workflow 🥈

The project should follow the Git Flow workflow: https://git-flow.readthedocs.io/fr/latest/presentation.html


Packaging

Image Format 🥇

Images must be in OCI container format (Docker type).

CWL Description of the Process 🥇

A CWL file containing a minimal description of your process must be provided, including the following sections:

  • Input Data
  • Processing
  • Output Data

Documentation 🥇

Documentation must be managed within the GIT project and made publicly available (e.g., Readthedocs, Jupyterbook, Sphinx). A template is provided to help you initialize your documentation here:

Documentation Content

  • Process Description Template (v3.0): Standard template for describing a processing chain.
  • Dataflow Document: A document outlining the data flow.

Industrialization

Automated Tests 🥈

  • The chain must include at least one E2E (End-to-End) test validating the configuration and the use of peripheral data with the delivered Docker image.
  • Pay close attention to the test scope to cover scale-up use cases involving large data volumes.

Data Management 🥈

  • Acquisition of processed data and production must be performed using an S3 Bucket.
  • Data flows auxiliary to acquisition and production should be as limited as possible and take minimum time.
  • Prioritize access via HTTPS calls, including for configurations, using a GIT server or an S3 bucket.

Logging Strategy 🥉

Implement a logging strategy that allows for effective monitoring and troubleshooting of the processing chain.

  • Production mode: Log only INFO level unless a real malfunction occurs.
  • Debug mode: A more verbose mode for troubleshooting.
  • Master the logs of the COTs and frameworks used.
  • Use different log levels (0, 1, 2, 3, 4).

Optimization and Scalability Recommendations 🥇

  • Optimize time spent on I/O during processing.
  • Maximize CPU utilization throughout the processing duration.
  • Provide a recommendation document regarding the optimal distribution of the process:
    • Minimum and maximum number of workers.
    • Reserved CPU and RAM per worker.
    • Estimated execution time for an input data chunk.

Interoperability and Standards

STAC Compliance 🥉

  • Provide a STAC Item file as an output of the chain.

Data Format 🥈

  • Provide a QuickLook of the product.
  • Adhere to the ARCO (Analysis-Ready Cloud Optimized) format (to be specified/completed).