Integration System: Implementation and deployment v1 #47

Open
PipBrewer opened this issue Jan 7, 2025 · 0 comments
Labels
Integration Integration Server


Integration Work Package 2: Implementation and deployment

Overview
This work package will implement the integration of the modules within DaSSCo’s infrastructure, including the ARS, the Ingestion Server, and the Refinery (Slurm). To integrate these systems, a service-based integration module should be built. The integration module will receive assets from the Ingestion Server and store them in the ARS along with their metadata. It will coordinate with the Refinery (Slurm cluster) to receive asset metadata updates and new assets after image processing. It will also keep track of all assets in transit, provide bookkeeping, and provide logs to the IDS and IPS modules. It will receive updates from the Specify bridge to coordinate data assimilation from the ARS to Specify, and it will integrate with the Keycloak IDP for authentication and access management. The work will be split into several iterations.

Description
The Integration module is responsible for receiving assets from the Ingestion Server. This includes receiving authentication and authorisation tokens from the IDP in preparation for incoming requests from the Ingestion Server. These requests then deliver the asset (image file) and the JSON file containing its metadata.
When an asset arrives, the Integration module creates it through the ARS API and submits its metadata. The module should keep an authorised session open with the ARS and use it for creating assets. Upon creation, it receives a confirmation and a link that should be used to upload the asset. It should also be able to create and upload several assets simultaneously.
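
The create-then-upload flow for several assets at once could be sketched as follows. This is only an illustration: the `Asset` type and the `create_asset` and `upload` callables are hypothetical stand-ins, since the actual ARS endpoints and payloads are not specified in this issue.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Asset:
    guid: str         # hypothetical asset identifier
    image_path: str   # image file delivered by the Ingestion Server
    metadata: dict    # parsed content of the accompanying JSON file


def store_asset(asset: Asset,
                create_asset: Callable[[str, dict], str],
                upload: Callable[[str, str], None]) -> str:
    """Create the asset in the ARS, then upload the file to the returned link."""
    upload_link = create_asset(asset.guid, asset.metadata)
    upload(upload_link, asset.image_path)
    return upload_link


def store_assets(assets: List[Asset],
                 create_asset: Callable[[str, dict], str],
                 upload: Callable[[str, str], None],
                 max_workers: int = 4) -> Dict[str, str]:
    """Store several assets concurrently and report success/failure per asset."""
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(store_asset, a, create_asset, upload): a
                   for a in assets}
        for future in as_completed(futures):
            asset = futures[future]
            try:
                future.result()
                results[asset.guid] = "success"
            except Exception as exc:  # one failing asset must not stop the rest
                results[asset.guid] = f"failure: {exc}"
    return results
```

A thread pool is used here because the work is I/O-bound (HTTP calls and file transfer); an asyncio-based client would be an equally reasonable choice.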
The Integration module should keep track of all incoming and outgoing assets and log their statuses (failure/success). In case of failure, it should be able to redirect the assets to temporary storage. It should also be able to recover the assets from the temporary storage and move them to the ARS once the issue is resolved. Finally, it should be able to send a confirmation to the Ingestion Server when the assets have been stored in the ARS.
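
The bookkeeping and recovery behaviour described above might look roughly like the sketch below. All names (`Status`, `AssetLedger`, `store_in_ars`) are assumptions made for illustration; the issue does not define the status model or the temporary storage layout.

```python
import shutil
from enum import Enum
from pathlib import Path
from typing import Callable, Dict


class Status(Enum):
    RECEIVED = "received"
    STORED = "stored"   # confirmed as stored in the ARS
    PARKED = "parked"   # redirected to temporary storage after a failure


class AssetLedger:
    """Tracks each asset's status and parks failed assets in temporary storage."""

    def __init__(self, temp_dir: Path, store_in_ars: Callable[[Path], None]):
        self.temp_dir = temp_dir
        self.store_in_ars = store_in_ars   # hypothetical ARS upload function
        self.status: Dict[str, Status] = {}
        self.location: Dict[str, Path] = {}

    def ingest(self, guid: str, path: Path) -> Status:
        self.status[guid] = Status.RECEIVED
        self.location[guid] = path
        return self._try_store(guid)

    def _try_store(self, guid: str) -> Status:
        path = self.location[guid]
        try:
            self.store_in_ars(path)
        except Exception:
            # Move the file to temporary storage unless it is already there.
            if path.parent != self.temp_dir:
                self.temp_dir.mkdir(parents=True, exist_ok=True)
                parked = self.temp_dir / f"{guid}{path.suffix}"
                shutil.move(str(path), str(parked))
                self.location[guid] = parked
            self.status[guid] = Status.PARKED
        else:
            self.status[guid] = Status.STORED
        return self.status[guid]

    def recover(self) -> Dict[str, Status]:
        """Retry every parked asset, e.g. once the ARS is reachable again."""
        for guid, status in list(self.status.items()):
            if status is Status.PARKED:
                self._try_store(guid)
        return dict(self.status)
```

The sketch keeps state in memory only and leaves out cleanup of recovered files and the confirmation to the Ingestion Server; a real module would presumably persist the ledger so that it survives restarts.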

The Integration module should then, based on the log, initiate jobs at the Refinery. The Refinery is a Slurm cluster that can receive SSH commands for starting and stopping jobs and can provide information about jobs. Scripts are being developed that will run on the Slurm cluster and can send requests to an API for communication. These scripts are part of image processing pipelines. They receive the actual asset and its metadata to perform tasks such as barcode reading, OCR, cropping, and producing low-res derivatives. Some scripts run in sequence and some in parallel. The Integration module should be able to start these scripts for each asset, receive the results, and persist them in the ARS. It should also be able to track the jobs started on the Slurm cluster for failures and recover from them. When the scripts running on the Refinery create a low-res asset (derivative), the Integration Server should be able to receive the metadata of the low-res asset and send it to the ARS. It should then provide the script with an HTTP link for uploading the low-res asset.
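
One way to express "some scripts in sequence, some in parallel" on Slurm is job dependencies. The sketch below builds `sbatch` command lines using the standard `--parsable`, `--export`, and `--dependency=afterok:` options. The script names, the `ASSET_GUID` variable, and the injected `run` callable are assumptions made for illustration, not part of the planned pipelines.

```python
import shlex
from typing import Callable, List, Sequence


def sbatch_command(script: str, asset_guid: str,
                   depends_on: Sequence[str] = ()) -> List[str]:
    """Build an sbatch call that passes the asset id to the job script."""
    cmd = ["sbatch", "--parsable", f"--export=ALL,ASSET_GUID={asset_guid}"]
    if depends_on:
        # Start only after all listed jobs have finished successfully.
        cmd.append("--dependency=afterok:" + ":".join(depends_on))
    cmd.append(script)
    return cmd


def over_ssh(host: str, remote_cmd: List[str]) -> List[str]:
    """Wrap a command so it is executed on the cluster login node via ssh."""
    return ["ssh", host, shlex.join(remote_cmd)]


def parse_job_id(sbatch_output: str) -> str:
    """With --parsable, sbatch prints 'jobid' or 'jobid;cluster'."""
    job_id = sbatch_output.strip().split(";")[0]
    if not job_id.isdigit():
        raise ValueError(f"unexpected sbatch output: {sbatch_output!r}")
    return job_id


def submit_pipeline(stages: Sequence[Sequence[str]], asset_guid: str,
                    run: Callable[[List[str]], str]) -> List[str]:
    """Submit stages in order; scripts within one stage may run in parallel."""
    previous: List[str] = []
    all_ids: List[str] = []
    for stage in stages:
        current = [parse_job_id(run(sbatch_command(script, asset_guid, previous)))
                   for script in stage]
        all_ids.extend(current)
        previous = current   # the next stage waits for this whole stage
    return all_ids
```

In practice `run` could wrap something like `subprocess.run(over_ssh(host, cmd), capture_output=True, text=True, check=True).stdout`. The returned job IDs can then be stored alongside the asset so that failed jobs can be detected and resubmitted.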

After the asset has been processed, the Integration module should initiate the sync-to-ERDA API call in the ARS and monitor the ARS for any failures.

Estimated consultancy hours: 0 hours
Estimated internal hours (primarily LHD):
Estimated start date: 01/02/2024
Estimated end date: 30/06/2025* (guessed, need more information here)

Extended requirements:
Github board:

@PipBrewer PipBrewer added the Integration Integration Server label Jan 7, 2025