Integration Work Package 2: Implementation and deployment
Overview
This work package will implement the integration of the different modules within DaSSCo’s infrastructure, including the ARS, the Ingestion Server, and the Refinery (Slurm). To integrate these systems, a service-based integration module should be built. The integration module will receive assets from the Ingestion Server and store them, along with their metadata, in the ARS. It will also coordinate with the Refinery (Slurm cluster) to receive asset metadata updates and new assets produced by image processing. In addition, it will keep track of all assets in transit, provide bookkeeping, and provide logs to the IDS and IPS modules. It will receive updates from the Specify bridge to coordinate data assimilation from the ARS into Specify, and it will integrate with the Keycloak IDP for authentication and access management. This work will be split into several iterations.
Description
The Integration module is responsible for receiving assets from the Ingestion Server. This includes receiving authentication and authorisation tokens from the IDP in preparation for incoming requests from the Ingestion Server. Each request delivers the asset (an image file) together with a JSON file containing its metadata.
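As an illustration of the token handling, the sketch below obtains and caches an access token from Keycloak using the OAuth 2.0 client-credentials grant. It assumes the integration module is registered in Keycloak as a confidential client; the realm name, client ID and secret used in any example are hypothetical. Keycloak serves tokens at `/realms/{realm}/protocol/openid-connect/token` (older releases prefix this path with `/auth`). The fetch function and clock are injectable so the caching logic can be exercised without a live IDP.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Optional

# fetch(url, form_body) -> parsed JSON token response
Fetch = Callable[[str, bytes], dict]

def token_request(base_url: str, realm: str, client_id: str,
                  client_secret: str) -> tuple[str, bytes]:
    """Build the URL and form body for a Keycloak client-credentials request."""
    url = f"{base_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode()
    return url, body

def http_fetch(url: str, body: bytes) -> dict:
    """POST the form-encoded body and decode the JSON response."""
    req = urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

class TokenCache:
    """Hold one access token and fetch a new one shortly before it expires."""

    def __init__(self, url: str, body: bytes, fetch: Fetch = http_fetch,
                 clock: Callable[[], float] = time.monotonic,
                 margin: float = 30.0):
        self._url, self._body = url, body
        self._fetch, self._clock, self._margin = fetch, clock, margin
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Refresh when there is no token yet or it expires within `margin` seconds.
        if self._token is None or now >= self._expires_at - self._margin:
            payload = self._fetch(self._url, self._body)
            self._token = payload["access_token"]
            # expires_in is the token lifetime in seconds.
            self._expires_at = now + float(payload["expires_in"])
        return self._token
```

The value returned by `cache.get()` would then be sent as an `Authorization: Bearer <token>` header on ARS requests, which is one way to keep the "authorised session" with the ARS alive.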
On arrival, the module creates the asset through the ARS API and submits its metadata. The integration module should maintain an authorised session with the ARS and use it to create assets. Upon creation, it receives a confirmation and a link that should be used to upload the asset file. It should also be able to create and upload several assets concurrently.
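The create-then-upload flow, run for several assets at once, can be sketched with a thread pool. This is a minimal sketch under assumptions: `create_asset(metadata)` stands in for the ARS "create asset" call and is assumed to return the upload link, and `upload(link, data)` stands in for the file transfer. Both are hypothetical placeholders, not the real ARS API.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

@dataclass
class IncomingAsset:
    guid: str          # asset identifier taken from the metadata JSON
    metadata: dict     # parsed metadata JSON from the Ingestion Server
    file_bytes: bytes  # the image file itself

CreateAsset = Callable[[dict], str]     # hypothetical: metadata -> upload link
Upload = Callable[[str, bytes], None]   # hypothetical: (link, data) -> None

def ingest_one(asset: IncomingAsset, create_asset: CreateAsset, upload: Upload) -> str:
    # Step 1: submit the metadata; the ARS answers with an upload link.
    link = create_asset(asset.metadata)
    # Step 2: upload the file to that link.
    upload(link, asset.file_bytes)
    return link

def ingest_many(assets: list[IncomingAsset], create_asset: CreateAsset,
                upload: Upload, max_workers: int = 4) -> dict[str, str]:
    """Create and upload assets concurrently; report success/failure per GUID."""
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(ingest_one, a, create_asset, upload): a.guid
                   for a in assets}
        for future, guid in futures.items():
            try:
                future.result()  # re-raises any exception from the worker
                results[guid] = "success"
            except Exception as exc:
                results[guid] = f"failure: {exc}"
    return results
```

Collecting a per-asset status instead of aborting on the first error means one failed upload does not block the rest of a batch, and the result map feeds naturally into the status log described next.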
The Integration module should keep track of all incoming and outgoing assets and log their statuses (success/failure). In case of failure, it should redirect the assets to temporary storage, and it should be able to recover them from the temporary storage and move them to the ARS once the issue is resolved. It should also send a confirmation to the Ingestion Server when the assets have been stored in the ARS.
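The bookkeeping, fallback and recovery steps above can be sketched as a small tracker. Assumptions, all hypothetical: `store(guid, data)` wraps the ARS create/upload calls and raises on failure, `confirm(guid)` notifies the Ingestion Server, and the GUID is safe to use as a file name. The temporary directory defaults to a fresh one only for illustration; a real deployment would need persistent storage that survives restarts.

```python
import logging
import tempfile
from enum import Enum
from pathlib import Path
from typing import Callable, Optional

log = logging.getLogger("integration")

class Status(Enum):
    STORED = "stored"   # asset is in the ARS
    PARKED = "parked"   # ARS store failed; asset waits in temporary storage

class AssetTracker:
    def __init__(self, store: Callable[[str, bytes], None],
                 confirm: Callable[[str], None],
                 temp_dir: Optional[Path] = None):
        self.store = store        # hypothetical ARS store call
        self.confirm = confirm    # hypothetical Ingestion Server callback
        # Illustration only: use a persistent directory in practice.
        self.temp_dir = temp_dir or Path(tempfile.mkdtemp(prefix="parked-"))
        self.status: dict[str, Status] = {}

    def handle(self, guid: str, data: bytes) -> Status:
        """Try to store an asset in the ARS; park it on failure."""
        try:
            self.store(guid, data)
        except Exception as exc:
            (self.temp_dir / guid).write_bytes(data)
            self.status[guid] = Status.PARKED
            log.warning("store failed for %s, parked: %s", guid, exc)
            return Status.PARKED
        self.status[guid] = Status.STORED
        self.confirm(guid)
        return Status.STORED

    def recover(self) -> list[str]:
        """Retry every parked asset; return the GUIDs that reached the ARS."""
        recovered = []
        for guid, st in list(self.status.items()):
            if st is not Status.PARKED:
                continue
            path = self.temp_dir / guid
            try:
                self.store(guid, path.read_bytes())
            except Exception as exc:
                log.warning("retry failed for %s: %s", guid, exc)
                continue
            path.unlink()  # remove from temporary storage only after success
            self.status[guid] = Status.STORED
            self.confirm(guid)
            recovered.append(guid)
        return recovered
```

The confirmation to the Ingestion Server is sent only once the asset is actually in the ARS, whether on the first attempt or after recovery.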
Based on the log, the integration module should then initiate jobs on the Refinery, which is a Slurm cluster. The cluster accepts SSH commands for starting and stopping jobs and can provide information about jobs. Scripts are being developed that will run on the Slurm cluster and can send requests to an API for communication. These scripts are part of the image processing pipelines: they receive the actual asset and its metadata and perform tasks such as barcode reading, OCR, cropping, and producing low-res derivatives. Some scripts run in sequence and some in parallel. The Integration module should be able to start these scripts for each asset, receive the results, and persist them in the ARS. It should also track the jobs started on the Slurm cluster for failures and recover from them.
When a script running on the Refinery creates a low-res asset (derivative), the Integration module should receive the metadata of the low-res asset and send it to the ARS. It should then provide the script with an HTTP link for uploading the low-res asset.
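One way to express "some scripts in sequence, some in parallel" is Slurm job dependencies. The sketch below submits a hypothetical pipeline (crop, then barcode reading and OCR in parallel, then derivatives) with `sbatch --parsable --dependency=afterok:...` and checks job states with `sacct`. The script names, the pipeline order and the SSH host are assumptions for illustration; the commands are run through an injectable runner so the logic can be tested without a cluster.

```python
import shlex
import subprocess
from typing import Callable, Sequence

# A runner executes a command on the Slurm login node and returns its stdout.
Runner = Callable[[Sequence[str]], str]

def ssh_runner(host: str) -> Runner:
    """Run commands on the (hypothetical) Refinery login node over SSH."""
    def run(remote_argv: Sequence[str]) -> str:
        # ssh joins its arguments into one remote shell command, so quote them.
        proc = subprocess.run(["ssh", host, shlex.join(remote_argv)],
                              check=True, capture_output=True, text=True)
        return proc.stdout
    return run

def sbatch(run: Runner, script: str, guid: str,
           after_ok: Sequence[str] = ()) -> str:
    """Submit one pipeline script for one asset and return its job ID."""
    argv = ["sbatch", "--parsable"]
    if after_ok:
        # Start only after every listed job has completed successfully.
        argv.append("--dependency=afterok:" + ":".join(after_ok))
    argv += [script, guid]
    # --parsable prints "jobid" or "jobid;cluster".
    return run(argv).strip().split(";")[0]

def submit_pipeline(run: Runner, guid: str) -> dict[str, str]:
    """Hypothetical order: crop, then barcode + OCR in parallel, then derivatives."""
    crop = sbatch(run, "crop.sh", guid)
    barcode = sbatch(run, "barcode.sh", guid, after_ok=[crop])
    ocr = sbatch(run, "ocr.sh", guid, after_ok=[crop])
    derivatives = sbatch(run, "derivatives.sh", guid, after_ok=[barcode, ocr])
    return {"crop": crop, "barcode": barcode, "ocr": ocr,
            "derivatives": derivatives}

FAILED_STATES = {"FAILED", "CANCELLED", "TIMEOUT", "OUT_OF_MEMORY", "NODE_FAIL"}

def job_state(run: Runner, job_id: str) -> str:
    """Return the job's accounting state, e.g. RUNNING, COMPLETED or FAILED."""
    out = run(["sacct", "-j", job_id, "-n", "-X", "-P", "-o", "State"]).strip()
    # "CANCELLED by <uid>" is reduced to its first word; no record yet -> UNKNOWN.
    return out.split()[0] if out else "UNKNOWN"

def failed_jobs(run: Runner, jobs: dict[str, str]) -> list[str]:
    """Names of pipeline steps whose jobs ended in a failure state."""
    return [name for name, jid in jobs.items()
            if job_state(run, jid) in FAILED_STATES]
```

Note that a job whose `afterok` dependency failed may remain pending rather than fail (depending on cluster configuration or `--kill-on-invalid-dep`), so recovery logic would likely need to cancel and resubmit the downstream steps as well as the failed one.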
After the asset has been processed, the Integration module should call the ARS API that initiates synchronisation to ERDA and then monitor the ARS for any failures.
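The sync-and-monitor step can be sketched as "start, then poll until the ARS reports a final state or a deadline passes". The `start_sync` and `get_status` callables and the status strings `SYNCED`/`FAILED` are hypothetical stand-ins for whatever the ARS actually exposes; `sleep` and `clock` are injectable for testing.

```python
import time
from typing import Callable

def sync_to_erda(
    guid: str,
    start_sync: Callable[[str], None],   # hypothetical ARS "sync to ERDA" call
    get_status: Callable[[str], str],    # hypothetical ARS status lookup
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Ask the ARS to sync one asset to ERDA, then poll until it settles."""
    start_sync(guid)
    deadline = clock() + timeout
    while True:
        status = get_status(guid)
        if status in ("SYNCED", "FAILED"):  # assumed final states
            return status
        if clock() >= deadline:
            return "TIMED_OUT"  # caller can log this and retry later
        sleep(poll_interval)
```

Returning `FAILED` or `TIMED_OUT` rather than raising lets the caller record the outcome in the same status log used for ingestion failures.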
Estimated consultancy hours: 0 hours
Estimated internal hours (primarily LHD):
Estimated start date: 01/02/2024
Estimated end date: 30/06/2025 (guessed; more information needed)
Extended requirements:
GitHub board: