- Communication: Matrix space NFDI4BIOIMAGE-HACKATHON2024 (join via https://matrix.to/#/!OPPIswBjILObXJmMLI:mpg.de?via=gitter.im)
- OMERO instances:
- upload instance 10.14.28.137 with 2TB storage
- developer instance 10.14.28.177 with 250GB storage
- Requires VPN and some setup beforehand; further information will follow
- Subproject: Optimize RDF structure and URIs, packaging into subsets
- Subproject: Use Cases for subsetting RDF
- Subproject: Accessing RDF subsets
Reviewing RDF Structure and URIs: Analyze and refine the current RDF structure and URIs to ensure they are optimized for production environments.
Packaging subsets: Package subsets of RDF data in RO-Crate format to foster sharing and data integration.
- Josh Moore (lead)
- Susanne Kunis (lead)
- Niraj Kandpal
- Peter Zentis
- A refined RDF structure (in terms of the predicates used) and improved URIs, ready for production deployment.
- Repositories: omero-rdf, ome-ld, idr_metadata_model
- Login to the IDR (see the connection sketch after this list)
- necessary Python modules: omero-rdf
- idr0054 (with zarr) https://idr.openmicroscopy.org/webclient/?show=project-701
- idr0056 https://idr.openmicroscopy.org/webclient/?show=screen-2303
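A minimal connection sketch for the requirements above, assuming omero-py (with websocket support) is installed next to omero-rdf and using the public IDR credentials from the IDR documentation; the RDF export itself would then be run via the omero-rdf CLI plugin (see its README for the exact invocation).

```python
# Minimal sketch: connect to the public IDR and walk the idr0054 project.
# Assumes omero-py with websocket support; host and credentials follow the
# public IDR documentation and may change.
from omero.gateway import BlitzGateway

HOST = "ws://idr.openmicroscopy.org/omero-ws"  # public IDR websocket endpoint

conn = BlitzGateway("public", "public", host=HOST, secure=True)
if not conn.connect():
    raise RuntimeError("Could not connect to the IDR")
try:
    project = conn.getObject("Project", 701)  # idr0054
    print(project.getName())
    for dataset in project.listChildren():
        print(" -", dataset.getName())
finally:
    conn.close()
```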
- Create a demo dataset with examples, e.g. finding annotations of all/multiple kinds (see the annotation-listing sketch after this list).
- Identify fields that need discussion (perhaps before the hackathon)
- Loop through those fields. Discuss URI options and decide.
- Update omero-rdf to split out the URI
- Document the choice in the appropriate LinkML schema
- Schema specification for harmonization of OMERO metadata by omero-rdf
- Create test samples of data subsets from IDR based on defined criteria
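A sketch of the "annotations of multiple kinds" demo from the plan above, reusing an open `conn` BlitzGateway connection as in the previous sketch; the project ID and the selection of annotation types are illustrative.

```python
# Sketch: enumerate annotations of several kinds (map/key-value, tag, file)
# attached to the images of one project. `conn` is an open BlitzGateway
# connection as in the previous sketch; the project ID is illustrative.
from omero.gateway import (
    FileAnnotationWrapper,
    MapAnnotationWrapper,
    TagAnnotationWrapper,
)


def list_annotations(conn, project_id=701):  # 701 = idr0054
    project = conn.getObject("Project", project_id)
    for dataset in project.listChildren():
        for image in dataset.listChildren():
            for ann in image.listAnnotations():
                if isinstance(ann, MapAnnotationWrapper):
                    # key/value pairs, e.g. organism, imaging method, ...
                    print(image.getId(), "map:", ann.getValue())
                elif isinstance(ann, TagAnnotationWrapper):
                    print(image.getId(), "tag:", ann.getValue())
                elif isinstance(ann, FileAnnotationWrapper):
                    print(image.getId(), "file:", ann.getFileName())
```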
- Stretch goals (breakouts)
- Discuss relationship to ome-owl
- Make all URIs resolvable
- flexible customization and extraction of desired subsets of data by omero-rdf
- add-on: import of/from RO-Crate in OMERO
- Total number of agreed-on URIs
- Status of the specification
- Subset in RO-Crate format (see the packaging sketch below)
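A minimal packaging sketch for the RO-Crate metric above, using the ro-crate-py library (`pip install rocrate`); the file name and descriptive properties are placeholders for whatever subset omero-rdf actually exports.

```python
# Sketch: wrap an exported RDF subset (e.g. Turtle from omero-rdf) in an
# RO-Crate using ro-crate-py. File names and descriptive properties are
# placeholders.
from rocrate.rocrate import ROCrate

crate = ROCrate()
crate.add_file(
    "idr0054_subset.ttl",  # hypothetical omero-rdf export
    properties={
        "name": "idr0054 metadata subset",
        "encodingFormat": "text/turtle",
    },
)
crate.write("idr0054_subset_crate")  # writes ro-crate-metadata.json plus payload
```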
Subsetting RDF for Various Use Cases: Identify effective strategies to create subsets of the RDF data that cater to different research and application needs.
- Tom Boissonnet
- Torsten Stöter
- Clearly defined guidelines and criteria to create useful subsets of RDF data, tailored for various research and application needs.
- use cases for subsets; queries about the outcome/specification of P1
- SPARQL/ShEx
- Ontop
- Data: IDR, OpenNeuro BIDS
- Prototypical SPARQL queries and/or ShEx (?) (see the query sketch below)
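A sketch of one prototypical subset query, run locally with rdflib over a Turtle export from omero-rdf; the `ome:` prefix and the predicate names are placeholders standing in for whatever URIs subproject 1 settles on.

```python
# Sketch: run a prototypical "subset" query over a local omero-rdf Turtle dump
# with rdflib. The prefix and predicate names are placeholders; the real URIs
# are exactly what subproject 1 is meant to decide.
from rdflib import Graph

g = Graph()
g.parse("idr0054_subset.ttl", format="turtle")  # hypothetical omero-rdf export

QUERY = """
PREFIX ome: <https://example.org/ome#>   # placeholder namespace
SELECT ?image ?key ?value
WHERE {
    ?image a ome:Image ;
           ome:annotation ?ann .
    ?ann ome:key ?key ;
         ome:value ?value .
}
LIMIT 20
"""

for row in g.query(QUERY):
    print(row.image, row.key, row.value)
```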
Endpoint Drafting and Testing: Develop endpoints for SPARQL and Bioschemas formats, aiming to facilitate the consumption of RDF subsets. Curate (meta)datasets and queries for testing and performance profiling. Conduct performance and scalability testing of these endpoints.
- Carsten Fortmann-Grote (lead)
- Mariana Meireles (lead)
- Development and documentation of SPARQL and Bioschemas endpoints that facilitate efficient RDF data consumption.
- Curation of (meta)datasets and queries for testing and performance profiling
- Benchmarks that document the performance of the newly developed endpoints, ensuring the use of the most efficient databases.
- Performance and scalability results that guide future optimizations
- Data: One Project/Dataset/Image hierarchy with metadata (e.g. from IDR). Experiment pre-hackathon to find a good size (number of triples).
- https://github.com/German-BioImaging/omero-ontop-mappings
- Ontop SPARQL endpoint and R2RML engine ("dynamic triplestore")
- other SPARQL endpoints: Fuseki, Virtuoso, ??? ("static triplestores")
- Populate static triplestores
- Define performance metrics
- Define benchmark queries
- Run benchmarks, collect data
- Mari: Besides queries, I think it'd be important to clarify whether we're using specific tools or libraries to run benchmarks.
- Analyse data
- Report
- minimum success: one benchmark query timed and profiled for two triplestores/endpoints (see the timing sketch below)
- maybe: a description/guide on HOW to benchmark
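On the note above about tools/libraries: a minimal timing sketch using `requests` and the Python standard library against two SPARQL endpoints. The endpoint URLs (a local Fuseki dataset named `idr` and a local Ontop instance) and the benchmark query are assumptions; a real run would add warm-up queries, result validation, and profiling.

```python
# Minimal benchmark sketch: time one query against two SPARQL endpoints
# ("static" Fuseki vs. "dynamic" Ontop). URLs, dataset name and query are
# placeholders; warm-up, result validation and profiling are omitted.
#
# Populating the static Fuseki store beforehand could use the SPARQL Graph
# Store Protocol (dataset name assumed), e.g.:
#   requests.post("http://localhost:3030/idr/data",
#                 data=open("idr0054_subset.ttl", "rb"),
#                 headers={"Content-Type": "text/turtle"})
import statistics
import time

import requests

ENDPOINTS = {
    "fuseki": "http://localhost:3030/idr/sparql",  # assumed local Fuseki dataset
    "ontop": "http://localhost:8080/sparql",       # assumed local Ontop endpoint
}
QUERY = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"  # placeholder benchmark query
REPEATS = 5

for name, url in ENDPOINTS.items():
    timings = []
    for _ in range(REPEATS):
        start = time.perf_counter()
        resp = requests.get(
            url,
            params={"query": QUERY},
            headers={"Accept": "application/sparql-results+json"},
            timeout=300,
        )
        resp.raise_for_status()
        timings.append(time.perf_counter() - start)
    rows = len(resp.json()["results"]["bindings"])
    print(f"{name}: median {statistics.median(timings):.3f}s over {REPEATS} runs ({rows} rows)")
```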