Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OPTIMADE JSON Lines specification appendix #531

Open
wants to merge 12 commits into
base: develop
Choose a base branch
from

Conversation

ml-evs
Copy link
Member

@ml-evs ml-evs commented Jul 19, 2024

This PR begins the work on #471, by defining the conventions on top of JSON Lines for deserializing an entire OPTIMADE API into a file.

optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
@ml-evs ml-evs changed the title [WIP] OPTIMADE JSON Lines specification appendix OPTIMADE JSON Lines specification appendix Jul 29, 2024
Fix code block formatting

Update optimade.rst

Apply suggestions from code review

Apply suggestions from code review

Try to fix formatting again

Fix formatting and add note about provider fields
@ml-evs ml-evs force-pushed the ml-evs/jsonlines-draft branch from 6929f10 to c5f1f53 Compare July 29, 2024 20:51
@ml-evs ml-evs force-pushed the ml-evs/jsonlines-draft branch from 96b0231 to 78e158c Compare July 29, 2024 21:09
@ml-evs ml-evs marked this pull request as ready for review July 30, 2024 16:55
@ml-evs ml-evs requested a review from eimrek July 30, 2024 16:55
Copy link
Member

@eimrek eimrek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ml-evs, and sorry for the delays. Now had some time to take a look. I like the proposal and it can be used for optimade-maker as well. But I wrote some comments for consideration.

optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
Co-authored-by: Kristjan Eimre <eimrek@users.noreply.github.com>
optimade.rst Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
ml-evs and others added 2 commits January 23, 2025 17:34
@ml-evs
Copy link
Member Author

ml-evs commented Jan 23, 2025

CI is blocked by #536, but perhaps you could take another look when you get a chance @eimrek? I'll invite some other reviewers too.

@ml-evs ml-evs requested review from eimrek, rartino and merkys January 23, 2025 18:04
Copy link
Contributor

@rartino rartino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good; and as we said during the web meet, very important extension.

However, how are relationships encoded? Maybe it would be nice if the example included a structure that has a relationship to a reference so one can see explicitly how that is encoded?

optimade.rst Outdated Show resolved Hide resolved
@ml-evs ml-evs requested a review from rartino February 3, 2025 13:59
rartino
rartino previously approved these changes Feb 3, 2025
optimade.rst Outdated Show resolved Hide resolved


This JSONL format can also be used to share provider-specific properties.
These should be consistent with any external definitions, and where appropriate, prefixes tied to the tools used to generate the file should be used.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not quite get the "prefixes tied to the tools used to generate the file should be used" part. Are there any examples for such cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah, this has lost its meaning a bit. The intention was that a tool like aiida or ASE could support OPTIMADE JSONL as an export format, and any "tool-specific definitions" like _aiida_node_id (or whatever) should be defined and use the tool prefix. I wouldn't be against removing this sentence as the format just follows the normal OPTIMADE rules about defining properties.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, thanks for the explanation. Since "MUST" is not used in this paragraph, I gather its main purpose is to recommend best practice. We may probably want that all the OPTIMADE JSONL files bear some sort of producer identification, but we probably do not want to standardize it at this point. But maybe we can reuse meta.implementation dictionary from the main specification?

Maybe it is worth stressing that OPTIMADE JSONL files have the same requirements ("MUST/SHOULD/MAY") as OPTIMADE responses, in terms of custom properties having prefixes and metadata descriptions. But maybe it is enough just to say that?

Also, what about sharing provider-specific entry types? If we mention provider-specific properties, should be allow entry types as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding non-optimade-spec properties, doesn't the spec say they must have either

  • the provider specific prefix; or
  • a defininition-provider-specific prefix?

I think we can just point to section 3.5 here, no?

Copy link
Member

@merkys merkys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me either. The part on "prefixes tied to the tools used to generate the file" is unclear to me, otherwise happy to merge this.

Co-authored-by: Andrius Merkys <andrius.merkys@gmail.com>
@eimrek
Copy link
Member

eimrek commented Feb 5, 2025

An additional question about custom prefixes, that is (still) not clear to me, is what should happen when provider1 creates a JSONL file, and provider2 subsequently serves this.

Should all _provider1 prefixes be replaced by _provider2?

or would _provider1 prefixes be retained and this would fall under 3.5.2?

Copy link
Member

@eimrek eimrek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, Matthew, for working on this. Left some comments.


- ``api_version``: The OPTIMADE API version used when generating the file, as described in the ``meta`` member in `JSON Response Schema: Common Fields`_.

- The next line MAY contain a standard OPTIMADE ``meta`` object, following the same rules described in `JSON Response Schema: Common Fields`_, where every MUST and SHOULD rule can be reinterpreted as a MAY rule.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we specify what meaning fields like meta.data_returned take in this context?

- First the base info response MUST be provided, following the description at `Base Info Endpoint`_.
- The next lines MUST contain the entry info endpoint responses for the all entry types present later in the file, as described in `Entry Listing Info Endpoints`_.
- The remaining lines of the file contain data entries themselves, described in `Entry Listing JSON Response Schema`_.
These entries can be provided in any order.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order of the data entries is not clear to me. Below, we say <entries block ordered by entry type>, while here they can be in any order. I don't have strong feelings about this, but it should be clear. If we want to mandate block-order (e.g. all structures first, then all references, etc), i guess we should say something about both, 1) the order of the blocks (should they match the info endpoint order?); 2) the order of the entries within the blocks.

My own gut feeling is that perhaps the order of blocks is not important, but maybe it's good to have one type of entries all together in a block.

These should be consistent with any external definitions, and where appropriate, prefixes tied to the tools used to generate the file should be used.
It is RECOMMENDED that custom properties are defined in full within the JSONL file, or pointed to a specific versioned property definition.

Example OPTIMADE JSON Lines File
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Example OPTIMADE JSON Lines File
Example file

Considering there is now two OPTIMADE JSON Lines File standards (other one being 9.4), maybe better to be more vague here (e.g. i already mixed these up when looking through the table of contents).


- The next line MAY contain a standard OPTIMADE ``meta`` object, following the same rules described in `JSON Response Schema: Common Fields`_, where every MUST and SHOULD rule can be reinterpreted as a MAY rule.
- The next block of lines provides the ``info`` endpoint responses.
- First the base info response MUST be provided, following the description at `Base Info Endpoint`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this sublist is not rendering correctly to me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants