NXmpes refactoring #52

domna · 2023-08-09T16:42:49Z

This PR is dedicated to collect changes from the mpes workshop in may and to stimulate discussions on the base class changes introduced by NXmpes.

The comments from the collaborators can be found at a slightly older fairmat proposal page.

Associated PRs

~~Changes from mpes workshop #32 (Changes have been added here)~~
Add sub-application definition NXmpes_liquid #104
NXxps sub app-def #30 (needs substantial changes once this PR has been finalized)
Add new application definition for PEEM experiments #65
~~Change NXsample in NXmpes using the new base classes, introduce NXactuator #68~~ (merged)
Refactoring of NXmpes after 2nd workshop #74
~~Remove sample history from MPES #82~~ (closed)
~~Add list of physical quantities as axis names to NXdata #92~~ (merged)
~~Use NXresolution in MPES #124~~ (merged)
~~Extend energy scan mode in NXenergydispersion #127~~ (merged)
Draft for NXmpes_arpes application definition #128

Transformations

There was already a longer discussion on this during the design of the NXmpes_liquid in #27. The main question is how we denote actual geometries inside the transformations, i.e., should there be vectors which collect a certain axis of, e.g., a HEA.
Also there should be documentations (especially examples/how-tos/tutorials) how to build a transformation chain for template experiments. There is also the new documentation from the NIAC https://hdf5.gitlab-pages.esrf.fr/nexus/nxtransformations_active/classes/base_classes/NXtransformations.html which is much better at explaining how this transformations actually work. For visualization there is also https://github.com/domna/nexus3d.
There are also advances form @mkuehbach for representing coordinate systems in a cleaner way.

Another question to stimulate discussion: Should there be one central place to collect transformations or should they stay attached to the instruments?

Users

There are some comments that the user we have actually violates the GDPR and its questionable whether we can just collect this information.

Alignment with ISO Standards

During the workshop there were some discussions about already existing iso standards related to mpes (there are some comments in #32 https://www.iso.org/obp/ui/#iso:std:iso:18115:-1:ed-3:v1:en), i.e., standardised experimental names. There is also the PaNET ontology and nomad will probably start using KRISS (which is more related to organising physical quantities, e.g., from analysis. But it may be worth to take a look there, too).

Experimental plans

During the workshop there was the idea of an experimental plan, e.g., a description of the measurement routine which describes which parameters were scanned against what. It would give users a top-level approach to the data without having to search through various instruments (which are a rather specific approach to the metadata) and could be used in conjunction with NXdata or is a more sophisticated NXdata entry. I think somewhat related is a general structure of nexus files which should be the same throughout different appdefs and could also contain an experimental plan.

Multiple base classes in one nexus file

During the discussions for sub-appdefs we felt that it would be convenient in having a trait-like approach to select certain aspects of an experiment, e.g., having an NXmpes_timeresolved which can be used together in a file with an NXmpes_arpes to describe a timeresolve arpes experiment without creating a NXmpes_arpes_timeresolved sub-def. This could be simply denoted by an array in the definition field of NXentry using similar inheritance rules as traits (sometimes also known as interfaces) in programming languages, i.e., if there are duplicate fields the first array entry has precedence. We could also define appdefs with collisions incompatible or create a separate namespace for each definition. This can also be used for analysis, e.g., additional analysed fields can just be demanded + it can add interfacing for compatibility with other tools, e.g., we could add a NXnomad trait which builds the results in accordance to KRISS to denote searchable fields.

Angle and k-resolved appdefs

For many companies it would be helpful to be able to write an angle based appdef directly and to have conversion routines to k-space based appdefs. Contrary, a TOF detector could write k-space definitions directly. How to properly represent the different aspects?

Alignment with sub-methods

There is already a development at ALBA and Diamond for NXpeem. They want to build it as a sub-def for NXmpes. We should also include them in the discussion of the general design.
There is also the XPS sub-def in NXxps sub app-def #30 and liqudjet in NXmpes liquidjet draft #27.
A specific ARPES sub-def was started by @Tommaso-Pincelli

Alignment with analysis software

We are currently benchmarking pyArpes (see https://github.com/FAIRmat-Experimental/arpes / https://github.com/chstan/arpes) as an analysis tool. It is unclear what its current maintenance status is (last commit may 2022). However, we should think about how we can map the appdef data into a general data structure as the one used by pyarpes, especially for angle-based appdefs.

Legacy data management and versioning

How do we deal which a history of instrument calibration? Do we only give the latest? Is there some sort of quality factor depending on how much time has passed since the last calibration.

Another point is versioning, in particular for the definitions. As the definitions are now a python package, we should try to extract the version from there (with setuptools-scm as we do for pynxtools). However, there are still some open questions, e.g., how do we deal with different version numbers between fairmat and niac? Do we just use distance + hash notation of setuptools-scm? Certainly, the repository should also be collected in the file. Do we want to have an official fairmat url which points to our nexus definitions repo? So we could in principle move it to somewhere else without breaking older files?

This is just what I can recall from the top of my head. Please feel free to extend and discuss here.

@Tommaso-Pincelli @rettigl @sanbrock @lukaspie @stthuermer

mkuehbach · 2023-08-10T11:22:36Z

@domna this is a good summary for mpes, well done, please find a summary of the current discussions to move forward with NXcoordinate_system(_set): aa0cedc

lukaspie · 2023-08-14T10:02:21Z

@domna thanks for the summary, this is a very useful starting point to coordinate the discussion on NXmpes and related sub-methods. Below, I have added some comments (including the discussion during the workshop).

Transformations

In NeXus, the beam axis is the z-axis by default and every other elements is described with respect to the beam.
In traditional photoelectron spectroscopy, usually the starting point is the sample surface.
- need to also consider scanning beams, slight misalignments by gasket placements, …
Regarding " Should there be one central place to collect transformations or should they stay attached to the instruments?", my suggestion would be to keep them attached to the instrument and then have a summary along the lines of NXcoordinate_system(_set) with links to the individual transformations.

Alignment with ISO Standards

The relevant ISO committee is ISO Technical Commitee 201: (Surface Chemical Analysis)
For the vocabulary, there is an existing naming convention: ISO 18115. We should also check out similar naming schemes provided by NPL, NIST, ... This is particularly relevant for describing established methods (XPS, ARPES, …)
For reproducibility: look at minimum standards of reporting guidelines

Experimental plans

I fully agree that there should a separate description of the experimental plan somewhere in the application definiton (this is relevant for basically any application definition describing a physical measurement). For MPES, this should include the order of measurement, scan modes of the electronanalyzer (could be its own base class), scanning of external parameters, and so on. While it was discussed that it is probably not desirable to have NXmpes as the default software solution for these instruments (including automation of the devices), the measurement process and the order for operations should be reflected in the file as well, e.g. by something like NXprocedure (not existing yet).

Multiple base classes in one nexus file

For me, this is quite important going forward. While probably not originally intended by NIAC, it seems ot me that @domna's proposal (i.e., allowing for multiple bases classes in the definition field of NXentry) is the most natural solution. This will become more important the more we specialize the application definitions. Other than the use cases outlined above, this could also be extended to sample preparation, sample environment, data analysis workflows, or pump-probe experiments.

Alignment with analysis software

For XPS, most of the relevant data analysis software solutions are listed here. Additionally, one could consider including Quases and COMPRO. The developers of these solutions should be included in the discussion at some point.

Instrument

Energy resolution of the experiment is hard to know and depends on many factors (light source, temperature, sample, disorder). Should be recommended.
Collection column and energy dispersion may not be easy to separate.
NXsource:
- While NXmpes was originally only designed for electron spectroscopy (i.e., photon in - electron out), the collaborators pointed out that sometimes additional data is collected in the same experiment (including EELS, LEIS, AES). Their idea was to allow for more sources in NXmpes, but if we only intend to allow photon in - electron out, we will have to think about similar application definitions for the techniques at the same time, such that all data can be exported (for example, to multiple NXentries).
NXbeam: just use ’photons’, cannot always distinguish ‘x-ray’ and ‘UV’

Calibration

For reproducibility, instrument calibration should be done on a regular basis. NXCalibration should store the method or process of calibration, type of calibration, time stamp of calibration and resulting data transformation (e.g. calibrated axes). In XPS, in order to understand the data quality, it is important to know information about the last calibration, previous detector scans, and the transmission function, among others.
Sample temperature: Need a good system for sample temperature calibration. This could require reusing of previous data, e.g. previous measurement of calibration sample in the beam line.

Data quality and error detection

Timestamp of data: Need to have a field. Does this data come from previous measurements and what is the time unit?
Data quality and error detection: time stamps, data from external providers, data origin, data hash?
Instrument history: calibrations, lifetime, ...
Is it possible to connect or relate between datasets inside the application definition and inside a database? → distributed storage

How to move forward

Since we already have quite a lot of discussion items here, I would suggest to have an internal meeting/workshop of some sort first to discuss all of these issues and come up with an enhanced proposal that we can then present to the wider community in a second step.

mkuehbach · 2023-08-14T10:44:47Z

@lukaspie @domna wrt to your last comment and topics some cents from my side:

Transformations:
In #51 specifically contributed/nyaml/NXem_parsed, *NXem_base_parsed.yaml and *NXcoordinate_system(_set).yaml I added this idea as you were asking for possible examples, being as explicit as possible I think is best and having a mechanism whereby McStas for (legacy) support and other CS can be treated with the same approach is a useful extension for NeXus.

ISO:
Fully agree, having and facing other schema is and will be a reality.
Seeing this strongly at the interface MatWerk / FAIRmat, here I try to play the game as follows:
Take such an ISO glossary, which terms can be mapped on NeXus exactly, which not, what can we learn from this, I guess the more frequently our answer is that the comparison yields "no clue, doesn't work" the weaker our NeXus proposal is and may demand changes. For pynxtools and a converter such a mapping is anyway needed if one plans to write a parser for a data representation which uses such an ISO vocabulary. My experience with EBSD was that to parse really hundreds of datasets you have to invest time as with each parser you not get like ten thousands of simulations like for the NOMAD abinitio story but sometimes only a few dozen well curated datasets and then you face yet another file format and naming conventions. This is exactly why I like the idea about general enough harmonizing formats which look over the rim of the tea cup of ones own lab. The key question is here given the variety of formats within a field what is the fraction of instances using that format and unfortunately for materials engineering there are many formats with often no single format having a high fraction which is a serious barrier for parsing many datasets.

Multiple base classes:
I am not so much convinced that there is really the need for making effectively polymorphism-alike statements, em is also a very combinatorial playground which methods and hardware you combine or not, indeed the early attempt of a single NXem was too naive, with #51 I try to improve it where the design goes more along indeed inheritance single one though and inheritance chains.
NXem appdef, a specialization of a general NXem_base (an all optional base class), internally, data collection events, for each event different methods used, thus NXem_method base classes and specializations NXem_ebsd (i.e. specific EBSD-method processing etc), processed data inside an roi to take into account that I may scan the surface but collect multiple signal EDS, EBSD, EELS, and correlate between these signals, as side note it comes closer and closer to what companies do and how one would write guess what Python base classes for an arbitrary software which can simulate any microscope, by the end of the day an SEM or a TEM is just a specialized microscope for which other combinations of base classes will be needed but no need to change thus the base classes.
I try to think about it like graphs, forget about the nexus "base" jargon, which branches (sub-graphs) would I need and can certain sub-graphs specialize already spelled out graphs, what is challenging so far that when I say I have an ROI I would like this base class to be able to handle 1d, 2d, 3d but for now I would need quirks in nexus or NXroi_oned, _twod, etc but I think this is not elegant if not using inheritance, not only in Python one would about writing NXroi NXroi2(NXroi), NXroi1(NXroi), so far in EM most of this I put in appdefs and this is not a good place because the appdef mixes cardinality and existence constraints with specializations and these are two very different things. People should be able to take advantage of as much available as possible, otherwise they will cook up again their own things.

Data quality etc.
An idea like NXevent_data i.e. optionally time-stampable events could be your friend, for references to existent technology partner entries see #51 NXem_ebsd/calibration/origin and NXem_ebsd/calibration/depends_on

Instrument:
There is an NXem_eels now in #51 could this be useful, in fact the idea with NXem* family was a controlled electron beam used for electron matter interaction, in #51 I even no longer make a distinction between a simulated or a real instrument so I think there is some overlap of EM and MPES which could be worthwhile to explore

Calibration:
For em the idea was that it could be useful to consider a calibration nothing but an own experiment or simulation thus it might be worthwhile to think that the calibration is just another instance of an NXmpes thereby all the field and group prefixed with calibration would become obsolete.
In the case of EBSD I opted for NXem_ebsd/measurement */simulation, */calibration which points to another NXem instance or another entry in that same NXem instance if calibration and measurement is done in the same session and people like to combine it in one file.

domna · 2023-08-14T14:45:48Z

@domna thanks for the summary, this is a very useful starting point to coordinate the discussion on NXmpes and related sub-methods. Below, I have added some comments (including the discussion during the workshop).

Transformations

In NeXus, the beam axis is the z-axis by default and every other elements is described with respect to the beam.

In traditional photoelectron spectroscopy, usually the starting point is the sample surface.

need to also consider scanning beams, slight misalignments by gasket placements, …

Regarding " Should there be one central place to collect transformations or should they stay attached to the instruments?", my suggestion would be to keep them attached to the instrument and then have a summary along the lines of NXcoordinate_system(_set) with links to the individual transformations.

I think its good to work example based here and collected different examples from different communities. We already have an example from the mpes liquid community (by @stthuermer) and one for arpes (the matrices are there and currently @rettigl @Tommaso-Pincelli and I are trying to build an example to convert this into the pyarpes coordinate system. Our idea was to have different examples and try to compile something general from then. @lukaspie Do you have an transformation example for your xps experiment? I think this would be useful as well.

Alignment with ISO Standards

The relevant ISO committee is ISO Technical Commitee 201: (Surface Chemical Analysis)
...

Yeah, I think we should try to buy a copy of this document.

Alignment with analysis software

For XPS, most of the relevant data analysis software solutions are listed here. Additionally, one could consider including Quases and COMPRO. The developers of these solutions should be included in the discussion at some point.

I'm also thinking whether we can build guidelines for the data provenance along the analysis steps (e.g., pyarpes is keeping track in each analysis step). There is already something in this direction with the calibration data in NXprocess.

How to move forward

Since we already have quite a lot of discussion items here, I would suggest to have an internal meeting/workshop of some sort first to discuss all of these issues and come up with an enhanced proposal that we can then present to the wider community in a second step.

Fully agree. We might want to dedicate some days to discuss internally where we take our time to address each of these points individually. Maybe it would be good to have an in-person meeting for this, where we prepare and someone of us moderates through the different points based on the comments here, from the companies and the comments on the proposal page. (Edit: I will be, however, in holidays from thursday until 04.09. - so maybe we should discuss tomorrow in the TF meeting and maybe we can already set a date?)

domna · 2023-08-15T14:04:05Z

Here are some smaller points we need to address in NXmpes:

Add a NXtransformation to NXenergydispersion
.../detector/data as NX_NUMBER and /entry/data as NXdata in compliance with other appdefs

domna · 2023-08-16T06:14:02Z

We want to have a meeting when everyone is back from holidays in mid september. For this there are some points for preparation we then can use in the discussion and which we should assign until then:

Get copy of ISO Standard (ISO 18115-1:2023) (+Ontologies??) and work through it. What could we use from it and how?
Collect and distill comments on fairmat proposal page (Caution: The comments are not on the latest version!! Use the link here). Which points do we need to address from the comments?
Find and talk to someone (MarkusS?) how nomad legally deals with collecting user data. How is it applicable to our case?
Show docstrings from base classes + appdef entries on the appdef page together for the fairmat proposal
Legacy data management: what are current standard procedures?
Try to extract the version tag from the nexus definitions to incorporate it into a file. Should work with setuptools-scm. Automatic definitions versioning for nxs file pynxtools#124 already did something on this but it just takes the version number of pynxtools (because back then nexus_definitions wasn't a python package). In principle one should just need to change the root of the version call to the one of the definitions and it should work

These are not strictly necessary for the discussion but could be very useful (especially the first point) to directly being able to document stuff during/after the meeting

Align how we will document with the nomad documentation and add pages for nexus (e.g. separate or combined docs)
Basic Examples/Tutorial page for NeXus (Important questions to answer: How do appdef names translate into entries in the file, how does the pynxtools template notation work), e.g., I have my experiment here, how do I construct a datafile from it?

Here are some points which are already worked on:

Import nexus data into pyarpes dataframe (Laurenz is already working on it, I will join the effort as soon as I'm back, so this is just collected for completeness)
Transformations examples (mpes liquid + MoTe2 arpes example + XPS example from @lukaspie)

domna · 2023-08-16T10:21:45Z

PR #53 should be merged in here, before we evaluate the base classes!
We should then also decide where we finally put the fields in the base classes during our meeting.

rettigl · 2023-08-16T13:50:35Z

Transformations:
In #51 specifically contributed/nyaml/NXem_parsed, *NXem_base_parsed.yaml and *NXcoordinate_system(_set).yaml I added this idea as you were asking for possible examples, being as explicit as possible I think is best and having a mechanism whereby McStas for (legacy) support and other CS can be treated with the same approach is a useful extension for NeXus.

How does this work together with NXtransformations? I.e. if I want to use this in an NXtransformation chain as a first element to base everything else on, how would I do this?

…mmended

contributed_definitions/NXmpes.nxdl.xml

contributed_definitions/nyaml/NXmpes.yaml

contributed_definitions/NXmpes.nxdl.xml

mkuehbach · 2023-09-19T11:05:28Z

Let's also add an NXpulser which can be used with an instance NXbeam

rettigl

I found a number of things still to fix, see the review comments. It appears also that some base classes (in particular NXsample, NXenvironment, NXcalibration) are not up-to-date with their usage in NXmpes and our discussions.

base_classes/nyaml/NXdata.yaml

contributed_definitions/nyaml/NXactuator.yaml

contributed_definitions/nyaml/NXcalibration.yaml

contributed_definitions/nyaml/NXcollectioncolumn.yaml

contributed_definitions/nyaml/NXdata_mpes.yaml

contributed_definitions/nyaml/NXmpes.yaml

lukaspie · 2024-01-13T13:42:58Z

Can you still add this description?

I don't get what you mean here.

lukaspie · 2024-01-13T14:05:56Z

Regarding that "some base classes (in particular NXsample, NXenvironment, NXcalibration) are not up-to-date with their usage in NXmpes and our discussions", I don't think that every field that is present in NXmpes/NXsample has to be present in the NXsample base class. NXsample contains a generic NXenvironment, so we should be fine there. The same with NXenvironment containing generic NXsensor and NXactuator. IMO, adding e.g bias(NXenvironment) to the NXsample base class just makes it even more cluttered and it should only be specified in the appdef.

The only problem is really if there is an uppercase field in the base class (like NXcalibration/MAPPING(NX_FLOAT) and then in the appdef, we have other NX_FLOAT fields that are not mappings. We can fix that with the suggestions I gave above.

rettigl · 2024-01-13T17:23:22Z

Can you still add this description?

I don't get what you mean here.

In the original discussion about this, you suggested some changes, but they are not implemented yet, it seems.

contributed_definitions/nyaml/NXmpes.yaml

rettigl

Appart from a few comments made directly, this looks good now from my side.

lukaspie · 2024-01-15T11:22:00Z

I just realized that the generic NXactuator was missing from the NXenvironment base class. I had added it initally in #68, but it seems that it got lost during a force-push there. It is now back and the docstrings for both sensor and actuator have been extended in NXenvironment.

lukaspie · 2024-01-15T14:02:34Z

Comments and changes from latest reviews by @mkuehbach and @rettigl are in. There are just two unresolved conversations left:

resolution_errors or resolution_error
another old conversation about center_wavelength in NXsource

Associated PRs have also been merged it, aside from the sub add-defs and #144 (can all be merged later).

domna mentioned this pull request Aug 16, 2023

Mpes readd base classes #53

Merged

sanbrock marked this pull request as draft August 31, 2023 14:04

domna closed this Sep 11, 2023

domna force-pushed the mpes-refactor branch from 38ed45e to 937af09 Compare September 11, 2023 09:49

Sets preparation, temperature+gas_pressure in NXmpes/NXsample to reco…

d45470c

…mmended

domna reopened this Sep 11, 2023

Sets fields to optional

68c9556