NXapm, bugs when parsing processed NeXus files that were generated from CompositionSpace (NFDI-MatWerk) or paraprobe-toolbox (FAIRmat) #36

Open
18 of 22 tasks
mkuehbach opened this issue Feb 6, 2025 · 1 comment · May be fixed by #34
mkuehbach commented Feb 6, 2025

We also wish to use the apm_app to search for processed entries from analysis results obtained with the paraprobe-toolbox and CompositionSpace tools, which FAIRmat and NFDI-MatWerk develop in the joint IUC09. Right now there are several bugs when drag-and-dropping processed NeXus files from these analysis tools. All were tested with the usa_denton_smith example:

NOMAD parsing log bugs IUC09/paraprobe (checked means there is a suggestion for a fix)
CompositionSpace:

  • config group not instantiated, bug in CompositionSpace
  • axis_feature_importance not written out to HDF5, bug in CompositionSpace
  • sequence_index in NXprocess is not mapped to the correct numpy.dtype (?), bug in the pynxtools NeXus parser
  • CompositionSpace: add a specimen section parsed from the voxelization, a feature addition to be implemented in CompositionSpace; add specimen type: simulation, and atom_types as the set of atoms
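
The specimen feature proposed in the last item could look roughly like the sketch below. This is plain Python and not CompositionSpace code: the function name `specimen_from_voxelization`, the dictionary keys, and the input (a list of element symbols seen during voxelization) are hypothetical, the example elements are arbitrary, and writing `atom_types` as a comma-separated string is an assumption about how the specimen section would be serialized.

```python
def specimen_from_voxelization(element_symbols):
    """Build a hypothetical specimen section from the elements seen during voxelization."""
    # Deduplicate and sort so the result does not depend on the order of the ions.
    unique = sorted({s.strip() for s in element_symbols if s.strip()})
    return {
        "type": "simulation",  # specimen type suggested in the issue
        "atom_types": ", ".join(unique),  # assumption: comma-separated element symbols
    }


# Arbitrary example input, not taken from the usa_denton_smith dataset.
specimen = specimen_from_voxelization(["Al", "Mg", "Al", "Si", "Mg"])
print(specimen)
```

Sorting the deduplicated symbols keeps the written value stable between runs, which makes the quantity easier to compare and search.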

paraprobe-transcoder:

  • config, fine, except for the expected situation that chemical_formula is not found
  • results, fine

paraprobe-ranger:

  • config, fine
  • results, fine

paraprobe-selector:

  • config, same bug on dimensionality as CompositionSpace
  • results, fine

paraprobe-surfacer:

  • config, fine
  • results, two times same bug on dimensionality

paraprobe-distancer:

  • config, fine
  • results, fine

paraprobe-tessellator:

  • config, fine
  • results, two times same bug on dimensionality

paraprobe-spatstat:

  • config, fine
  • results, fine

paraprobe-nanochem:

  • config, the group ENTRY.delocalization.decomposition with its children is not in NXapm_paraprobe_nanochem_config and hence not found by NOMAD; for the concept resolve_element, "method" is not a value of this enumeration; fix in paraprobe-nanochem

  • results, does not parse completely, likely because there are too many concepts. We need to prevent all of these from getting exposed to the Elasticsearch stack, since these results files may end up having thousands if not millions of groups! My suggestion: a specific configuration for the NOMAD NeXus parser that controls the parsing coverage and depth, or restricts parsing to specific concepts only, when spotting NeXus files that follow specific appdefs like NXapm_paraprobe_results_nanochem

  • Interconnect NXapm_paraprobe_results with NXapm_paraprobe_config when spotting that a specific NeXus config file was used and that this config file was also uploaded, akin to the sample identifier.

  • NX_UINT enumeration values get resolved properly in the data tab, but for search they are currently mapped onto the incorrect datatype str. With this mapping, however, these quantities are searchable via a terms widget, e.g. using data.ENTRY.tessellation.voronoi_cells.dimensionality__field#pynxtools.nomad.schema.Root#str
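
To make the suggestion about parsing coverage and depth more concrete, here is a minimal sketch of what such a per-appdef configuration could do. It is plain Python and not pynxtools or NOMAD code: `PARSE_LIMITS`, `select_paths`, the keys `max_depth` and `include`, and the contents of the example tree are all hypothetical, and a nested dict stands in for the HDF5 group hierarchy.

```python
from fnmatch import fnmatchcase

# Hypothetical per-appdef limits; neither the keys nor the values come from pynxtools.
PARSE_LIMITS = {
    "NXapm_paraprobe_results_nanochem": {
        "max_depth": 3,  # deepest path level that is still visited
        "include": ["ENTRY/delocalization/*"],  # only these concepts are exposed
    },
}


def select_paths(tree, appdef, limits=PARSE_LIMITS):
    """Return the field paths that would be exposed to search for this appdef."""
    rule = limits.get(appdef)  # None means: no restriction for this appdef
    selected = []

    def walk(node, prefix, depth):
        for name, child in node.items():
            path = f"{prefix}/{name}" if prefix else name
            if rule is not None and depth > rule["max_depth"]:
                continue  # too deep: skip this node and everything below it
            if isinstance(child, dict):
                walk(child, path, depth + 1)  # a group: descend
            elif rule is None or any(fnmatchcase(path, p) for p in rule["include"]):
                selected.append(path)  # a field that passes the concept filter

    walk(tree, "", 1)
    return selected


# Made-up stand-in for a results file with many nested groups.
tree = {
    "ENTRY": {
        "delocalization": {
            "weighting_model": "unity",
            "grid": {"voxel_1": {"value": 0.1}, "voxel_2": {"value": 0.2}},
        },
        "profiling": {"walltime": 1.0},
    }
}

limited = select_paths(tree, "NXapm_paraprobe_results_nanochem")
unlimited = select_paths(tree, "NXapm")
print(limited)
print(len(unlimited))
```

With the limit in place only the shallow field under ENTRY/delocalization is selected, while an appdef without an entry in the table is still parsed in full. Keying the limits by appdef name would leave the behaviour for all other NeXus files unchanged.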

mkuehbach self-assigned this Feb 6, 2025
mkuehbach linked a pull request Feb 7, 2025 that will close this issue