Missing attributes for File and Variableobjects ("h5netcdf" branch) #23

Closed
davidhassell opened this issue Jan 6, 2025 · 10 comments

@davidhassell
Collaborator

Using the h5netcdf branch, it seems that for some files, but not all, the attributes are not being made available. Here's a reproducer with testA.nc.gz (OK) and testB.nc.gz (not OK):

>>> import pyfive
>>> # ------------------------------------------
>>> # First, a file that works (testA.nc)
>>> # ------------------------------------------
>>> p = pyfive.File('testA.nc', 'r')
>>> p.attrs
{'Conventions': b'CF-1.12',
 '_NCProperties': b'version=2,netcdf=4.9.2,hdf5=1.14.3'}
>>> p['q'].attrs
{'_Netcdf4Coordinates': array([0, 2], dtype=int32),
 'project': b'research',
 'standard_name': b'specific_humidity',
 'units': b'1',
 'coordinates': b'time',
 'cell_methods': b'area: mean',
 'DIMENSION_LIST': array([array([<pyfive.core.Reference object at 0x7f311057b7f0>], dtype=object),
        array([<pyfive.core.Reference object at 0x7f3110578700>], dtype=object)],
       dtype=object)}
>>> # ------------------------------------------------------
>>> # Now for a file that doesn't work (testB.nc)
>>> # ------------------------------------------------------
>>> p = pyfive.File('testB.nc', 'r')
>>> p.attrs
{}  # Expected some attributes
>>> p['tas'].attrs
{}  # Expected some attributes

I'm not sure what the salient difference between the two files is: both were created with cfdm.write(f, 'filename'), which uses netCDF4-python under the hood.

I was a bit surprised to see the attributes _NCProperties, _Netcdf4Coordinates and DIMENSION_LIST being presented from testA.nc. These are either special to netCDF or internal to pyfive (?), but in either case they shouldn't be there, I think. h5netcdf and netCDF4-python do not return them.

@bnlawrence
Collaborator

bnlawrence commented Jan 6, 2025

Just to confirm that I can reproduce these errors. Given that the second file has a long list of attributes, this might be related to jjhelmus#41 ... and I only asked a few days ago for an example. The next step will be to see if I can make this happen by creating a file programmatically and testing as we add attributes, up to the number we see here. You don't, by any chance, have a Python file that creates these two files ab initio? (I fear that you created them by reading big files and just writing a little bit of stuff out ...)
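
A minimal sketch of that experiment, assuming h5py is available to write the test files. The helper names (write_with_h5py, count_with_pyfive, first_bad_count) are hypothetical and not part of pyfive, and nothing in this thread establishes that a plain h5py-written file triggers the problem at all; files written through netCDF4-python may be laid out differently.

```python
import os
import tempfile


def write_with_h5py(path, n_attrs):
    """Write an HDF5 file with n_attrs root attributes (assumes h5py is installed)."""
    import h5py  # imported lazily so the search logic below runs without it

    with h5py.File(path, "w") as f:
        for i in range(n_attrs):
            f.attrs[f"attr_{i:03d}"] = f"value {i}"


def count_with_pyfive(path):
    """Return how many root attributes pyfive reports for the file."""
    import pyfive  # imported lazily, as above

    f = pyfive.File(path, "r")
    try:
        return len(f.attrs)
    finally:
        f.close()


def first_bad_count(write_file, count_attrs, max_attrs):
    """Return the smallest n in 1..max_attrs whose n attributes do not
    all read back, or None if every count round-trips."""
    with tempfile.TemporaryDirectory() as tmp:
        for n in range(1, max_attrs + 1):
            path = os.path.join(tmp, f"attrs_{n}.h5")
            write_file(path, n)
            if count_attrs(path) != n:
                return n
    return None
```

Calling first_bad_count(write_with_h5py, count_with_pyfive, 40) would then report the first attribute count at which pyfive and the writer disagree, if there is one. The writer and reader are passed in as arguments so that a netCDF4-python based writer can be swapped in without touching the search loop.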

@bnlawrence
Collaborator

And, I've confirmed that vanilla pyfive has this problem too!

@bnlawrence
Collaborator

(I should say that the three attributes found in file A should be exposed by HDF5, the fact you don't see them with netcdf APIs indicates that these are things that NetCDF doesn't, by default, expose to the user. You can see them in the h5dumps of the files!)

@bnlawrence bnlawrence self-assigned this Jan 6, 2025
@bnlawrence bnlawrence added the bug Something isn't working label Jan 6, 2025
@bnlawrence
Collaborator

@davidhassell Can you please push ahead with the rest of our tests using files which have cut-down attribute lists? I am addressing this on the upstream issue, but it may not be a simple and/or quick fix: it will involve going further into the weeds than I have before ...

@davidhassell
Collaborator Author

Hi Bryan,

testA.nc was made thus:

>>> import cfdm
>>> a = cfdm.example_field(0)
>>> cfdm.write(a, 'testA.nc')

testB.nc was from a CMIP5 file, read into cfdm, subspaced, and written out with cfdm.write(b, 'testB.nc'). I should be able to create it ab initio though ... I'll look into it.

@davidhassell
Collaborator Author

> (I should say that the three attributes found in file A should be exposed by HDF5, the fact you don't see them with netcdf APIs indicates that these are things that NetCDF doesn't, by default, expose to the user. You can see them in the h5dumps of the files!)

OK - I see. However, they are not exposed by netCDF, so they shouldn't appear from h5netcdf, whichever backend it's using.

>>> import h5py
>>> h0 = h5py.File('test_a.nc')
>>> dict(h0.attrs)
{'Conventions': b'CF-1.12',
 '_NCProperties': b'version=2,netcdf=4.9.3-development,hdf5=1.12.2'}
>>> import h5netcdf  # v1.3.0, so using h5py as backend
>>> h1 = h5netcdf.File('testA.nc')
>>> dict(h1.attrs)
{'Conventions': 'CF-1.12'}  # No "_NCProperties" attribute
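
The difference between the two outputs above can be illustrated with a small filter over the raw HDF5 attributes. This is only a sketch of the idea and not h5netcdf's actual implementation: the names NETCDF_INTERNAL_ATTRS and user_attrs are hypothetical, and the set holds just the three attribute names mentioned in this issue, whereas a real netCDF layer may well hide more.

```python
# Attribute names taken from the outputs quoted in this issue; this list
# is an assumption and is probably incomplete.
NETCDF_INTERNAL_ATTRS = frozenset(
    {"_NCProperties", "_Netcdf4Coordinates", "DIMENSION_LIST"}
)


def user_attrs(raw_attrs):
    """Return a new dict without netCDF-internal entries.

    bytes values are decoded to str, mirroring the b'CF-1.12' versus
    'CF-1.12' difference seen above; other values pass through unchanged.
    """
    visible = {}
    for name, value in raw_attrs.items():
        if name in NETCDF_INTERNAL_ATTRS:
            continue  # bookkeeping written by the netCDF library
        if isinstance(value, bytes):
            value = value.decode("utf-8")
        visible[name] = value
    return visible
```

Applied to dict(h0.attrs) from the h5py session, such a filter would drop _NCProperties and leave only Conventions.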

@bnlawrence
Collaborator

Sure: they're not exposed by h5netcdf (whatever the backend), and they are exposed by h5py and pyfive, which is as it should be. Is that not the case? If my h5netcdf branch exposes these attributes when h5netcdf is backed by pyfive, but not when it is backed by h5py, then that is a bug for me to fix in h5netcdf. But you've not shown an example of that?

@davidhassell
Collaborator Author

Hi Bryan - I'll check the "h5netcdf driven by pyfive" case ...

@davidhassell
Collaborator Author

Doh! You're quite right Bryan. h5netcdf driven by pyfive does the right thing in terms of special attributes. I have been confused in my mind between testing pyfive and h5netcdf :(

So the issue is only that sometimes pyfive doesn't return any attributes when there are some in the file.

Sorry for the confusion.
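
For the remaining problem, one way to spot an affected file is to compare the attribute names pyfive returns against those h5py returns for the same object. A rough sketch, assuming both packages are installed; missing_names and compare_readers are hypothetical helpers, not part of either library.

```python
def missing_names(reference, candidate):
    """Sorted attribute names present in `reference` but absent from `candidate`."""
    return sorted(set(reference) - set(candidate))


def compare_readers(path, variable=None):
    """List attributes that h5py sees but pyfive does not.

    Checks the root group by default, or one variable when `variable`
    is given (for example 'tas' in testB.nc).
    """
    import h5py    # imported lazily so missing_names works without them
    import pyfive

    with h5py.File(path, "r") as ref:
        ref_obj = ref if variable is None else ref[variable]
        expected = dict(ref_obj.attrs)

    cand = pyfive.File(path, "r")
    try:
        cand_obj = cand if variable is None else cand[variable]
        found = dict(cand_obj.attrs)
    finally:
        cand.close()

    return missing_names(expected, found)
```

An empty list from compare_readers('testB.nc') and compare_readers('testB.nc', 'tas') would indicate that the two readers agree on which attributes exist; it says nothing about whether the values match.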

bnlawrence pushed a commit that referenced this issue Jan 7, 2025
…ng it now in case my laptop dies and I can't repeat this :-)
@bnlawrence
Collaborator

All working fine now, with final clean up in 1e2c424

@davidhassell davidhassell changed the title Missing attribute for File and Variableobjects ("h5netcdf" branch) Missing attributes for File and Variableobjects ("h5netcdf" branch) Jan 8, 2025