unable to read attributes when the size of attribute lists is relatively big #41

Open
jduan8109 opened this issue Feb 27, 2018 · 10 comments · May be fixed by #68

Comments

@jduan8109

No description provided.

@jjhelmus
Owner

@jduan8109 Can you provide additional details or an example of this behavior? Either would make debugging the issue much easier.

@jduan8109
Author

@jjhelmus, thanks. I am trying to load a netCDF4 file with your code. When I have a short list of attributes in the data like this:
//group attributes
:parent_grid_name = "GERMANY" ;
:number_of_subgrids = "0" ;
:minimum_latitude = "50.85" ;
:maximum_latitude = "51.7" ;
:left_longitude = "12.08333333333333" ;
:right_longitude = "13.5" ;
I have no problem getting all the attributes by calling "xxx.attrs". However, if there is a long list of attributes in the file, like the one below, "xxx.attrs" returns an empty dict instead.

//group attributes
:parent_grid_name = "GERMANY" ;
:number_of_subgrids = "0" ;
:minimum_latitude = "50.85" ;
:maximum_latitude = "51.7" ;
:left_longitude = "12.08333333333333" ;
:right_longitude = "13.5" ;
:north_south_node_interval = "0.001666666666666667" ;
:east_west_node_interval = "0.08333333333333333" ;
:north_south_node_sequence = "south_to_north" ;
:east_west_node_sequence = "west_to_east" ;
:array_node_order_arrangement_method = "row_major" ;
:number_of_rows = "766" ;
:number_of_columns = "851" ;
:recommended_interpolation_method = "bilinear" ;
:dataset_date_time = "13-01-03" ;
:dataset_begin_date = "13-01-03" ;
:dataset_end_date = "NONE" ;
:dataset_epoch = "2013.163" ;
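For reference, a minimal script along these lines should reproduce the behaviour described above. It is only a sketch: the file names and attribute values are made up, it assumes h5py with libver='latest' for writing (which, I believe, makes HDF5 switch to dense attribute storage once the attribute count grows past the compact-storage limit), and it assumes pyfive's high-level File/attrs interface for reading.

    import h5py
    import pyfive

    # Write two files: one with a handful of root-group attributes and one
    # with many.  libver='latest' selects the newer object-header format,
    # which (I believe) switches to dense attribute storage once the number
    # of attributes exceeds the compact-storage limit.
    for fname, n_attrs in [('few_attrs.h5', 6), ('many_attrs.h5', 18)]:
        with h5py.File(fname, 'w', libver='latest') as hf:
            for i in range(n_attrs):
                hf.attrs['attribute_%02d' % i] = 'value %d' % i

    # Read both back with pyfive; the file with the long attribute list is
    # the one expected to come back with an empty attrs dict.
    for fname in ('few_attrs.h5', 'many_attrs.h5'):
        pf = pyfive.File(fname)
        print(fname, dict(pf.attrs))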

@bnlawrence
Collaborator

@jduan8109 I don't think I've seen this, and I'm in the process of providing some improvements to pyfive. If you're still interested, and it's still a problem, could I trouble you to create a little netcdf file with just the attributes which cause the problem, and upload it here?

@bnlawrence
Collaborator

@jduan8109 we don't need more details! One of my colleagues has just exposed what I now believe to be the same issue (NCAS-CMS#23). While he found it using our fork, I've tested it with the current master from here, and his "B file" also returns an empty attribute set ...

His original file is here, the h5dump of that file is here: h5B.txt

@bnlawrence
Collaborator

bnlawrence commented Jan 6, 2025

Ok, I think I can see where the problem comes from:

    def get_attributes(self):
        """ Return a dictionary of all attributes. """
        attrs = {}
        attr_msgs = self.find_msg_type(ATTRIBUTE_MSG_TYPE)
        for msg in attr_msgs:
            offset = msg['offset_to_message']
            name, value = self.unpack_attribute(offset)
            attrs[name] = value
        # TODO attributes may also be stored in objects reference in the
        # Attribute Info Message (0x0015, 21).
        return attrs

and there is definitely a message of type 21 (0x0015) in the offending file, whereas ATTRIBUTE_MSG_TYPE only matches type 0x000C, aka 12.
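A quick way to confirm this from inside pyfive (a sketch only, reusing the existing find_msg_type helper from the excerpt above, and meant to be run in the same DataObjects context as get_attributes):

    # Sketch: check whether the object header of the offending file carries
    # an Attribute Info message (type 0x0015, i.e. 21), which the current
    # get_attributes() never looks at.
    ATTRIBUTE_INFO_MSG_TYPE = 0x0015
    info_msgs = self.find_msg_type(ATTRIBUTE_INFO_MSG_TYPE)
    print(len(info_msgs))  # non-zero for the "B file", zero for files that work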

@bnlawrence
Collaborator

bnlawrence commented Jan 6, 2025

From the docs:

Name: Attribute Info Message
Header Message Type: 0x0015
Length: varies
Status: Optional, may not be repeated.

Description: This message stores information about the attributes on an object, such as the maximum creation index for the attributes created and the location of the attribute storage when the attributes are stored "densely".

Format of Data:

[Screenshot: the Attribute Info Message "Format of Data" table from the HDF5 file format specification]
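Since the screenshot doesn't copy across well, here is a rough sketch of unpacking that message according to the layout in the spec. It assumes 8-byte file offsets; a real implementation would take the offset size from the superblock.

    import struct

    def unpack_attribute_info(buf, offset):
        """ Sketch: decode an Attribute Info message (type 0x0015). """
        version, flags = struct.unpack_from('<BB', buf, offset)
        offset += 2
        info = {'version': version, 'flags': flags}
        if flags & 0x01:
            # bit 0: creation order is tracked, so the 2-byte
            # "Maximum Creation Index" field is present
            info['max_creation_index'], = struct.unpack_from('<H', buf, offset)
            offset += 2
        # address of the fractal heap holding the attribute messages
        info['fractal_heap_address'], = struct.unpack_from('<Q', buf, offset)
        offset += 8
        # address of the version 2 b-tree indexing attributes by name
        info['name_btree_address'], = struct.unpack_from('<Q', buf, offset)
        offset += 8
        if flags & 0x02:
            # bit 1: creation order is indexed, so a second v2 b-tree
            # address (the creation-order index) is present
            info['order_btree_address'], = struct.unpack_from('<Q', buf, offset)
        return info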

@bnlawrence
Collaborator

@jjhelmus I realise you're not wanting to do anything here, but I wondered if you could confirm my supposition of what is going on (I've not needed to get down to this level before):

It looks to me like when there are a lot of attributes, each attribute is stored in "the fractal heap" (so I need to find out where that is, and how things are stored there), and the address of each object in that heap will be stored using a regular b-tree. I am thinking I should be able to find the size and location of each object from the b-tree (which will hopefully be accessible using your existing b-tree code), and that somewhere there will be a table that allows me to work out how to decode the objects stored via that b-tree (or the objects will just be regular attribute messages).
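If that supposition is right, get_attributes() would need to grow something along these lines. This is purely a sketch: FractalHeap, BTreeV2AttributeNames, iter_records, get_data, unpack_attribute_from_bytes and unpack_attribute_info are made-up names rather than pyfive's actual classes or methods, and self.msg_data / self.fh are assumed to be the raw header bytes and the open file handle.

    ATTRIBUTE_MSG_TYPE = 0x000C
    ATTRIBUTE_INFO_MSG_TYPE = 0x0015

    def get_attributes(self):
        """ Return a dictionary of all attributes, compact or dense. """
        attrs = {}
        # Compact storage: attribute messages sit directly in the object header.
        for msg in self.find_msg_type(ATTRIBUTE_MSG_TYPE):
            name, value = self.unpack_attribute(msg['offset_to_message'])
            attrs[name] = value
        # Dense storage: the Attribute Info message points at a fractal heap
        # (holding the attribute messages themselves) and a v2 b-tree
        # (an index of heap IDs keyed by attribute name).
        for msg in self.find_msg_type(ATTRIBUTE_INFO_MSG_TYPE):
            info = unpack_attribute_info(self.msg_data, msg['offset_to_message'])
            heap = FractalHeap(self.fh, info['fractal_heap_address'])           # hypothetical
            btree = BTreeV2AttributeNames(self.fh, info['name_btree_address'])  # hypothetical
            for record in btree.iter_records():                                 # hypothetical
                raw = heap.get_data(record['heap_id'])                          # hypothetical
                name, value = unpack_attribute_from_bytes(raw)                  # hypothetical
                attrs[name] = value
        return attrs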

@bnlawrence
Collaborator

I should have googled this before. I see that @bmaranville and @woutdenolf were looking at this bit of the spec too. If either of you can help me understand what to do next, I'd be grateful!

@bnlawrence
Collaborator

(Never mind, I've found my way into it; I'm currently in the fractal heap code trying to debug why it's not working, and I've already got a "new" BTree subclass for these things ...)

@bnlawrence
Collaborator

I have a test and fixes that now pass in my h5netcdf branch, which should be incoming (the critical path to delivering you a pull request is NCAS-CMS#21).
