CMIP7 requirements: "branded variable" and new mip_table specification #762
Comments
Is this what the mip-cmor-tables will look like? Would the removal of "frequency" reduce the number of tables since they are currently grouped by modeling realm and frequency? Are users supposed to select which "branded variables" from a table they are going to use instead of "variable_id"? I assume "region" is going to be like "realm" in global attributes where its valid entries will be found in the CV, correct? Will the "approx_interval" come from the CV or some other table? CMOR currently uses this value for a test.
The tables will be structured the same as the old tables, with the changes I enumerated above. But we can group variables into tables any way we like (even placing them all into a single table), and instead of having a total of 2062 table entries (across tables), we’ll have about 1600 (because the same variable sampled at multiple frequencies will be found in only one table).
As I understand it, “variable_id” records the “out_name” found in the table, which is also the actual name of the variable array written to the netCDF file. That won’t change. As I noted, the out_name in the new tables will be the root name (i.e., prefix) of the branded variable name (e.g., “tas”, which is the prefix appearing in “tas_tavg-z0-hxy-x”).
As for realm, experiment_id, institute_id, etc., the valid regions will be found in a CV (and for CMIP7, there may only be a few options: “global”, “Antarctica”, “Greenland”, and perhaps a couple more).
We might decide to turn off the frequency check in CMIP7, which, as you say, is based on "approx_interval". Or we could provide a CV with "frequency" as the key and the approximate interval as the value. The user would specify the "frequency" in the input table (as described above), and then CMOR would go to the frequency CV and extract the approx_interval so it could perform its check. The frequency CV might look like:
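To illustrate the frequency-keyed CV idea, here is a minimal Python sketch. It is not CMOR code and not the actual proposed CV: the dictionary layout, the function names `approx_interval_days` and `interval_is_plausible`, and the 20% tolerance are all assumptions made for illustration. The interval values (in days) follow the style of the "approx_interval" header entries in the CMIP6 tables.

```python
# Hypothetical frequency CV: the key is the frequency label, the value is
# the approximate sampling interval in days (illustrative entries only).
FREQUENCY_CV = {
    "frequency": {
        "3hr": 0.125,
        "6hr": 0.25,
        "day": 1.0,
        "mon": 30.0,
        "yr": 365.0,
    }
}


def approx_interval_days(frequency, cv=FREQUENCY_CV):
    """Return the approximate interval (days) for a frequency label."""
    try:
        return cv["frequency"][frequency]
    except KeyError:
        raise ValueError(f"frequency {frequency!r} not found in the CV") from None


def interval_is_plausible(time_values_days, frequency, tolerance=0.2):
    """Check that consecutive time steps are close to the expected interval.

    `tolerance` is a fractional deviation; 0.2 is an assumed value, not the
    threshold CMOR actually applies.
    """
    expected = approx_interval_days(frequency)
    steps = [b - a for a, b in zip(time_values_days, time_values_days[1:])]
    return all(abs(step - expected) <= tolerance * expected for step in steps)
```

With a lookup like this, the user would only declare "frequency" in the input file, and the interval check would no longer depend on which table a variable lives in.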
Please provide feedback and questions on the following. I've prepared an update of my earlier enumeration of possible changes to CMOR. A nicely formatted version can be found at https://docs.google.com/document/d/1Hyv87wh0BS9dI0hSOydYubrsdpMe23qw3kCj1kLVuSo/edit?tab=t.0 , but I'll copy and paste it here:
CMOR changes that are needed to handle “branded variables”:
Changes needed in user_input file:
Changes needed in CMOR table:
Changes needed in the CMOR code:
Store each of the above as global attributes. Sample new CMOR (or MIP) table:
Note: all variable entries will be similar, but there may be one or two cases where the attributes “flag_values” and “flag_meanings” are defined in addition to the above.
Implications for data request: If the branded variable names and the new MIP table names are not provided by the data request, then whatever variable labels are provided (e.g., root name and CMIP6 table name) will need to be translated into branded variable names and new MIP table names. This could presumably be done with a look-up table.
I just noticed that the table entries that had been shown as deleted in the original Google doc lost their "strike through" marks when I copied them into this issue. I've now edited the sample CMOR table segment above to indicate which entries in the current CMIP table should be deleted.
I've reviewed #762 (comment) and found it needs to be tweaked. Again, a nicely-formatted version can be found at https://docs.google.com/document/d/1Hyv87wh0BS9dI0hSOydYubrsdpMe23qw3kCj1kLVuSo/edit?tab=t.0 . For CMIP7 we expect to define 8 MIP tables, one for each realm. Here is a sample header and single entry from the "atmos" table.
QUESTIONS ABOUT CMOR (I've asked "yes" or "no" questions, but the real question is "how difficult would it be to make the suggested changes?"):
In terms of priority, items 3, 5, 6, and 10 above are essential for CMIP7.
I thought of another approach for addressing items 5 and 6 that would not involve modifying existing cmor functions. For item 5, we could require the data provider (user) to call a new cmor function, which we could name "cmor_treat_brand", right after the call to "cmor_variable". The only argument of the function would be:
var_id = integer returned by cmor_variable identifying the variable of interest
The function would
For item 6, after a call to "cmor_variable", we would require the user to call cmor function "cmor_set_variable_attribute" twice:
This is really no different from doing these things inside "cmor_variable", as I suggested in the earlier comment, but it would not modify any of the existing cmor functions.
@taylor13 Answering your questions from #762 (comment)
When defining the
Yes.
Yes. We can follow a similar approach that CMOR takes with
How about we just have the attribute
We can do what you suggested in #762 (comment) and use
We can make CMOR add the attribute if we want. CMOR will ignore the attribute in the variable's table entry if it is not programmed to find it.
Yes, purely numeric values (i.e. numbers without units) for attributes are stored in netCDF files as floats or integers.
The "basin" variable in the CMIP6_Ofx.json table is the only variable that I know that has the
Yes, we can modify the filename and directory templates to use the
CMOR currently creates an MD5 checksum of the variable table used to write a netCDF file. This checksum is stored in the attribute
Thanks for clarifying everything. A few follow-up questions/remarks:
CMOR MIP tables are no longer relied on for identification of datasets or files, so their contents can be modified under the same "data specifications", and I don't think it is important that we define a version of the tables. What is essential is to record the entire set of data specifications that govern the metadata in the netCDF files, the templates for constructing paths and filenames, and the CVs relied on by those using CMOR. I think "data_specs_version" is an appropriate name for the overarching data specifications, so I thought we could just repurpose it. I know that in the past data_specs_version would change if the contents of the tables changed, but is there any reason for that now? Maybe others have an opinion about this.
In my example, I use the full branded variable name (root name + branding suffix) as the table entry: "hfss_tavg-u-hxy-u", where "hfss" is the root name (variable_id) and "tavg-u-hxy-u" is the branding suffix. As a service to downstream users of the data, I wanted to store these two elements separately as global attributes, and then parse the suffix and also separately store temporal_label, vertical_label, horizontal_label, and area_label. As for backward compatibility, if you wanted to handle the old CMIP6 tables with this version of the code, you would need to check whether an underscore is found in the variable entry. If not, you would skip any parsing or storing of the elements. I've probably misunderstood something, so I will be interested in whether this seems like a good course or not.
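The parsing and backward-compatibility check described above can be sketched in a few lines of Python. This is only an illustration, not CMOR code: the `Brand` tuple and the functions `parse_table_entry` and `global_attributes` are hypothetical names, and the sketch assumes the branding suffix always has exactly four hyphen-separated labels, as in the example "hfss_tavg-u-hxy-u".

```python
from typing import NamedTuple, Optional


class Brand(NamedTuple):
    """Elements of a table entry; label fields are None for CMIP6-style entries."""
    variable_id: str
    branding_suffix: Optional[str]
    temporal_label: Optional[str]
    vertical_label: Optional[str]
    horizontal_label: Optional[str]
    area_label: Optional[str]


def parse_table_entry(entry: str) -> Brand:
    """Split a table entry into root name, branding suffix, and four labels."""
    if "_" not in entry:
        # Old CMIP6-style entry (e.g. "tas"): nothing to parse or store.
        return Brand(entry, None, None, None, None, None)
    root, suffix = entry.split("_", 1)
    labels = suffix.split("-")
    if len(labels) != 4:
        raise ValueError(f"expected 4 labels in branding suffix, got {suffix!r}")
    return Brand(root, suffix, *labels)


def global_attributes(entry: str) -> dict:
    """Return only the attributes that would be written for this entry."""
    brand = parse_table_entry(entry)
    return {name: value for name, value in brand._asdict().items() if value is not None}
```

For "hfss_tavg-u-hxy-u" this yields variable_id "hfss", branding_suffix "tavg-u-hxy-u", and the labels "tavg", "u", "hxy", "u"; for a CMIP6 entry such as "tas" only variable_id is returned, so the old tables would pass through untouched.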
Yes, that is clear. Does the input file determine which attributes CMOR looks for, or is that hardwired inside the code?
As long as we don't need to worry about variable names containing From the Naming Conventions section of CF-Conventions:
Are there any cases of variable names with underscores?
No, for CMIP the only characters allowed in variable names are alphanumeric characters; no punctuation, underscores, or hyphens.
Chris, from the above, it appears that the changes we're contemplating would not render CMOR unable to process CMIP6 and CMIP6Plus data. That would be great, but is it true?
(FYI @sashakames, @durack1, @matthew-mizielinski, @wolfiex even though this is primarily for Chris)
It looks likely that some changes to the output requirements for CMIP7 will be agreed shortly and that "branded variables" will be relied on in identifying variables in the CMOR output files. It would be good to consider now how this might impact CMOR, so I'll raise this issue:
How difficult would it be to implement the following?
To implement the above, new CMOR variable tables will need to be generated with the following changes (which could be implemented by someone other than Chris):
I should think most of the above changes to the variable tables should have little impact on the CMOR code itself.
A new CMOR7 table variable entry would include 5 new attributes (the first 5 lines below), and the "frequency" would be removed from the table (in CMIP6 it appeared just before the "long_name" attribute), resulting in the following:
Note that the table_entry has been changed from "tas" to the branded variable name "tas_tavg-z0-hxy-x". Also note that the "out_name" will now without exception be just the root name (in this case "tas") appearing before the underscore in the branded variable name. In CMIP6, sometimes the out_name differed from the table_entry.
We could elect to have CMOR generate "temporal_type", "vertical_type", "horizontal_type", and "area_type" by parsing the elements comprising the branding_suffix and then looking up in CVs the associated short text descriptions. That would mean these 4 global attributes would not have to be added to the existing tables.
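As a rough illustration of the CV look-up option, the Python sketch below derives "out_name" and the four "*_type" attributes from a branded variable name. It is a sketch under stated assumptions, not CMOR behavior: the `LABEL_CVS` dictionary, its layout, and the function `describe_brand` are hypothetical, and the description strings are placeholders rather than real CV text.

```python
# Hypothetical CVs: one dictionary per attribute, mapping a label from the
# branding suffix to a short description. The texts are placeholders only.
LABEL_CVS = {
    "temporal_type": {"tavg": "placeholder description for tavg"},
    "vertical_type": {"z0": "placeholder description for z0"},
    "horizontal_type": {"hxy": "placeholder description for hxy"},
    "area_type": {"x": "placeholder description for x"},
}

# Order of the labels within the branding suffix.
TYPE_KEYS = ("temporal_type", "vertical_type", "horizontal_type", "area_type")


def describe_brand(branded_name, cvs=LABEL_CVS):
    """Derive out_name and the four *_type attributes from a branded name."""
    root, sep, suffix = branded_name.partition("_")
    if not sep:
        raise ValueError(f"{branded_name!r} has no branding suffix")
    labels = suffix.split("-")
    if len(labels) != len(TYPE_KEYS):
        raise ValueError(f"expected {len(TYPE_KEYS)} labels, got {suffix!r}")
    attrs = {"out_name": root}  # root name before the underscore
    for key, label in zip(TYPE_KEYS, labels):
        if label not in cvs[key]:
            raise ValueError(f"label {label!r} not found in the {key} CV")
        attrs[key] = cvs[key][label]
    return attrs
```

Calling `describe_brand("tas_tavg-z0-hxy-x")` returns "tas" as the out_name plus one description per label. Keeping the descriptions in CVs means the table entries themselves would not need to carry these four attributes.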