What is a "field"? #406
-
The conventions use the term “field” about 15 times, not counting “bit field” in section 3.5 and “field construct” in Appendix I, but it is not a defined term in section 1.3. So what is a field? From context, it is very close to a “data variable” (which also lacks a definition in section 1.3), but from a data science perspective they are not the same. As a first attempt at relating the two terms more formally: a data variable is a CF construct to capture the data that embodies the field, including its ancillary data such as axes and attributes. In other words, the field is the physical phenomenon whose quantities are captured in a data variable’s data and whose properties are captured in its axes and attributes. As things are now, I am concerned about the conflation of the two terms and their use. The opening paragraph from section 7.3:
The first line contains both terms “field” and “variable” without any obvious connection between the two (for lack of a proper definition of both, also noting that “variable” without any specification is ambiguous). Further on in the paragraph there are terms “data values”, “time means” and “time dimension variable”. We all know what it means when we read it, but it is not very precise or unambiguous. I’d rewrite this paragraph as follows (assuming that both “field” and “data variable” have been defined along the lines of the above; changes in bold):
There are 14 (or so) other references to "field" that might need similar sharpening. Is that something that we should take up?
-
Dear Patrick, Thanks for these thoughts. I agree that some consistency in this area would be a good thing. When the data model was created, it was thought that some of its language could be used in the main conventions document, but that was a task that was never pursued, probably because it is tricky! One of the reasons for this was that we have to be very careful not to change the meaning with any new phrasing. In your example on cell methods, for instance, "Each “name: method” pair indicates that for an axis of the data variable identified by name" is not quite right, as the name does not have to be a dimension of the data variable; the new term "computed cell values" is itself ambiguous (at least to me: is a single thermometer measurement "computed"?); the new text removes the explicit statement that cell_methods tells us what the characteristic of the given values is (although it is implied a couple of sentences on); and so on. I'm not suggesting that the original text is perfect (for this example or elsewhere), just that if we want to change it we would have to be very careful not to subvert it! Cheers,
-
Dear David, I hope I am not coming across as subversive... It is certainly not my intent! Perhaps we could start by defining "data variable" formally, as well as "field" if that term is to remain in the document? I personally like the term "field" as it is also used for physical quantities in meteorology and climatology, as well as other sciences of interest to this community. Following that definition we can then go through the conventions and identify where the language may be amended to be clearer and less ambiguous. One thing that keeps tripping me up is the use of the term "variable" without further qualification. Perhaps unfortunately, netCDF uses the term for one of its data structures, and in the conventions there are now quite a few variants, so being explicit about the variant would be useful to the reader who is not as well versed in the conventions text as you are (or even me). Best,
-
It's also a term used in database lingo -- a "field" is a single value as part of a "record". In that sense, a DB field would be analogous to a netCDF variable, actually, though the data models really don't match well. I don't think that's how we're using it in CF, though.
Indeed -- and given CF's roots in netCDF, ideally we'd use the word "variable" in the same way as netCDF: that is, an array of data with associated attributes. In the text, "To describe the characteristic of a field that is represented by cell values" implies to me that "field" has the geophysical meaning -- some property that spans an area -- wow, that's hard to write a definition for! In short: yes, it's a great idea to define these terms (and to clarify what they don't mean, e.g. in CF, a "field" is not a field in the database sense), and to keep the definitions consistent throughout the document. The terminology section seems a good place for this: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.12/cf-conventions.html#terminology I note that in that section, "variable" seems to refer specifically to a netCDF variable. @davidhassell: oddly, I can't seem to find the CF data model written out anywhere -- where can that be found? Maybe these definitions should be there.
-
Thanks! Not sure how I didn't find that myself. I note that in Table I.1 ("The elements of the CF-netCDF conventions") the word "variable" is used a LOT -- but not defined. Maybe it should be? "field" is: Field: Scientific data discretised within a domain
-
By "field" I think we mean "field construct" in the sense of the CF data model, which says "A field construct corresponds to a CF-netCDF data variable with all of its metadata." By "variable" we mean netCDF variable (since this is a netCDF convention). The first sentence which Patrick quotes means "data variable" where it says "variable" i.e. a netCDF variable with a particular role. Out of context, "variable" is unclear, but I suppose the sentence was written with "variable" because the meaning "data variable" seemed obvious from the context. Certainly we should clarify "variable" anywhere it's not currently clear what sort of variable we mean in the context.
I agree that it would be helpful to add definitions of field, variable and data variable to the Terminology, and a few other netCDF and CF data model terms. Possibly, as David mentions, there are some other CF data model terms we could use in the convention to improve clarity. For instance we could write "dimension coordinate variable" instead of plain "coordinate variable" when we mean "coordinate variable" in the NUG sense (a 1D variable with the same name as its dimension), and "generic coordinate variable" when we mean dimension coordinate variable, auxiliary coordinate variable or scalar coordinate variable.
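To make the distinction between these kinds of "variable" concrete, here is a minimal Python sketch that assigns a role to each netCDF variable in a toy dataset. It is an illustration only, not text from the conventions: the dictionary layout, the variable names and the helper `classify` are all hypothetical, and the rules are deliberately simplified (boundary, grid mapping, ancillary and cell-measure variables are ignored, so anything that is not a coordinate of some kind is treated as a data variable).

```python
# Illustrative sketch only: a simplified classification of netCDF variables
# by role. The dataset layout and the helper name `classify` are hypothetical.

# Toy dataset: variable name -> (dimension names, attributes)
variables = {
    "time": (("time",), {"units": "days since 2000-01-01"}),
    "lat": (("lat",), {"units": "degrees_north"}),
    "lon": (("lon",), {"units": "degrees_east"}),
    "height": ((), {"units": "m"}),
    "tas": (
        ("time", "lat", "lon"),
        {"standard_name": "air_temperature", "coordinates": "height"},
    ),
}


def classify(name, variables):
    """Return a simplified role for the variable called `name`."""
    dims, _ = variables[name]
    # NUG sense: a 1D variable with the same name as its dimension.
    if dims == (name,):
        return "coordinate variable"
    # Named in another variable's `coordinates` attribute.
    for other, (_, attrs) in variables.items():
        if other != name and name in attrs.get("coordinates", "").split():
            if dims == ():
                return "scalar coordinate variable"
            return "auxiliary coordinate variable"
    # Simplifying assumption: everything else is a data variable.
    return "data variable"


roles = {name: classify(name, variables) for name in variables}
for name, role in roles.items():
    print(f"{name}: {role}")
```

In this sketch only `tas` ends up as a data variable; in the data model's terms, the field construct would correspond to `tas` together with the coordinate variables and attributes that describe it.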