
Update the Making_Maps_with_Cartopy tutorial #150

Closed
navidcy opened this issue Oct 8, 2021 · 25 comments
Labels
🧹 cleanup · 💻 hackathon 2.0 (like the 1.0 but better) · 🛸 updating (an existing notebook needs to be updated)

@navidcy
Collaborator

navidcy commented Oct 8, 2021

After the long thread that started with @hakaseh's question about plotting (see ARCCSS @ https://arccss.slack.com/archives/C6PP0GU9Y/p1632954943031000), I think we should update the Making Maps tutorial so that:

  • It demonstrates xarray functionality (e.g., dataarray.plot() instead of pyplot.pcolormesh(longitude, latitude, dataarray)).
  • It avoids loading custom-made grids, e.g.,
    geolon_t = xr.open_dataset('/g/data/ik11/grids/ocean_grid_025.nc').geolon_t
    geolat_t = xr.open_dataset('/g/data/ik11/grids/ocean_grid_025.nc').geolat_t
    which rely on somebody actually putting them there. Instead, plots could/should be made using the field's own coordinate information (see the sketch below).
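
For instance, a minimal sketch of the intended style (it assumes the 025deg_jra55v13_iaf_gmredi6 experiment and surface_temp variable used later in this thread, and relies on the nominal xt_ocean/yt_ocean coordinates, which are only approximate north of ~65°N on the tripolar grid):

import cosima_cookbook as cc

session = cc.database.create_session()

# last monthly mean of SST from the 0.25° experiment
darray = cc.querying.getvar('025deg_jra55v13_iaf_gmredi6', 'surface_temp',
                            session, frequency='1 monthly', n=-1)
sst = darray.mean('time') - 273.15  # convert from K to °C

# xarray picks up coordinates, labels and the colorbar automatically,
# instead of an explicit pyplot.pcolormesh(longitude, latitude, dataarray) call
sst.plot(vmin=0, vmax=35)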

What do you all think? @AndyHoggANU, @aidanheerdegen, @aekiss
I might give it a go.

P.S.: As a side note, I admit that the grids in /g/data/ik11/grids/ do not contain the same information for the three different resolutions. I've been raising this from time to time, but it seems very hard to homogenize. I'm hoping that eventually we can drop any dependence on these grids being there whatsoever.

@navidcy navidcy added 🧹 cleanup 🛸 updating An existing notebook needs to be updated labels Oct 8, 2021
@AndyHoggANU
Contributor

I would encourage this. Would also like to do it myself, but I suspect that is impractical. I will promise to review it when it is done.
Also, agree we need a better grids solution. @aidanheerdegen - any ideas here?

@navidcy
Collaborator Author

navidcy commented Oct 10, 2021

OK, I was actually wrong, or not thinking straight about this.

We can definitely use xarray's plotting functionality. See, e.g., this:

import xarray as xr
import cosima_cookbook as cc
import cartopy.crs as ccrs
import matplotlib.pyplot as plt

geolon_t = xr.open_dataset('/g/data/ik11/grids/ocean_grid_025.nc').geolon_t
geolat_t = xr.open_dataset('/g/data/ik11/grids/ocean_grid_025.nc').geolat_t

session = cc.database.create_session()

darray = cc.querying.getvar('025deg_jra55v13_iaf_gmredi6', 'surface_temp', session, frequency='1 monthly', n=-1)
SST = darray.mean('time') - 273.15  # convert from K to °C
SST = SST.assign_coords({'geolat_t': geolat_t, 'geolon_t': geolon_t}).rename({'geolat_t': 'latitude', 'geolon_t': 'longitude'})
SST

[Screenshot: the SST DataArray with 2D latitude/longitude coordinates attached]

SST.plot(x='longitude', y='latitude', vmin=0, vmax=35);

[Screenshot: SST.plot() on longitude/latitude]

and

plt.figure(figsize=(12, 5))

ax = plt.axes(projection=ccrs.Orthographic(central_latitude=80, central_longitude=50))

SST.plot(x='longitude', y='latitude', transform=ccrs.PlateCarree(), vmin=0, vmax=35);

[Screenshot: SST plotted on an Orthographic projection]

But we still can't avoid loading the unmasked grids saved at /g/data/ik11/grids/.... The OMIP2 dataset that @hakaseh was using includes the unmasked 2D arrays of longitude/latitude in each variable, and that's why one could plot it properly without needing to load unmasked grids that somebody made and saved somewhere...

Why don't we include the unmasked 2D arrays of longitude/latitude in all output fields? That would be a remedy, wouldn't it? @AndyHoggANU, @aidanheerdegen?
At the moment we include xt_ocean and yt_ocean, but these alone are not enough to plot the field!

@navidcy
Collaborator Author

navidcy commented Oct 10, 2021

Note that if you just do .assign_coords({'latitude': geolat_t, 'longitude': geolon_t}) as the tutorial currently suggests, then you end up with the SST data array having all four coords: geolon_t, longitude, geolat_t, latitude...

darray = cc.querying.getvar('025deg_jra55v13_iaf_gmredi6', 'surface_temp', session, frequency='1 monthly', n=-1)
SST = darray.mean('time') - 273.15  # convert from K to °C
SST = SST.assign_coords({'latitude': geolat_t, 'longitude': geolon_t})
SST

[Screenshot: the SST DataArray carrying all four coordinates (geolon_t, longitude, geolat_t, latitude)]

The (imminent) PR should remedy this as well. :)
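
For reference, the workaround from the previous comment (assign the 2D grid arrays under their native names first and then rename them) avoids the duplicated coordinates; a minimal sketch using the same darray, geolon_t, and geolat_t as above:

SST = darray.mean('time') - 273.15  # convert from K to °C

# assign geolat_t/geolon_t under their own names, then rename, so only
# 'latitude' and 'longitude' end up attached as the 2D coordinates
SST = (SST.assign_coords({'geolat_t': geolat_t, 'geolon_t': geolon_t})
          .rename({'geolat_t': 'latitude', 'geolon_t': 'longitude'}))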

@aidanheerdegen
Contributor

aidanheerdegen commented Oct 10, 2021

Also, agree we need a better grids solution. @aidanheerdegen - any ideas here?

Lots. Issue COSIMA/cosima-cookbook#191 proposes some mechanisms to identify and define grids in the CC database. An obvious use case for this would be named grids, or auto-loading grid information in getvar.

An interim option would be to somehow index the grid data so that it is accessible in the Cookbook database under a suitable name.
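
For example, a rough sketch of that interim option (assumptions: /g/data/ik11/grids can be indexed like any other experiment with cc.database.build_index, the directory name 'grids' then acts as the experiment name, and the database path below is purely hypothetical):

import cosima_cookbook as cc

# build a personal cookbook database that indexes the grid files
session = cc.database.create_session('/path/to/my_grids.db')  # hypothetical path
cc.database.build_index(['/g/data/ik11/grids'], session)

# then the unmasked grid variables could be pulled out like any other variable
geolon_t = cc.querying.getvar('grids', 'geolon_t', session, n=1)
geolat_t = cc.querying.getvar('grids', 'geolat_t', session, n=1)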

Why don't we include the unmasked 2D arrays of longitude/latitude in all output fields? This would be a remedy to that, wouldn't it be? @AndyHoggANU, @aidanheerdegen?
At the moment we include xt_ocean and yt_ocean but these alone are not enough to plot the field!

The variables are masked because we run the model masked to reduce the number of CPUs required, and this is an unavoidable** side-effect of that.

** There are some caveats but for the case in point it is unavoidable.

@navidcy
Collaborator Author

navidcy commented Oct 10, 2021

Why don't we include the unmasked 2D arrays of longitude/latitude in all output fields? That would be a remedy, wouldn't it? @AndyHoggANU, @aidanheerdegen?
At the moment we include xt_ocean and yt_ocean, but these alone are not enough to plot the field!

The variables are masked because we run the model masked to reduce the number of CPUs required, and this is an unavoidable** side-effect of that.

** There are some caveats but for the case in point it is unavoidable.

Oh, I see!

But then, I'm just wondering, how come the OMIP data that @hakaseh was loading had unmasked 2D longitude/latitude coordinates? Isn't that produced by ACCESS-OM2?

@aidanheerdegen
Contributor

The post-processing is adding unmasked grids back into the data.

@navidcy
Collaborator Author

navidcy commented Oct 10, 2021

The post-processing is adding unmasked grids back into the data.

Let's do that then!

(I don't know what the "post-processing" is... Is it something trivial? Something that, e.g., payu could do as a post-processing step?)

@aidanheerdegen
Contributor

No.

@navidcy
Collaborator Author

navidcy commented Nov 14, 2022

OK, I'll close this perhaps? @aidanheerdegen?

@aidanheerdegen
Contributor

Yeah, I guess. There isn't anything happening, and it's linked to the other issue so it can be found again.

Mind you ... perhaps grids is something @rbeucher might be interested in adding to his "TO-DO" list?

@navidcy
Collaborator Author

navidcy commented Nov 14, 2022

Yes, yes, yes!!!

This issue with masked grids is something we should address. There should be an easy way to get the unmasked grids, rather than relying on some .nc file that somebody has put somewhere, which has some metadata and not others, and which is not the same across model resolutions (as is the case now).

If, for example, a user tries to run the Maps tutorial for the 1-degree model, it will fail.

cc @rbeucher

@rbeucher
Contributor

rbeucher commented Nov 14, 2022

OK. So, a bit late to the party... I am trying to get my head around this.

So if I understand correctly

  • the grids stored with the output fields are masked so they can't be used to plot the fields... correct?
  • Unmasked grids are available in /g/data/ik11/grids/...
  • The tutorial uses the unmasked grids to plot the fields. Problem: this assumes that the user knows about them, where to find them, etc. This is a weakness in the workflow.

Why is the unmasked grid not provided directly? Not sure I understand why it is unavoidable.
Could we have the unmasked grids, a mask, and the variables in the output?

A post-processing step that adds the unmasked grid to the output seems like a quick solution...BUT
Having the full grid available directly from the output seems to be important to me...

@navidcy
Collaborator Author

navidcy commented Nov 14, 2022

OK. So, a bit late to the party... I am trying to get my head around this.

So if I understand correctly

  • the grids stored with the output fields are masked so they can't be used to plot the fields... correct?

Yes

  • Unmasked grids are available in /g/data/ik11/grids/...

Yes, but not in the same form for each resolution! I don't know why!!

  • The tutorial uses the unmasked grids to plot the fields. Problem: this assumes that the user knows about them, where to find them, etc. This is a weakness in the workflow.

indeed

Why is the unmasked grid not provided directly? Not sure I understand why it is unavoidable. Could we have the unmasked grids, a mask, and the variables in the output?

don’t have an opinion on this

A post-processing step that adds the unmasked grid to the output seems like a quick solution...BUT Having the full grid available directly from the output seems to be important to me...

@rbeucher
Contributor

If, for example, a user tries to run the Maps tutorial for the 1-degree model, it will fail.

cc @rbeucher

Why? Because the tutorial loads a specific grid?

@navidcy
Collaborator Author

navidcy commented Nov 14, 2022

If, for example, a user tries to run the Maps tutorial for the 1-degree model, it will fail.
cc @rbeucher

Why? Because the tutorial loads a specific grid?

No. Even if you fix the filename, there is still an issue, because the variables saved in the 1-degree .nc file are different!!!

@rbeucher
Contributor

If, for example, a user tries to run the Maps tutorial for the 1-degree model, it will fail.
cc @rbeucher

Why? Because the tutorial loads a specific grid?

No. Even if you fix the filename, there is still an issue, because the variables saved in the 1-degree .nc file are different!!!

Oh OK... Now I get it.

@aidanheerdegen
Contributor

Why is the unmasked grid not provided directly? Not sure I understand why it is unavoidable.
Could we have the unmasked grids, a mask, and the variables in the output?

The ocean data has the grid, but it is masked because of processor masking. In the masked sections there are no lat/lon values, because no data was written for those sections. This isn't an issue for the data field itself, as there are NaNs there, which is correct. It is a problem if the coordinates have NaNs in them.

Now it is the case that for some processor and IO layout combinations it is possible that there are no NaNs in the coordinates, i.e. they are fine. As long as there is at least one unmasked processor tile in each IO tile (which is an aggregate of processor tiles), the coordinate values will be correctly written to the output file. But in general this cannot be relied upon.

We've generally never done a lot of post-processing of files because it either led to duplication (bad in general and simply untenable for 0.1-degree data) or risked data loss, without some careful scripting and care to make sure there were no errors.

BUT I can see that, if you have a proper grid file, it is possible to replace the values of the coordinates with those from the proper grid, and to do it in place with very little risk of data loss, as it is a relatively simple process.

So yeah, I can see that sort of post-processing step being a good idea and quite achievable.
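
A minimal sketch of such an in-place fix (assuming the output file carries the partially masked 2D geolon_t/geolat_t variables with the same shapes as in the grid file; the output filename below is illustrative):

import netCDF4

grid = netCDF4.Dataset('/g/data/ik11/grids/ocean_grid_025.nc')

# open the output file in append mode and overwrite the masked 2D
# coordinate variables with the unmasked values from the grid file
with netCDF4.Dataset('ocean_month.nc', 'r+') as out:  # illustrative filename
    out['geolon_t'][:] = grid['geolon_t'][:]
    out['geolat_t'][:] = grid['geolat_t'][:]

grid.close()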

What I suggested to @rbeucher was making nice grid files, with proper metadata etc., and indexing them so they are in the Cookbook DB. Then document how to access and use them in some notebooks like this one.

@rbeucher
Contributor

Why is the unmasked grid not provided directly? Not sure I understand why it is unavoidable.
Could we have the unmasked grids, a mask, and the variables in the output?

The ocean data has the grid, but it is masked because of processor masking. In the masked sections there are no lat/lon values, because no data was written for those sections. This isn't an issue for the data field itself, as there are NaNs there, which is correct. It is a problem if the coordinates have NaNs in them.

Yeah, I get why you mask for processing, but why don't you provide the unmasked grid in the outputs (and possibly the mask, if it is useful)? I don't know the internal details. Does the masking result in some sparse storage to optimise memory usage? I mean do you effectively lose the masked info?

We can replace the missing values if we have a proper grid file; that's easy enough, I suppose, but I wonder why we don't fix the problem at the source. Again, just trying to get a full grasp of the problem.

@aidanheerdegen
Contributor

I mean do you effectively lose the masked info?

Correct. It simply does not exist for the ocean model in the masked locations.

The ice model works differently. It does all IO through the master PE (which is why PIO is required for higher core counts; otherwise IO completely dominates execution time), so it doesn't have the same problems. It also copies all its grid information all the time from one restart location to the next, hence there is 1 TB of identical grid files in /g/data/ik11/outputs/access-om2-01.

$ find /g/data/ik11/outputs/access-om2-01 -name "grid.nc" -exec du -hc {} +
1000G   total

@rbeucher
Contributor

OK looks like we do need a post-processing step then.

@aidanheerdegen
Contributor

The problem with that is you're not going to post-process every dataset that has already been created. So to support existing data there should be some convenient way to access the correct grid ...

@rbeucher
Contributor

Yes I get that. It's OK if we are talking about a few grids I suppose.

For future runs, could we imagine adding some sort of reference to the grid?
I mean we could release and mint a DOI for the grids and enforce that the grid information is added to the outputs.
That would make things easier. That's also good for provenance / reproducibility.

What do you think?

@navidcy navidcy added 📺 hackathon 1.0 Tasks for the 2020 Hackathon 💻 hackathon 2.0 like the 1.0 but better and removed 📺 hackathon 1.0 Tasks for the 2020 Hackathon labels Jan 10, 2023
@rbeucher
Contributor

Happy to keep working on this. I can't assign myself though....

@adele-morrison
Collaborator

I think this issue (at least what's requested in the first post) has been resolved @navidcy?

@navidcy
Collaborator Author

navidcy commented Apr 16, 2024

The issue will be resolved when #212 is resolved. But I guess, since there is a new issue, we can close this one and focus on #212 :)
