-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Icon support #11
Icon support #11
Conversation
andreaspauling
commented
Dec 12, 2023
- Module update_phenology_realtime allows now ICON input and writes ICON output
- I/O designed to be close to opr setup at MCH
- cfgrib replaced by eccodes for I/O
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @andreaspauling 🚀 I am glad that this now runs with ICON!
Please make sure that the GitHub actions are successful before merging the PR.
I ran the code on Balfrin and the update_strength_realtime function is broken. (error message below) I have some more general comments since this is the initial ICON release:
- describe a bit clearer in the README how the user should interact with the main.py script and the notebooks for plotting.
- some notebooks for plotting are in a broken state, I would either remove or fix them. imports seem to be missing
- consider adding logging/debug levels. the code is quite verbose
/users/sadamov/pyprojects/realtime_pollen_calibration/src/realtime_pollen_calibration/utils.py:108: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
data = pd.read_csv(
/users/sadamov/pyprojects/realtime_pollen_calibration/src/realtime_pollen_calibration/utils.py:124: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
data_mod = pd.read_csv(
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/users/sadamov/pyprojects/realtime_pollen_calibration/src/realtime_pollen_calibration/update_strength_realtime.py", line 53, in update_strength_realtime
utils.to_grib(file_in, file_out, dict_fields)
File "/users/sadamov/pyprojects/realtime_pollen_calibration/src/realtime_pollen_calibration/utils.py", line 561, in to_grib
dict_fields[short_name][values == 0] = 0
IndexError: boolean index did not match indexed array along dimension 0; dimension is 786 but corresponding boolean dimension is 919620
>>>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file is quite large to upload to git. Maybe consider using git lfs (large file storage)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test data is not part of the repo any more. It will be available from a location defined by osm.
data/grib2_files_ICON-CH1/POV_out
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, it is uncommon to upload large data files directly to git repos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dito
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here on the other hand you are directly linking your scratch, this might not work for other users (is this a problem?). Having all required data in one common place might be benefitial for future users to understand the data-IO better. Updated external fields would always be available in that place (e.g. git lfs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. There are no links any more.
file_out: Location of the desired output file. | ||
verbose: Optional additional debug prints. | ||
|
||
""" | ||
ds = cfgrib.open_dataset(file_in, encode_cf=("time", "geography", "vertical")) | ||
|
||
#file_POV = "data/grib2_files_ICON-CH1/ART_POV_iconR19B08-grid_0001_all_specs_values" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove comments if not required
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
# Close the GRIB file | ||
fh_Const.close() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these can be considered user input parameters and you might want to write them in a seperate config.yaml
file where the user can define the required species and fields themselves. Unless this is supposed to remain hard-coded and static for years.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Now there is a config.yaml.
# Dictionary to hold DataArrays for each variable | ||
calFields_arrays = {} | ||
|
||
# Loop through variables to create DataArrays |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you might consider using earthkit's .to_xarray() method for conversion to a hypercube. @clairemerker what is the latest guideline on using eccodes/cfgrib/earthkit? also consider a short discussion with Petra on this matter. I am not sure, earthkit makes the code easier but also adds one dependency.
@@ -30,7 +30,7 @@ | |||
thr_con_24 = {"ALNU": 240, "BETU": 240, "POAC": 72, "CORY": 240} | |||
thr_con_120 = {"ALNU": 720, "BETU": 720, "POAC": 216, "CORY": 720} | |||
failsafe = {"ALNU": 1000, "BETU": 2500, "POAC": 6000, "CORY": 2500} | |||
jul_days_excl = {"ALNU": 14, "BETU": 40, "POAC": 3, "CORY": 46} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider moving these user defined value into a config.yaml file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These values should not be changed by users. Hence I would leave them there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine to leave it here! Could you provide a comment documenting the meaning of those values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@clairemerker Die Pollenkalibration wär jetzt parat für deinen Review. Es ist so ziemlich alles umgestellt... DaniL wird das auch anschauen, da osm das dann betreiben darf ;-) Trotzdem wäre es vielleicht gut, wenn du noch einen Blick darüber werfen könnest |
Thanks for the README. It contains useful information for the configuration of the package and how to run it. Please add the following information to the README: General questions:
Section "How to configure the package"
Section "How to run the package"
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! A lot of refactoring and added documentation!
The scope of the PR was quite big because it mixes code changes and the blueprint update, so I didn't manage to look at everything in details. But it looks good to me overall!
I added a few comments to the code, and I have two comments regarding testing:
- I don't have read access to the test data so I couldn't copy it to test the package. I understood that the test data will be moved in a centralised place, so this issue will probably be solved then.
- Consider adding at least integration tests for the two tools
update_phenology
andupdate_strength
. I run the commands described in the readme and they worked, which is very nice but I didn't expect it since it was unclear to me if the config file was setup in a way that would run for all users. Also, I have no possibility to check if the output is correct. It would be great if this could be integrated as a test, since everything needed for the test seems to already be present :)
@@ -1,4 +1,5 @@ | |||
"""Mutable number.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is mutable number used in the package or just a left-over from the template example? In that case it might be good to remove it to avoid confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is just a left-over from the template example and is removed now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"I don't have read access to the test data so I couldn't copy it to test the package. I understood that the test data will be moved in a centralised place, so this issue will probably be solved then." => Yes, the data will move to a centralised place.
"Consider adding at least integration tests for the two tools update_phenology and update_strength. I run the commands..." => It worked because the config.yaml pointed to data in one of my directories. This may be replaced with pointers to the centralised place or a general . I will discuss this with DaniL.
Regarding the integration test we better discuss in person.
from realtime_pollen_calibration import utils | ||
|
||
|
||
def read_pov_file(pov_infile, pol_fields, config_obj): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also a function read_pov_file
in the file update_phenology.py
. It is hard to tell why they are different without a doc string. Maybe consider adding one to justify the duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two functions are not doing the same. They could be unified, to be discussed. For now I added a docstring.
README.md
Outdated
|
||
```bash | ||
tools/setup_env.sh -u -e -n <package_env_name> | ||
``` | ||
|
||
*Hint*: If you are the package administrator, it is a good idea to understand what this script does, you can do everything manually with `conda` instructions. | ||
|
||
*Hint*: Use the flag `-m` to speed up the installation using mamba. Of course you will have to install mamba first (we recommend to install mamba into your base | ||
environment `conda install -c conda-forge mamba`. If you install mamba in another (maybe dedicated) environment, environments installed with mamba will be located | ||
*Hint*: Use the flag `-m` to speed up the installation using mamba. Of course you will have to install mamba first (we recommend to install mamba into your base environment `conda install -c conda-forge mamba`). If you install mamba in another (maybe dedicated) environment, environments installed with mamba will be located |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
libmamba is the default server for conda versions > 23.10. You could consider removing mamba from the setup script to make it leaner in a future PR .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. mamba removed in the setup script and README.md updated
@@ -30,7 +30,7 @@ | |||
thr_con_24 = {"ALNU": 240, "BETU": 240, "POAC": 72, "CORY": 240} | |||
thr_con_120 = {"ALNU": 720, "BETU": 720, "POAC": 216, "CORY": 720} | |||
failsafe = {"ALNU": 1000, "BETU": 2500, "POAC": 6000, "CORY": 2500} | |||
jul_days_excl = {"ALNU": 14, "BETU": 40, "POAC": 3, "CORY": 46} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine to leave it here! Could you provide a comment documenting the meaning of those values?
README.md
Outdated
station_mod_file : /store_new/mch/msopr/paa/RTcal_testdata/pollen_modelled_2024020118.atab | ||
hour_incr : 1 | ||
``` | ||
`POV_infile`: This GRIB2 file must include the fields specified above. It is used as template for `POV_outfile` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which fields? The one from the section "Features"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The values above are now documented in utils.py.
The fields in that section are now exlicitly listed.
* move get_data.sh to tools and add full test * chmod tools/get_data.sh * implement full test --------- Co-authored-by: tsm <osm@meteoswiss.ch>
* rename test script,scp testdata and update README * Update README.md --------- Co-authored-by: owm-mch <osm@meteoswiss.ch>
Co-authored-by: Andreas Pauling <paa@balfrin-ln003.cscs.ch>
* improved error handling for missing data and log information * check all grib2 infiles for missing fields * Update README.md --------- Co-authored-by: Andreas Pauling <paa@balfrin-ln003.cscs.ch> Co-authored-by: Andreas Pauling <paa@balfrin-ln002.cscs.ch>
Replies:
Section "How to configure the package"
Section "How to run the package"
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not review this again, I understand that other reviewers have taken over in the meantime @andreaspauling. As my previous requested changes are blocking this merge, I suggest that an admin @cosunae removes my previous review from the conversation. I'd rather not approve changes I didn't actually review and currently don't have time for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sieht soweit gut aus für mich.