Icon support #11

andreaspauling · 2023-12-12T16:22:49Z

Module update_phenology_realtime now accepts ICON input and writes ICON output.
I/O is designed to be close to the operational (opr) setup at MCH.
cfgrib has been replaced by eccodes for I/O.

sadamov

Thanks @andreaspauling 🚀 I am glad that this now runs with ICON!
Please make sure that the GitHub Actions checks pass before merging the PR.
I ran the code on Balfrin and the update_strength_realtime function is broken (error message below). Since this is the initial ICON release, I also have some more general comments:

Describe more clearly in the README how the user should interact with the main.py script and the plotting notebooks.
Some plotting notebooks are in a broken state (imports seem to be missing); I would either fix or remove them.
Consider adding logging/debug levels; the code is quite verbose.
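A minimal sketch of the logging suggestion, using only the standard-library `logging` and `argparse` modules. The `-v`/`--verbose` flag, the logger name and `main()` are hypothetical and not taken from the package; the point is that diagnostic output goes through `logger.info()`/`logger.debug()` and only appears when requested.

```python
import argparse
import logging

# Hypothetical package logger; each module could instead use
# logging.getLogger(__name__).
logger = logging.getLogger("realtime_pollen_calibration")


def configure_logging(verbosity: int) -> None:
    """Map a -v count to a log level: 0 -> WARNING, 1 -> INFO, 2+ -> DEBUG."""
    if verbosity >= 2:
        level = logging.DEBUG
    elif verbosity == 1:
        level = logging.INFO
    else:
        level = logging.WARNING
    # basicConfig only installs a handler the first time it is called.
    logging.basicConfig(format="%(levelname)s %(name)s: %(message)s")
    # The logger's own level decides which messages are emitted.
    logger.setLevel(level)


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Sketch of a verbosity flag")
    parser.add_argument("-v", "--verbose", action="count", default=0)
    args = parser.parse_args(argv)
    configure_logging(args.verbose)
    logger.info("Reading input files")    # shown with -v
    logger.debug("Detailed diagnostics")  # shown with -vv
    return 0
```

In a fresh process, `main(["-vv"])` prints both messages, while `main([])` prints neither because the default level is WARNING.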

```
/users/sadamov/pyprojects/realtime_pollen_calibration/src/realtime_pollen_calibration/utils.py:108: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  data = pd.read_csv(
/users/sadamov/pyprojects/realtime_pollen_calibration/src/realtime_pollen_calibration/utils.py:124: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  data_mod = pd.read_csv(
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/users/sadamov/pyprojects/realtime_pollen_calibration/src/realtime_pollen_calibration/update_strength_realtime.py", line 53, in update_strength_realtime
    utils.to_grib(file_in, file_out, dict_fields)
  File "/users/sadamov/pyprojects/realtime_pollen_calibration/src/realtime_pollen_calibration/utils.py", line 561, in to_grib
    dict_fields[short_name][values == 0] = 0
IndexError: boolean index did not match indexed array along dimension 0; dimension is 786 but corresponding boolean dimension is 919620
>>>
```
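On the `UserWarning`s: pandas can be given the timestamp layout explicitly (`date_format=` in `pd.read_csv` since pandas 2.0, or `format=` in `pd.to_datetime`); the ATAB timestamp layout is not shown in this thread, so no format string is suggested here. The `IndexError` says that the field being written (786 entries along dimension 0) and the GRIB template values (919620 entries) are not on the same grid. A minimal sketch of a guard that fails with a clearer message, assuming both are NumPy arrays; `zero_where_template_is_zero` is a hypothetical name, not a function in `utils.py`.

```python
import numpy as np


def zero_where_template_is_zero(field, template_values, short_name):
    """Return a copy of ``field`` set to 0 wherever ``template_values`` is 0.

    Hypothetical helper mirroring the failing line in utils.to_grib:
        dict_fields[short_name][values == 0] = 0
    The boolean mask only works if both arrays live on the same grid, so the
    shapes are compared first and a descriptive error is raised otherwise.
    """
    field = np.asarray(field)
    template_values = np.asarray(template_values)
    if field.shape != template_values.shape:
        raise ValueError(
            f"{short_name}: calibrated field has shape {field.shape}, but the "
            f"GRIB template values have shape {template_values.shape}; "
            "both must be defined on the same ICON grid"
        )
    result = field.copy()  # leave the caller's array untouched
    result[template_values == 0] = 0
    return result
```

Called with a 786-element field and a 919620-element template, this raises a `ValueError` naming both shapes instead of the bare `IndexError` above.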

sadamov · 2023-12-13T03:24:01Z

data/grib2_files_ICON-CH1/ART_POV_iconR19B08-grid_0001_all_specs_values

This file is quite large to commit to git. Maybe consider using Git LFS (Large File Storage)?

The test data is no longer part of the repo. It will be available from a location defined by osm.

sadamov · 2023-12-13T03:24:49Z

data/grib2_files_ICON-CH1/POV_out

Same here: it is uncommon to commit large data files directly to git repositories.

sadamov · 2023-12-13T03:25:25Z

data/grib2_files_ICON-CH1/iaf2023042500

Here, on the other hand, you are linking directly to your scratch directory, which might not work for other users (is this a problem?). Having all required data in one common place might help future users understand the data I/O better. Updated external fields would then always be available in that place (e.g. via Git LFS).

Done. There are no links anymore.

sadamov · 2023-12-13T03:31:31Z

src/realtime_pollen_calibration/update_phenology_realtime.py

        file_out: Location of the desired output file.
        verbose: Optional additional debug prints.

    """
-    ds = cfgrib.open_dataset(file_in, encode_cf=("time", "geography", "vertical"))
+
+    #file_POV = "data/grib2_files_ICON-CH1/ART_POV_iconR19B08-grid_0001_all_specs_values"


Remove comments if they are not required.

sadamov · 2023-12-13T03:33:10Z

src/realtime_pollen_calibration/update_phenology_realtime.py

+
+    # Close the GRIB file
+    fh_Const.close()
+


These can be considered user input parameters, and you might want to put them in a separate config.yaml file where users can define the required species and fields themselves, unless this is supposed to remain hard-coded and static for years.

Done. Now there is a config.yaml.
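A sketch of what validating the loaded config.yaml could look like, using only the standard library. It assumes the YAML has already been read into a dict (for example with PyYAML's `yaml.safe_load`). `CalibrationConfig` and `parse_config` are hypothetical, and the key list only covers names mentioned in this thread (`POV_infile`, `POV_outfile`, `station_obs_file`, `station_mod_file`, `hour_incr`), so the real file likely contains more.

```python
from dataclasses import dataclass
from pathlib import Path

# Keys mentioned in this PR thread; the real config.yaml may contain more.
REQUIRED_KEYS = ("POV_infile", "POV_outfile", "station_obs_file", "station_mod_file")


@dataclass(frozen=True)
class CalibrationConfig:
    """Hypothetical typed view of config.yaml."""

    pov_infile: Path
    pov_outfile: Path
    station_obs_file: Path
    station_mod_file: Path
    hour_incr: int = 1  # assumed default, matching the README example value


def parse_config(raw: dict) -> CalibrationConfig:
    """Validate a dict as returned by e.g. yaml.safe_load on config.yaml."""
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise KeyError(f"config.yaml is missing required keys: {', '.join(missing)}")
    hour_incr = raw.get("hour_incr", 1)
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(hour_incr, int) or isinstance(hour_incr, bool) or hour_incr < 1:
        raise ValueError(f"hour_incr must be a positive integer, got {hour_incr!r}")
    return CalibrationConfig(
        pov_infile=Path(raw["POV_infile"]),
        pov_outfile=Path(raw["POV_outfile"]),
        station_obs_file=Path(raw["station_obs_file"]),
        station_mod_file=Path(raw["station_mod_file"]),
        hour_incr=hour_incr,
    )
```

Failing early with the name of the missing key is usually easier to debug than a `KeyError` deep inside the calibration code.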

sadamov · 2023-12-13T03:37:39Z

src/realtime_pollen_calibration/update_phenology_realtime.py

+    # Dictionary to hold DataArrays for each variable
+    calFields_arrays = {}
+
+    # Loop through variables to create DataArrays


Here you might consider using earthkit's `.to_xarray()` method to convert to a hypercube. @clairemerker, what is the latest guideline on using eccodes/cfgrib/earthkit? Also consider a short discussion with Petra on this matter. I am not sure: earthkit makes the code simpler but also adds a dependency.

sadamov · 2023-12-13T03:38:11Z

src/realtime_pollen_calibration/utils.py

@@ -30,7 +30,7 @@
 thr_con_24 = {"ALNU": 240, "BETU": 240, "POAC": 72, "CORY": 240}
 thr_con_120 = {"ALNU": 720, "BETU": 720, "POAC": 216, "CORY": 720}
 failsafe = {"ALNU": 1000, "BETU": 2500, "POAC": 6000, "CORY": 2500}
-jul_days_excl = {"ALNU": 14, "BETU": 40, "POAC": 3, "CORY": 46}


Consider moving these user-defined values into a config.yaml file.

These values should not be changed by users, so I would leave them there.

Fine to leave it here! Could you provide a comment documenting the meaning of those values?

andreaspauling · 2024-05-22T14:35:53Z

@clairemerker The pollen calibration is now ready for your review. Pretty much everything has been reworked... DaniL will also look at it, since osm will then get to operate it ;-) Nevertheless, it might be good if you could also take a look at it.

DanielLeuenberger · 2024-05-22T15:20:54Z

Thanks for the README. It contains useful information for the configuration of the package and how to run it. Please add the following information to the README:

General questions:

What happens if there are no (or not enough) pollen fields in the POV_infile, the station_obs_file, or the station_mod_file?
How do we initialize a season? At that point we do not have 120 h of past pollen from the model history. Can fieldextra fill all 120 h with missing values?

Section "How to configure the package"

Specify in more detail the fields that need to be present in the different GRIB and ATAB files: variable(s) (I did not find anything "specified above"), date and time, and restrictions on GRIB2 metadata (such as the generating process ID), if any.
"last 120 hours" relative to which date/time? To that of the fields in POV_infile? Please document this.
What is hour_incr for? Are there use cases in which it needs to be set to a value other than 1?

Section "How to run the package"

I do not understand the paragraph "The phenological model of ICON..." Please document in more detail exactly what needs to be run and updated, and at which time of day. Does the calibration need to be run before the respective KENDA-CH1 cycle? We can enforce that with dependencies in the LMPackage.
What are the "phenological fields"? Please specify in detail.
Please make sure that the script does not need to be in the root directory of the installed package and that the script can be run from anywhere!
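For the last point (running the script from anywhere), one common approach is to resolve relative paths in the configuration against the directory of config.yaml rather than the current working directory. A minimal `pathlib` sketch; `resolve_config_paths` is a hypothetical helper and not part of the package.

```python
from pathlib import Path


def resolve_config_paths(config_file, raw, path_keys):
    """Return a copy of ``raw`` with relative paths anchored at config.yaml.

    Absolute paths are kept unchanged; relative paths are interpreted
    relative to the directory containing the config file instead of the
    current working directory, so the script can be started from anywhere.
    """
    base_dir = Path(config_file).expanduser().resolve().parent
    resolved = dict(raw)
    for key in path_keys:
        if key in resolved:
            path = Path(resolved[key]).expanduser()
            resolved[key] = path if path.is_absolute() else (base_dir / path).resolve()
    return resolved
```

Non-path entries such as `hour_incr` pass through untouched because only the listed keys are rewritten.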

clairemerker

Nice work! A lot of refactoring and added documentation!
The scope of the PR was quite big because it mixes code changes and the blueprint update, so I didn't manage to look at everything in detail. But it looks good to me overall!

I added a few comments to the code, and I have two comments regarding testing:

I don't have read access to the test data, so I couldn't copy it to test the package. I understood that the test data will be moved to a centralised place, so this issue will probably be solved then.
Consider adding at least integration tests for the two tools update_phenology and update_strength. I ran the commands described in the README and they worked, which is very nice, but I didn't expect it since it was unclear to me whether the config file was set up in a way that would run for all users. Also, I have no way to check whether the output is correct. It would be great if this could be integrated as a test, since everything needed for the test seems to already be present :)
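A sketch of the comparison step such an integration test could end with. It assumes the produced POV_outfile and a stored reference have already been read into dicts mapping field names to flat value sequences (reading GRIB is out of scope here); `compare_fields` and its default tolerances are hypothetical choices, not part of the package.

```python
import math


def compare_fields(actual, expected, rel_tol=1e-6, abs_tol=0.0):
    """Return a list of human-readable differences between two field dicts.

    ``actual`` and ``expected`` map a field name to a flat sequence of grid
    values. An empty list means the outputs agree within the tolerances.
    """
    problems = []
    for name in sorted(set(actual) | set(expected)):
        if name not in actual:
            problems.append(f"{name}: missing in output")
            continue
        if name not in expected:
            problems.append(f"{name}: not in reference")
            continue
        a, e = list(actual[name]), list(expected[name])
        if len(a) != len(e):
            problems.append(f"{name}: length {len(a)} != reference length {len(e)}")
            continue
        n_bad = sum(
            not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
            for x, y in zip(a, e)
        )
        if n_bad:
            problems.append(f"{name}: {n_bad} of {len(a)} values differ")
    return problems
```

A pytest test could then simply `assert not compare_fields(output, reference)`, and the returned messages show which field diverged.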

clairemerker · 2024-05-22T15:15:51Z

src/realtime_pollen_calibration/mutable_number.py

@@ -1,4 +1,5 @@
 """Mutable number."""


Is mutable number used in the package or just a left-over from the template example? In that case it might be good to remove it to avoid confusion.

It is just a left-over from the template example and is removed now.

"I don't have read access to the test data so I couldn't copy it to test the package. I understood that the test data will be moved in a centralised place, so this issue will probably be solved then." => Yes, the data will move to a centralised place.
"Consider adding at least integration tests for the two tools update_phenology and update_strength. I run the commands..." => It worked because the config.yaml pointed to data in one of my directories. This may be replaced with pointers to the centralised place or a general . I will discuss this with DaniL.
Regarding the integration test we better discuss in person.

clairemerker · 2024-05-23T07:12:46Z

src/realtime_pollen_calibration/update_strength.py

+from realtime_pollen_calibration import utils
+
+
+def read_pov_file(pov_infile, pol_fields, config_obj):


There is also a function read_pov_file in the file update_phenology.py. It is hard to tell why they are different without a docstring. Maybe consider adding one to justify the duplication.

The two functions do not do the same thing. They could be unified; to be discussed. For now I added a docstring.

clairemerker · 2024-05-23T07:20:10Z

README.md


 ```bash
 tools/setup_env.sh -u -e -n <package_env_name>
 ```

 *Hint*: If you are the package administrator, it is a good idea to understand what this script does, you can do everything manually with `conda` instructions.

-*Hint*: Use the flag `-m` to speed up the installation using mamba. Of course you will have to install mamba first (we recommend to install mamba into your base
-environment `conda install -c conda-forge mamba`. If you install mamba in another (maybe dedicated) environment, environments installed with mamba will be located
+*Hint*: Use the flag `-m` to speed up the installation using mamba. Of course you will have to install mamba first (we recommend to install mamba into your base environment `conda install -c conda-forge mamba`). If you install mamba in another (maybe dedicated) environment, environments installed with mamba will be located


libmamba has been the default solver since conda 23.10. You could consider removing mamba from the setup script in a future PR to make it leaner.

Done. mamba has been removed from the setup script and README.md has been updated.

clairemerker · 2024-05-23T07:22:51Z

src/realtime_pollen_calibration/utils.py

@@ -30,7 +30,7 @@
 thr_con_24 = {"ALNU": 240, "BETU": 240, "POAC": 72, "CORY": 240}
 thr_con_120 = {"ALNU": 720, "BETU": 720, "POAC": 216, "CORY": 720}
 failsafe = {"ALNU": 1000, "BETU": 2500, "POAC": 6000, "CORY": 2500}
-jul_days_excl = {"ALNU": 14, "BETU": 40, "POAC": 3, "CORY": 46}


Fine to leave it here! Could you provide a comment documenting the meaning of those values?

clairemerker · 2024-05-23T07:29:10Z

README.md

+station_mod_file : /store_new/mch/msopr/paa/RTcal_testdata/pollen_modelled_2024020118.atab
+hour_incr : 1
+```
+`POV_infile`: This GRIB2 file must include the fields specified above. It is used as template for `POV_outfile`


Which fields? The one from the section "Features"?

The values above are now documented in utils.py.
The fields in that section are now explicitly listed.

* move get_data.sh to tools and add full test * chmod tools/get_data.sh * implement full test --------- Co-authored-by: tsm <osm@meteoswiss.ch>

* rename test script,scp testdata and update README * Update README.md --------- Co-authored-by: owm-mch <osm@meteoswiss.ch>

Co-authored-by: Andreas Pauling <paa@balfrin-ln003.cscs.ch>

* improved error handling for missing data and log information * check all grib2 infiles for missing fields * Update README.md --------- Co-authored-by: Andreas Pauling <paa@balfrin-ln003.cscs.ch> Co-authored-by: Andreas Pauling <paa@balfrin-ln002.cscs.ch>

andreaspauling · 2024-10-03T10:04:17Z

Thanks for the README. It contains useful information for the configuration of the package and how to run it. Please add the following information to the README:

General questions:

What happens if there are no (or not enough) pollen fields in the POV_infile, the station_obs_file, or the station_mod_file?

How do we initialize a season? At that point we do not have 120 h of past pollen from the model history. Can fieldextra fill all 120 h with missing values?

Section "How to configure the package"

Specify in more detail the fields that need to be present in the different GRIB and ATAB files: variable(s) (I did not find anything "specified above"), date and time, and restrictions on GRIB2 metadata (such as the generating process ID), if any.

"last 120 hours" relative to which date/time? To that of the fields in POV_infile? Please document this.

What is hour_incr for? Are there use cases in which it needs to be set to a value other than 1?

Section "How to run the package"

I do not understand the paragraph "The phenological model of ICON..." Please document in more detail exactly what needs to be run and updated, and at which time of day. Does the calibration need to be run before the respective KENDA-CH1 cycle? We can enforce that with dependencies in the LMPackage.

What are the "phenological fields"? Please specify in detail.

Please make sure that the script does not need to be in the root directory of the installed package and that the script can be run from anywhere!

Replies:
General questions:

If at least one of the mandatory fields is missing, the package exits with status 1 and tells the user.
This is handled by the operational workflow.
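The behaviour described in the first reply (exit with status 1 and tell the user when a mandatory field is missing) can be sketched with the standard library as follows. The helper names, the logger name and the idea of passing in the set of GRIB `shortName`s found in each file are assumptions for illustration; the package's actual implementation is not shown in this thread.

```python
import logging
import sys

logger = logging.getLogger("realtime_pollen_calibration")  # hypothetical name


def check_required_fields(file_label, available, required):
    """Return the required field names that are missing from ``available``.

    ``available`` would be the set of GRIB shortName values found in one
    input file (e.g. collected with eccodes); ``required`` lists the
    mandatory fields for that file. Each missing field is logged.
    """
    missing = sorted(set(required) - set(available))
    for name in missing:
        logger.error("%s: mandatory field %s is missing", file_label, name)
    return missing


def exit_if_missing(checks):
    """Check every (label, available, required) triple, then exit with 1 on failure.

    All files are checked before exiting so the user sees every problem at once.
    """
    all_missing = [
        (label, name)
        for label, available, required in checks
        for name in check_required_fields(label, available, required)
    ]
    if all_missing:
        sys.exit(1)
```

The station ATAB files could get an analogous check on their column or station identifiers.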

Section "How to configure the package"

This is now documented in more detail.
Ditto.
This parameter should be adapted by the user if the calibration is done for a subsequent run more than one hour ahead. This is also documented now.
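To make the "last 120 hours" and `hour_incr` semantics concrete, here is a small sketch under explicit assumptions: the 120-hour station window ends at the validity time of the fields in POV_infile, and `hour_incr` is the number of hours by which the calibrated fields are advanced for the subsequent run. Both are readings of the replies above, not confirmed behaviour; `calibration_times` and `hourly_steps` are hypothetical helpers.

```python
from datetime import datetime, timedelta


def calibration_times(pov_time: datetime, hour_incr: int = 1, window_hours: int = 120):
    """Return (window_start, window_end, output_time).

    Assumptions of this sketch: the station window is the half-open interval
    (window_start, window_end] ending at ``pov_time``, the validity time of
    POV_infile, and ``hour_incr`` shifts the output validity time forward.
    """
    if hour_incr < 1:
        raise ValueError("hour_incr must be >= 1")
    window_end = pov_time
    window_start = pov_time - timedelta(hours=window_hours)
    output_time = pov_time + timedelta(hours=hour_incr)
    return window_start, window_end, output_time


def hourly_steps(start: datetime, end: datetime):
    """Hourly time steps in the half-open interval (start, end]."""
    steps = []
    t = start + timedelta(hours=1)
    while t <= end:
        steps.append(t)
        t += timedelta(hours=1)
    return steps
```

Using the timestamp from the test-data file name `pollen_modelled_2024020118.atab` (2024-02-01 18 UTC) and `hour_incr: 1`, the window is (2024-01-27 18 UTC, 2024-02-01 18 UTC], i.e. 120 hourly steps, and the output validity time is 2024-02-01 19 UTC.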

Section "How to run the package"

The paragraph has been rewritten. The operational workflow is already in place.
Ditto.
Done.

sadamov

I did not review this again; I understand that other reviewers have taken over in the meantime, @andreaspauling. As my previously requested changes are blocking this merge, I suggest that an admin (@cosunae) removes my previous review from the conversation. I'd rather not approve changes I didn't actually review, and I currently don't have time to do so.

as proposed by reviewer

DanielLeuenberger

Looks good to me so far.

Andreas Pauling added 3 commits November 29, 2023 09:02

init branch icon_suppoert

2fadef5

ICON support of update_phenology_realtime

6840518

Bug correction

b95e4d9

andreaspauling requested review from clairemerker and sadamov December 12, 2023 16:22

andreaspauling self-assigned this Dec 12, 2023

sadamov previously requested changes Dec 13, 2023

Andreas Pauling and others added 17 commits December 29, 2023 12:37

ICON support of update_strength_realtime

32e3804

Support of advancing calibrated fields by 1 hour

9cdaa9c

Reorganisation of input files

f255df7

update to latest blueprint

9afd616

pop some stashes

72dd884

test

d38fa5e

pre-commit cleanup

1d0eaa7

minor cleanup

a46d1d4

Merge commit 'a46d1d4443793a8045c84f91b90a1e19b019f225' into icon_sup…

d213dd8

…port

add run file

4834117

Command line interface support

23dd1b3

reorganisation of code, setup and config, cleanup

f8d4f0f

Small bug fixes

a5e37aa

Deleted files: obsolete files after ICON support

bbb4869

improved code (solve pre-commit errors)

5d0c9c9

Solve pre-commit errors not present locally but on github

5b38dfe

editorial update

96df752

clairemerker reviewed May 23, 2024

Andreas Pauling added 3 commits May 30, 2024 09:22

integraton of reviewer comments

0342316

editorial change needed for codespell hook on github

f7e4ffe

update for pre-commit

8381959

Andreas Pauling and others added 13 commits May 30, 2024 12:13

update for pre-commit on github

848fd8f

updates for hooks on Github

51d97a5

Editorial update

efab2b0

add extraction scripts in tools/

e54cb60

editorial changes to tools/extract_pollen_measured.sh

2a1d4d6

Revise tests (#12)

7b694ef

* move get_data.sh to tools and add full test * chmod tools/get_data.sh * implement full test --------- Co-authored-by: tsm <osm@meteoswiss.ch>

fix variable in get_data.sh

dfd57ba

rename test script,scp testdata and update README (#13)

81ebcb7

* rename test script,scp testdata and update README * Update README.md --------- Co-authored-by: owm-mch <osm@meteoswiss.ch>

change T_2M test data to 12UTC and document this in README.md

1a63380

remove whitespaces in README.md

31ef83f

save log information and tell the user (#16)

1222b27

Co-authored-by: Andreas Pauling <paa@balfrin-ln003.cscs.ch>

Satisfy pre-commit reqs

4d52ffe

andreaspauling requested a review from sadamov October 3, 2024 10:06

sadamov reviewed Oct 3, 2024

andreaspauling requested a review from DanielLeuenberger October 15, 2024 07:28

DanielLeuenberger approved these changes Oct 21, 2024

andreaspauling merged commit 9d82cff into main Oct 21, 2024
1 check passed
