Stack and unstack periods (#1558)
### Pull Request Checklist:
- [ ] This PR addresses an already opened issue (for bug fixes /
features)
    - This PR fixes #xyz
- [x] Tests for the changes have been added (for bug fixes / features)
- [x] (If applicable) Documentation has been added / updated (for bug
fixes / features)
- [x] CHANGES.rst has been updated (with summary of main changes)
- [ ] Link to issue (:issue:`number`) and pull request (:pull:`number`)
has been added

### What kind of change does this PR introduce?

New `stack_periods` and its reverse, `unstack_periods`.

The basic idea is to provide `da.rolling(time=window).construct(dim,
stride=stride)`, but with `window` and `stride` given as multiples of a
given `freq`, allowing the use of non-uniform temporal periods (like
years or months), along with a reverse method to go back to the linear
timeseries.
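
A minimal sketch of the intended usage (assuming the API described in this PR and in the notebook changes below; the toy data and the 30/10 window choice are purely illustrative):

```python
import numpy as np
import xarray as xr

from xclim.core.calendar import stack_periods, unstack_periods

# Toy daily series on a "noleap" calendar, so every year has the same length.
time = xr.cftime_range("2000-01-01", periods=365 * 50, freq="D", calendar="noleap")
da = xr.DataArray(np.arange(time.size), dims="time", coords={"time": time}, name="tas")

# 30-year windows starting every 10 years (freq="YS" is the default).
# Periods are stacked along a new "period" dimension sharing one common time coordinate.
stacked = stack_periods(da, window=30, stride=10)

# Reverse operation: reconstruct a linear timeseries from the stacked periods.
linear = unstack_periods(stacked)
```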

We had `sdba.construct_moving_yearly_window`; this is a generalization
of that logic to:
- frequencies other than 'YS'
- periods of different lengths (MS, QS)
- non-uniform calendars

Of course, the two latter points will generate an output where the
"fake" time axis makes no sense at all, so this is to be used with
caution.
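
For instance, a hedged sketch of a sub-annual use, mirroring one of the new test cases (the data is illustrative):

```python
import numpy as np
import xarray as xr

from xclim.core.calendar import stack_periods

# Ten years of daily data on a uniform calendar.
time = xr.cftime_range("2000-01-01", periods=365 * 10, freq="D", calendar="noleap")
da = xr.DataArray(np.arange(time.size), dims="time", coords={"time": time}, name="tas")

# Six-month windows with no overlap (stride defaults to the window).
# freq="MS" means window and stride are counted in months, so periods have
# different real lengths and the stacked time axis should not be read as real dates.
stacked_ms = stack_periods(da, window=6, freq="MS", align_days=False)
```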

### Does this PR introduce a breaking change?
Yes.

`sdba.construct_moving_yearly_window` now transparently delegates to
`stack_periods` and emits a warning.

`sdba.unpack_moving_yearly_window` also warns and calls
`unstack_periods`, but the `append_ends` argument is not available on the
latter. The behaviour is the same as `append_ends=True`.
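
A hedged sketch of that deprecation path (the exact warning category and message are assumptions and are not asserted here; the data is illustrative):

```python
import warnings

import numpy as np
import xarray as xr

from xclim.sdba.processing import construct_moving_yearly_window

# The old helper requires a calendar with uniform year lengths.
time = xr.cftime_range("2000-01-01", periods=365 * 31, freq="D", calendar="noleap")
da = xr.DataArray(np.arange(time.size), dims="time", coords={"time": time}, name="tas")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Deprecated entry point: expected to delegate to xclim.core.calendar.stack_periods.
    stacked = construct_moving_yearly_window(da, window=15, step=5)

print([w.category.__name__ for w in caught])
```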
aulemahal authored Dec 15, 2023
2 parents 26057a0 + 7b6d33f commit 08e4ceb
Showing 7 changed files with 432 additions and 258 deletions.
2 changes: 1 addition & 1 deletion CHANGES.rst
@@ -10,12 +10,12 @@ New features and enhancements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Added uncertainty partitioning method `lafferty_sriver` from Lafferty and Sriver (2023), which can partition uncertainty related to the downscaling method. (:issue:`1497`, :pull:`1529`).
* Validate YAML indicators description before trying to build module. (:issue:`1523`, :pull:`1560`).
* New ``xclim.core.calendar.stack_periods`` and ``unstack_periods`` for performing ``rolling(time=...).construct(..., stride=...)`` but with non-uniform temporal periods like years or months. They replace ``xclim.sdba.processing.construct_moving_yearly_window`` and ``unpack_moving_yearly_window`` which are deprecated and will be removed in a future release.

Bug fixes
^^^^^^^^^
* Fixed passing ``missing=0`` to ``xclim.core.calendar.convert_calendar`` (:issue:`1562`, :pull:`1563`).


v0.47.0 (2023-12-01)
--------------------
Contributors to this version: Juliette Lavoie (:user:`juliettelavoie`), Pascal Bourgault (:user:`aulemahal`), Trevor James Smith (:user:`Zeitsperre`), David Huard (:user:`huard`), Éric Dupuis (:user:`coxipi`).
108 changes: 15 additions & 93 deletions docs/notebooks/sdba-advanced.ipynb
@@ -346,21 +346,24 @@
"\n",
"Some Adjustment methods require that the adjusted data (`sim`) be of the same length (same number of points) than the training data (`ref` and `hist`). These requirements often ensure conservation of statistical properties and a better representation of the climate change signal over the long adjusted timeseries.\n",
"\n",
"In opposition to a conventional \"rolling window\", here it is the _years_ that are the base units of the window, not the elements themselves. `xclim` implements `sdba.construct_moving_yearly_window` and `sdba.unpack_moving_yearly_window` to manipulate data in that goal. The \"construct\" function cuts the data in overlapping windows of a certain length (in years) and stacks them along a new `\"movingdim\"` dimension, alike to xarray's `da.rolling(time=win).construct('movingdim')`, but with yearly steps. The step between each window can also be controlled. This argument is an indicator of how many years overlap between each window. With a value of 1 (the default), a window will have `window - 1` years overlapping with the previous one. `step = window` will result in no overlap at all.\n",
"In opposition to a conventional \"rolling window\", here it is the _years_ that are the base units of the window, not the elements themselves. `xclim` implements `xc.core.calendar.stack_periods` and `xc.core.calendar.unstack_periods` to manipulate data in that goal. The \"stack\" function cuts the data in overlapping windows of a certain length and stacks them along a new `\"period\"` dimension, alike to xarray's `da.rolling(time=win).construct('period')`, but with yearly steps. The stride (or step) between each window can also be controlled. This argument is an indicator of how many years overlap between each window. With a value of 1, a window will have `window - 1` years overlapping with the previous one. The default (`None`) is to have `stride = window` will result in no overlap at all. The default units in which `window` and `stride` are given is a year (\"YS\"), but can be changed with argument `freq`.\n",
"\n",
"By default, the result is chunked along this `'movingdim'` dimension. For this reason, the method is expected to be more computationally efficient (when using `dask`) than looping over the windows.\n",
"By chunking the result along this `'period'` dimension, it is expected to be more computationally efficient (when using `dask`) than looping over the windows with a for-loop (or a `GroupyBy`)\n",
"\n",
"Note that this results in two restrictions:\n",
"\n",
"1. The constructed array has the same \"time\" axis for all windows. This is a problem if the actual _year_ is of importance for the adjustment, but this is not the case for any of xclim's current adjustment methods.\n",
"2. The input timeseries must be in a calendar with uniform year lengths. For daily data, this means only the \"360_day\", \"noleap\" and \"all_leap\" calendars are supported.\n",
"\n",
"The \"unpack\" function does the opposite : it concatenates the windows together to recreate the original timeseries.\n",
"The time points that are not part of a window will not appear in the reconstructed timeseries.\n",
"If `append_ends` is True, the reconstructed timeseries will go from the first time point of the first window to the last time point of the last window. In the middle, the central `step` years are kept from each window.\n",
"If `append_ends` is False, only the central `step` years are kept from each window. Which means the final timeseries has `(window - step) / 2` years missing on either side, with the extra year missing on the right in case of an odd `(window - step)`. We are missing data, but the contribution from each window is equal.\n",
"The \"unstack\" function does the opposite : it concatenates the windows together to recreate the original timeseries. It only works for the no-overlap case where `stride = window` and for the non-ambiguous one where `stride` divides `window` into an odd number (N) of parts. In that latter situation, the middle parts of each period are kept when reconstructing the timeseries, in addition to the first (last) parts of the first (last) period needed to get a full timeseries.\n",
"\n",
"Here, as `ref` and `hist` cover 15 years, we will use a window of 15 on sim. With a step of two (2), this means the first window goes from 2000 to 2014 (inclusive). The last window goes from 2016 to 2030. `window - step = 13`, so six (6) years will be missing at the beginning of the final `scen` and seven (7) years at the end."
"Quantile Delta Mapping requires that the adjustment period should be of a length similar to the training one. As our `ref` and `hist` cover 15 years but `sim` covers 31 years, we will transform `sim` by stacking windows of 15 years. With a stride of 5 years, this means the first window goes from 2000 to 2014 (inclusive). Then 2005-2019, 2010-2024 and 2015-2029. The last year will be dropped as it can't be included in any complete window.\n",
"\n",
"<div class=\"alert alert-warning\">\n",
"\n",
"In the following example, `QDM` is configurated with `group=\"time.dayofyear\"` which will perform the adjustment for each day of year (doy) separately. When using `stack_periods` the extracted windows are all concatenated along the new `period` axis and they all share the same time coordinate. As such, for the doy information to make sense, we must use a calendar with uniform year lengths. Otherwise, the doys would shift one day at each leap year.\n",
"\n",
"</div>"
]
},
{
@@ -382,43 +385,17 @@
"metadata": {},
"outputs": [],
"source": [
"sim"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from xclim.sdba import construct_moving_yearly_window, unpack_moving_yearly_window\n",
"from xclim.core.calendar import stack_periods, unstack_periods\n",
"\n",
"sim_win = construct_moving_yearly_window(sim, window=15, step=2)\n",
"sim_win = stack_periods(sim, window=15, stride=5)\n",
"sim_win"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, we retrieve the full timeseries."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"scen_win = unpack_moving_yearly_window(QDM.adjust(sim_win), append_ends=True)\n",
"scen_win"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Whereas here, we have gaps at the edges."
"Here, we retrieve the full timeseries (minus the lasy year that couldn't fit in any window)."
]
},
{
@@ -427,63 +404,10 @@
"metadata": {},
"outputs": [],
"source": [
"scen_win = unpack_moving_yearly_window(QDM.adjust(sim_win), append_ends=False)\n",
"scen_win = unstack_periods(QDM.adjust(sim_win))\n",
"scen_win"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is another short example, with an uneven number of years. Here `sim` goes from 2000 to 2029 (30 years instead of 31). With a step of 2 and a window of 15, the first window goes again from 2000 to 2014, but the last one is now from 2014 to 2028. The next window would be 2016-2030, but that last year doesn't exist."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sim_win = construct_moving_yearly_window(\n",
" sim.sel(time=slice(\"2000\", \"2029\")), window=15, step=2\n",
")\n",
"sim_win"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, we don't recover the full timeseries, even when we append the ends, because 2029 is not part of a window."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sim2 = unpack_moving_yearly_window(sim_win, append_ends=True)\n",
"sim2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Without appending the ends, the final timeseries is from 2006 to 2021, 6 years missing at the beginning, like last time and **8** years missing at the end."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sim2 = unpack_moving_yearly_window(sim_win, append_ends=False)\n",
"sim2"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -854,9 +778,7 @@
"hist.isel(location=1).plot(label=\"hist\", linewidth=lw)\n",
"ref.isel(location=1).plot(label=\"ref\", linewidth=lw)\n",
"ref_future.isel(location=1).plot(label=\"ref_future\", linewidth=lw)\n",
"leg = plt.legend()\n",
"for legobj in leg.legendHandles:\n",
" legobj.set_linewidth(2.0)"
"leg = plt.legend()"
]
},
{
39 changes: 39 additions & 0 deletions tests/test_calendar.py
@@ -29,7 +29,9 @@
max_doy,
parse_offset,
percentile_doy,
stack_periods,
time_bnds,
unstack_periods,
)


@@ -704,3 +706,40 @@ def test_convert_doy():
np.testing.assert_allclose(
out.isel(lat=0), [31.0, 200.48, 190.0, 59.83607, 299.71885]
)


@pytest.mark.parametrize("cftime", [True, False])
@pytest.mark.parametrize(
"w,s,m,f,ss",
[(30, 10, None, "YS", 0), (3, 1, None, "QS-DEC", 60), (6, None, None, "MS", 0)],
)
def test_stack_periods(tas_series, cftime, w, s, m, f, ss):
da = tas_series(np.arange(365 * 50), start="2000-01-01", cftime=cftime)

da_stck = stack_periods(
da, window=w, stride=s, min_length=m, freq=f, align_days=False
)

assert "period_length" in da_stck.coords

da2 = unstack_periods(da_stck)

xr.testing.assert_identical(da2, da.isel(time=slice(ss, da2.time.size + ss)))


def test_stack_periods_special(tas_series):
da = tas_series(np.arange(365 * 48 + 12), cftime=True, start="2004-01-01")

with pytest.raises(ValueError, match="unaligned day-of-year"):
stack_periods(da)

da = da.convert_calendar("noleap")

da_stck = stack_periods(da, dim="horizon")
np.testing.assert_array_equal(da_stck.horizon_length, 10950)

with pytest.raises(ValueError, match="can't find the window"):
unstack_periods(da_stck)

da2 = unstack_periods(da_stck.drop_vars("horizon_length"), dim="horizon")
xr.testing.assert_identical(da2, da.isel(time=slice(0, da2.time.size)))
44 changes: 0 additions & 44 deletions tests/test_sdba/test_processing.py
@@ -10,7 +10,6 @@
from xclim.sdba.base import Grouper
from xclim.sdba.processing import (
adapt_freq,
construct_moving_yearly_window,
escore,
from_additive_space,
jitter,
@@ -21,7 +20,6 @@
stack_variables,
standardize,
to_additive_space,
unpack_moving_yearly_window,
unstack_variables,
unstandardize,
)
@@ -282,45 +280,3 @@ def test_stack_variables(open_dataset):
ds1p = unstack_variables(da1)

xr.testing.assert_equal(ds1, ds1p)


@pytest.mark.parametrize(
"window,step,lengths",
[
(1, 1, 151),
(5, 5, 30),
(10, 10, 15),
(25, 25, 6),
(50, 50, 3),
(None, None, 131),
],
)
def test_construct_moving_yearly_window(open_dataset, window, step, lengths):
ds = open_dataset("sdba/CanESM2_1950-2100.nc")

calls = {k: v for k, v in dict(window=window, step=step).items() if v is not None}
da_windowed = construct_moving_yearly_window(ds.tasmax, **calls)

assert len(da_windowed) == lengths


def test_construct_moving_yearly_window_standard_calendar(tasmin_series):
tasmin = tasmin_series(np.zeros(365 * 30), start="1997-01-01", units="degC")

with pytest.raises(ValueError):
construct_moving_yearly_window(tasmin)


@pytest.mark.parametrize("append_ends", [True, False])
def test_unpack_moving_yearly_window(open_dataset, append_ends):
tasmax = open_dataset("sdba/ahccd_1950-2013.nc").tasmax

tasmax_windowed = construct_moving_yearly_window(tasmax)

tx_deconstructed = unpack_moving_yearly_window(
tasmax_windowed, append_ends=append_ends
)
if append_ends:
np.testing.assert_array_equal(tasmax, tx_deconstructed)
else:
assert len(tx_deconstructed.time) < len(tasmax.time)