diff --git a/doc/ecosystem.rst b/doc/ecosystem.rst index 076874d82f3..63f60cd0090 100644 --- a/doc/ecosystem.rst +++ b/doc/ecosystem.rst @@ -74,6 +74,7 @@ Extend xarray capabilities - `Collocate `_: Collocate xarray trajectories in arbitrary physical dimensions - `eofs `_: EOF analysis in Python. - `hypothesis-gufunc `_: Extension to hypothesis. Makes it easy to write unit tests with xarray objects as input. +- `ntv-pandas `_ : A tabular analyzer and a semantic, compact and reversible converter for multidimensional and tabular data - `nxarray `_: NeXus input/output capability for xarray. - `xarray-compare `_: xarray extension for data comparison. - `xarray-dataclasses `_: xarray extension for typed DataArray and Dataset creation. diff --git a/doc/user-guide/pandas.rst b/doc/user-guide/pandas.rst index 76349fcd371..26fa7ea5c0c 100644 --- a/doc/user-guide/pandas.rst +++ b/doc/user-guide/pandas.rst @@ -110,6 +110,26 @@ work even if not the hierarchical index is not a full tensor product: s[::2] s[::2].to_xarray() +Lossless and reversible conversion +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The previous ``Dataset`` example shows that the conversion is not reversible (lossy roundtrip) and +that the size of the ``Dataset`` increases. + +Particularly after a roundtrip, the following deviations are noted: + +- a non-dimension Dataset ``coordinate`` is converted into ``variable`` +- a non-dimension DataArray ``coordinate`` is not converted +- ``dtype`` is not allways the same (e.g. "str" is converted to "object") +- ``attrs`` metadata is not conserved + +To avoid these problems, the third-party `ntv-pandas `__ library offers lossless and reversible conversions between +``Dataset``/ ``DataArray`` and pandas ``DataFrame`` objects. + +This solution is particularly interesting for converting any ``DataFrame`` into a ``Dataset`` (the converter find the multidimensional structure hidden by the tabular structure). + +The `ntv-pandas examples `__ show how to improve the conversion for the previous ``Dataset`` example and for more complex examples. + Multi-dimensional data ~~~~~~~~~~~~~~~~~~~~~~