feat(config)!: Improved configuration and data structures (#79)
* feat: Implements a config validator via pydantic basemodels
---------

Co-authored-by: Charlie Hebert-Pinard <ecmv9074@ac6-100.bullx>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: chebertpinard <charlie.hebert-pinard@ec.gc.ca>
Co-authored-by: anaprietonem <ana.prietonemesio@ecmwf.int>
Co-authored-by: Mario Santa Cruz <48736305+JPXKQX@users.noreply.github.com>
Co-authored-by: Mario Santa Cruz <mariosanta_cruz@hotmail.com>
Co-authored-by: Harrison Cook <harrison.cook@ecmwf.int>
8 people authored Feb 17, 2025
1 parent f4da73c commit 1f7812b
Showing 69 changed files with 2,650 additions and 140 deletions.
1 change: 0 additions & 1 deletion .github/pull_request_template.md
@@ -28,7 +28,6 @@
- [ ] I have run the Benchmark Profiler against the old version of the code
- [ ] If the new feature introduces modifications at the config level, I have made sure to update Pydantic Schemas and default configs accordingly


<!-- In case this affects the model sharding or other specific components please describe these here. -->

### Dependencies
2 changes: 0 additions & 2 deletions graphs/src/anemoi/graphs/nodes/attributes.py
@@ -237,8 +237,6 @@ class NonmissingZarrVariable(BooleanBaseNodeAttribute):
----------
variable : str
Variable to read from the Zarr dataset.
norm : str
Normalization of the weights.
Methods
-------
6 changes: 3 additions & 3 deletions graphs/src/anemoi/graphs/nodes/builders/from_file.py
@@ -68,15 +68,15 @@ class TextNodes(BaseNodeBuilder):
Attributes
----------
dataset : str | DictConfig
The path to txt file containing the coordinates of the nodes.
dataset : str | Path
The path including filename to txt file containing the coordinates of the nodes.
idx_lon : int
The index of the longitude in the dataset.
idx_lat : int
The index of the latitude in the dataset.
"""

def __init__(self, dataset, name: str, idx_lon: int = 0, idx_lat: int = 1) -> None:
def __init__(self, dataset: str | Path, name: str, idx_lon: int = 0, idx_lat: int = 1) -> None:
LOGGER.info("Reading the dataset from %s.", dataset)
self.dataset = dataset
self.idx_lon = idx_lon
@@ -51,7 +51,7 @@ def __init__(
Graph definition
"""
super().__init__()

model_config = DotDict(model_config)
self._graph_data = graph_data
self._graph_name_data = model_config.graph.data
self._graph_name_hidden = model_config.graph.hidden
8 changes: 8 additions & 0 deletions training/docs/conf.py
@@ -70,6 +70,7 @@
"sphinx.ext.napoleon",
"sphinxarg.ext",
"sphinx.ext.autosectionlabel",
"sphinxcontrib.autodoc_pydantic",
]

# Add any paths that contain templates here, relative to this directory.
@@ -133,3 +134,10 @@
todo_include_todos = not read_the_docs_build

autodoc_member_order = "bysource" # Keep file order


# https://autodoc-pydantic.readthedocs.io/en/stable/users/configuration.html

autodoc_pydantic_model_show_json = True
autodoc_pydantic_model_show_field_summary = False
autodoc_pydantic_model_member_order = "bysource"
73 changes: 69 additions & 4 deletions training/docs/dev/hydra.rst
@@ -2,9 +2,64 @@
Configuration
###############

Anemoi Training uses Hydra for configuration management, allowing for
flexible and modular configuration of the training pipeline. This guide
explains how to use Hydra effectively in the project.
Anemoi Training uses Hydra and Pydantic for configuration management,
allowing for flexible and modular configuration of the training pipeline
while providing robustness through validation. This guide explains how
to use Hydra and Pydantic effectively in the project.

***************************************
Pydantic and Configuration Validation
***************************************

Pydantic is a package designed for data validation and settings
management. It provides a simple way to define schemas which can be used
to validate configuration files. For example, the following schema can
be used to validate a training configuration:

.. code:: python

   from typing import Literal

   from pydantic import BaseModel, Field, PositiveFloat


   class TrainingSchema(BaseModel):
       model: Literal["AlexNet", "ResNet", "VGG"] = Field(default="AlexNet")
       """Model architecture to use for training."""

       learning_rate: PositiveFloat = Field(default=0.01)
       """Learning rate."""

       loss: str = Field(default="mse")
       """Loss function."""

To allow more complex configurations, Pydantic also supports nested
schemas. For example, the following schema can be used to validate a
configuration with a configurable model:

.. code:: python

   from enum import StrEnum

   from pydantic import BaseModel, Field, PositiveFloat, PositiveInt


   class ActivationFunctions(StrEnum):
       relu = "relu"
       sigmoid = "sigmoid"
       tanh = "tanh"


   class ModelSchema(BaseModel):
       num_layers: PositiveInt = Field(default=3)
       """Number of layers in the model."""

       activation: ActivationFunctions = Field(default="relu")
       """Activation function to use."""


   class TrainingSchema(BaseModel):
       model: ModelSchema
       """Model configuration."""

       learning_rate: PositiveFloat = Field(default=0.01)
       """Learning rate."""

       loss: str = Field(default="mse")
       """Loss function."""

If your new feature requires a new configuration parameter, you should
add it to the appropriate schemas and update the configuration files
accordingly.
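To see the validation in action, the following is a minimal,
self-contained sketch (the schema below is illustrative, not one of
Anemoi Training's actual schemas):

```python
from pydantic import BaseModel, Field, PositiveFloat, ValidationError


class TrainingSchema(BaseModel):
    learning_rate: PositiveFloat = Field(default=0.01)
    loss: str = Field(default="mse")


# A valid config parses cleanly; missing keys fall back to defaults.
config = TrainingSchema.model_validate({"learning_rate": 0.05})
print(config.loss)  # "mse"

# An invalid value fails fast, before any training time is wasted.
try:
    TrainingSchema.model_validate({"learning_rate": -1.0})
except ValidationError as exc:
    print(f"invalid config: {exc.error_count()} error(s)")
```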

**************
Hydra Basics
@@ -48,13 +103,23 @@ Configuration in YAML:
algorithm: SGD
learning_rate: 0.01
Pydantic schema:

.. code:: python

   from pydantic import BaseModel


   class OptimizerSchema(BaseModel):
       algorithm: str
       learning_rate: float

Instantiating in code:

.. code:: python

   from hydra.utils import instantiate

   optimizer = instantiate(config.optimizer)
   optimizer = instantiate(config.optimizer.model_dump())

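The ``model_dump()`` call matters because ``hydra.utils.instantiate``
expects a dict-like config rather than a Pydantic model instance. A
small sketch of what it produces (the schema is illustrative, not
Anemoi code):

```python
from pydantic import BaseModel


class OptimizerSchema(BaseModel):
    algorithm: str = "SGD"
    learning_rate: float = 0.01


schema = OptimizerSchema()
# model_dump() turns the validated model back into a plain dict,
# which is the shape hydra.utils.instantiate can consume.
print(schema.model_dump())  # {'algorithm': 'SGD', 'learning_rate': 0.01}
```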
********************************************
Configurable Components in Anemoi Training
67 changes: 67 additions & 0 deletions training/docs/modules/schemas.rst
@@ -0,0 +1,67 @@
#########
Schemas
#########

This module defines the Pydantic schemas used to validate the
configuration before a training run is started. The top-level config
YAML matches ``BaseSchema``.
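As a rough sketch of the pattern these modules follow (the field names
below are hypothetical, not the real ``BaseSchema`` fields; see
``anemoi.training.schemas.base_schema`` for those), a top-level schema
composes one nested schema per config group:

```python
from pydantic import BaseModel


class HardwareSchema(BaseModel):
    num_gpus_per_node: int = 1


class BaseSchema(BaseModel):
    # Each top-level group in the config YAML maps to a nested schema.
    hardware: HardwareSchema = HardwareSchema()


cfg = BaseSchema.model_validate({"hardware": {"num_gpus_per_node": 4}})
print(cfg.hardware.num_gpus_per_node)  # 4
```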

.. automodule:: anemoi.training.schemas.base_schema
:members:
:no-undoc-members:
:show-inheritance:

.. automodule:: anemoi.training.schemas.data
:members:
:no-undoc-members:
:show-inheritance:

.. automodule:: anemoi.training.schemas.dataloader
:members:
:no-undoc-members:
:show-inheritance:

.. automodule:: anemoi.training.schemas.hardware
:members:
:no-undoc-members:
:show-inheritance:

.. automodule:: anemoi.training.schemas.graphs.basegraph
:members:
:no-undoc-members:
:show-inheritance:

.. automodule:: anemoi.training.schemas.graphs.node_schemas
:members:
:no-undoc-members:
:show-inheritance:

.. automodule:: anemoi.training.schemas.graphs.edge_schemas
:members:
:no-undoc-members:
:show-inheritance:

.. automodule:: anemoi.training.schemas.models.models
:members:
:no-undoc-members:
:show-inheritance:

.. automodule:: anemoi.training.schemas.models.processor
:members:
:no-undoc-members:
:show-inheritance:

.. automodule:: anemoi.training.schemas.models.encoder
:members:
:no-undoc-members:
:show-inheritance:

.. automodule:: anemoi.training.schemas.models.decoder
:members:
:no-undoc-members:
:show-inheritance:

.. automodule:: anemoi.training.schemas.training
:members:
:no-undoc-members:
:show-inheritance:
8 changes: 8 additions & 0 deletions training/docs/start/hydra-intro.rst
@@ -86,3 +86,11 @@ The following missing config options which must be overridden by users:
- ``hardware.files.graph``: If you have pre-computed a specific graph,
specify its filename here. Otherwise, a new graph will be constructed
on the fly and written to the filename given.

*********************************
Validation of the Configuration
*********************************

The configuration is validated using `Pydantic` before a training run
starts. To turn this off, you can use the `--no-validation` flag in your
top-level config.