fix : index.md
ltroussellier committed Jan 29, 2025
2 parents 018d6e8 + c92b0f8 commit 21ae1b7
Showing 9 changed files with 342 additions and 10 deletions.
5 changes: 3 additions & 2 deletions docs/source/conf.py
@@ -24,10 +24,11 @@
'sphinx.ext.autosummary',
'sphinx.ext.linkcode',
'sphinx.ext.intersphinx',
'myst_nb'
'myst_nb',
'sphinx_tabs.tabs',
'sphinx_copybutton'
]


def linkcode_resolve(domain, info):
    if domain != "py":
        return None
17 changes: 13 additions & 4 deletions docs/source/index.md
@@ -75,14 +75,23 @@ The DRS applications are based on the functionalities described above. They prov
convenient way to check DRS expressions of a project (directory, dataset id and file name)
and also generate expressions from mappings of collections and terms or an
unordered bag of terms.

```{toctree}
:caption: welcome
user/introduction.md
user/terms.md
user/cached_database.md
user/api.md
user/cli.md
```

```{toctree}
:caption: Guides
:hidden:
guides/get_started.md
guides/terms.md
guides/basic_cli.md
guides/basics_esgvoc.ipynb
guides/basics_drs.ipynb
@@ -96,4 +105,4 @@ api_documentation/universe.md
api_documentation/projects.md
api_documentation/project_specs.md
api_documentation/drs.md
```
88 changes: 88 additions & 0 deletions docs/source/user/api.md
@@ -0,0 +1,88 @@
# Python API

## Overview

The `esgf-vocab` Python API provides a read-only interface for querying controlled vocabularies. Users can retrieve, search, and validate vocabulary terms programmatically.

## Key Features

The API offers three main types of functions:

1. **Retrieve Functions (`get_*`)**
   - Fetch collections, terms, or data descriptors.
   - Examples:
     - Retrieve all data descriptors in the Universe CV.
     - Retrieve all terms within a specific collection or project.

2. **Search Functions (`find_*`)**
   - Search for terms based on input strings.
   - Examples:
     - Search for a specific term in a data descriptor.
     - Search for terms within a project or collection.

3. **Validation Functions (`valid_*`)**
   - Validate the compliance of an input string with controlled vocabulary rules.
   - Examples:
     - Check if a term is valid within a collection.
     - Check if a term is valid within a project.

## Example Usage

Below are some examples of how to use the API. For complete documentation, refer to the [API Reference](#) or the [Notebook Guide](https://esgf.github.io/esgf-vocab/guides/basics_esgvoc.html).

```python
import esgvoc.api as ev

# Retrieve Functions
ev.get_all_data_descriptors_in_universe()
ev.get_all_terms_in_data_descriptor(data_descriptor_id="activity")

ev.get_all_projects()
ev.get_all_collections_in_project(project_id="cmip6plus")
ev.get_all_terms_in_collection(project_id="cmip6plus", collection_id="activity_id")

# Search Functions
ev.find_terms_in_data_descriptor(data_descriptor_id="activity", term_id="aerchemmip")
ev.find_terms_in_universe(term_id="aerchemmip")
ev.find_terms_in_collection(project_id="cmip6plus", collection_id="activity_id", term_id="cmip")

# Validation Functions
ev.valid_term_in_collection(value="ipsl", project_id="cmip6plus", collection_id="institution_id")
ev.valid_term_in_project(value="some_term", project_id="cmip6plus")
```

## Structured Data with Pydantic Models

A key benefit of this library is that terms are returned as Pydantic objects. This provides several advantages:

### Structured Data
Each term is encapsulated in a well-defined Pydantic model, ensuring that the data is structured and adheres to a defined schema.

### Ease of Integration
Since Pydantic objects are Python-native and compatible with many frameworks, the terms can be seamlessly integrated into third-party software, such as:
- **Web Frameworks**: Using terms directly in APIs or web applications (e.g., FastAPI, Django).
- **Data Pipelines**: Injecting validated terms into ETL workflows or analytics systems.
- **Configuration Management**: Mapping terms into application configurations or schemas.

## Notes

- **Read-Only Access**: The API does not allow modification of the controlled vocabularies. Changes must be made in the respective GitHub repositories.
- **Tabulated Examples**: Below is a summary of API functionality with example commands:

`````{note}
````{tabs}
```{tab} Retrieve
ev.get_all_data_descriptors_in_universe()
ev.get_all_projects()
```
```{tab} Find
ev.find_terms_in_universe(term_id="aerchemmip")
ev.find_terms_in_collection(project_id="cmip6plus", collection_id="activity_id", term_id="cmip")
```
```{tab} Validate
ev.valid_term_in_collection(value="ipsl", project_id="cmip6plus", collection_id="institution_id")
```
````
`````
36 changes: 36 additions & 0 deletions docs/source/user/cached_database.md
@@ -0,0 +1,36 @@

# Database and Functionality

## Overview of Database Management

The `esgvoc install` command synchronizes the controlled vocabulary repositories, the local file system, and the SQLite database. After it runs, the database reflects the most recent available state of the repositories.

### Synchronization Workflow
For each configured controlled vocabulary repository, the synchronization process involves the following steps:

1. **Version Check**
   - The software determines the current version of each controlled vocabulary by examining the latest commit in three locations:
     - The remote GitHub repository (if internet access is available).
     - The locally cloned repository (if it exists).
     - The cached SQLite database (if it exists).

2. **Database Update Process**
   - The goal of the synchronization process is to ensure the database reflects the most recent version of the controlled vocabulary.
   - If the local repository does not exist or is outdated, the software clones or updates the repository from GitHub.
   - The SQLite database is rebuilt or updated as necessary to match the most recent version of the controlled vocabulary.

By automating this process, `esgf-vocab` guarantees that users have access to the latest available controlled vocabularies without requiring manual intervention.
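
The workflow above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the library's implementation: the name `plan_sync`, the action strings, and the use of plain commit-hash strings are all hypothetical.

```python
from typing import Optional


def plan_sync(remote: Optional[str], local: Optional[str], db: Optional[str]) -> list[str]:
    """Return the (hypothetical) actions needed to bring the cache up to date.

    Each argument is the latest known commit hash in one location, or None
    when that location is unavailable (no internet access, no clone, no database).
    """
    actions: list[str] = []
    target = local  # best version available when the remote cannot be reached
    if remote is not None:
        if local is None:
            actions.append("clone repository")
        elif local != remote:
            actions.append("update repository")
        target = remote
    # Rebuild only if some repository version is available and the
    # database does not already match it.
    if target is not None and db != target:
        actions.append("rebuild database")
    return actions


# Placeholder hashes, chosen only for the example.
print(plan_sync(remote="bbbbbbb", local=None, db=None))
print(plan_sync(remote="bbbbbbb", local="aaaaaaa", db="aaaaaaa"))
print(plan_sync(remote=None, local="aaaaaaa", db="aaaaaaa"))
```

The third call models the offline case: with no remote version to compare against, an existing clone and a matching database need no action.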

### Viewing Synchronization Status
The `esgvoc status` command provides a detailed summary of the current state of synchronization for each controlled vocabulary repository. Specifically, it shows:
- The latest commit version of the remote GitHub repository.
- The latest commit version of the local repository.
- The version of the cached SQLite database.

This information helps users understand whether any part of the system is outdated and requires synchronization.

---

These databases exist to make queries fast. They are meant to be accessed only through the API or the CLI.

111 changes: 111 additions & 0 deletions docs/source/user/cli.md
@@ -0,0 +1,111 @@

# Typer CLI

## Overview

The `esgf-vocab` CLI provides a command-line interface for querying and validating controlled vocabularies. It serves as an alternative to the Python API, offering a simple and readable way to access the library's functionality directly from the terminal.

## Installation and Access

- The CLI is included as part of the `esgf-vocab` Python library. Once the library is installed, the CLI becomes available.
- To view available commands and usage information, use the help command:

```bash
esgvoc --help
```

## Command Syntax

The CLI uses a structured syntax based on a colon-separated format to define queries. For example:

```bash
esgvoc get cmip6plus:institution_id:ipsl
```

### Understanding `::`
- In a query, a double colon `::` is two separators with empty fields around them. An empty first field (`""`) stands for "universe", and an empty later field stands for "all". For example:
  - `esgvoc get ::` retrieves all data descriptors in the Universe CV.
  - `esgvoc get cmip6plus::` retrieves all collections for the project `cmip6plus`.
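
To make the empty-field rule concrete, here is a minimal parsing sketch. It is not the CLI's actual parser: the `Query` type, the `parse_query` function, and the choice of `None` to mean "all" are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Query(NamedTuple):
    scope: str                # "universe" or a project ID
    container: Optional[str]  # data descriptor or collection ID; None means "all"
    term: Optional[str]       # term ID; None means "all"


def parse_query(text: str) -> Query:
    """Split a colon-separated query into its three fields (illustrative only)."""
    parts = text.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected three colon-separated fields, got {text!r}")
    scope, container, term = parts
    # An empty first field means the universe; empty later fields mean "all".
    return Query(scope or "universe", container or None, term or None)


print(parse_query("::"))
print(parse_query("cmip6plus::"))
print(parse_query("cmip6plus:institution_id:ipsl"))
```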

### Case Sensitivity
- **Term IDs are case-sensitive.** For example:
  - `ipsl` (lowercase) refers to the term's ID.
  - `IPSL` (uppercase) refers to the term's `drs_name` (a standardized representation).

- **What can be queried**:
  - You can query `ID` attributes directly, such as `ipsl` or `cmip6plus`.
  - The CLI does not support direct queries for non-ID attributes like `drs_name` (e.g., `IPSL`). These attributes appear in the returned data for information and validation, but they cannot be used as query keys.
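
The distinction between an ID (used for queries) and a `drs_name` (used for validation) can be illustrated with a toy lookup. The dictionary below is made-up sample data and the two helper functions are hypothetical; nothing here calls the real library.

```python
# Illustrative data only: term ID -> drs_name for a hypothetical collection.
TERMS = {"ipsl": "IPSL", "llnl": "LLNL"}


def get_drs_name(term_id: str):
    """Look up a term by its ID; the match is exact, so case matters."""
    return TERMS.get(term_id)


def is_valid_drs_name(value: str) -> bool:
    """Check a string against the drs_name values, not against the IDs."""
    return value in TERMS.values()


print(get_drs_name("ipsl"))       # query by ID
print(get_drs_name("IPSL"))       # a drs_name is not a query key
print(is_valid_drs_name("IPSL"))  # validation compares against drs_name
print(is_valid_drs_name("ipsl"))
```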

## Basic Commands

### Querying Data

The `get` command retrieves data from the vocabulary database. Examples:

```bash
# Retrieve all data descriptors in the Universe CV
esgvoc get ::

# Retrieve all terms in the "institution" data descriptor
esgvoc get universe:institution:

# Retrieve a specific term in the "institution" data descriptor
esgvoc get universe:institution:ipsl

# Retrieve all collections in a project
esgvoc get cmip6plus::

# Retrieve a specific term in a collection
esgvoc get cmip6plus:institution_id:ipsl

# Retrieve multiple terms
esgvoc get cmip6plus:institution_id:ipsl cmip6plus:institution_id:llnl
```

### Validating Data

The `valid` command checks whether a string complies with the vocabulary rules for a term's `drs_name`. Examples:

```bash
# Validate a term in the Universe CV
esgvoc valid IPSL ::

# Validate a term in a specific project
esgvoc valid IPSL cmip6plus::

# Validate a term in a specific collection
esgvoc valid IPSL cmip6plus:institution_id:

# Validate multiple terms
esgvoc valid IPSL cmip6plus:institution_id:ipsl
```

## Features and Limitations

### Filtering and Querying
- **Wildcard Patterns and Regular Expressions**: Not currently supported in the CLI but available in the Python API.
- **Attribute-Based Filtering**: Planned for future releases if required.

### Error Handling
- If a term does not exist or is invalid, the CLI provides a clear error or warning in the output table.

### Batch Operations
- The CLI supports batch validation or querying by specifying multiple terms in a single command.

## Advanced Use Cases

- **Scripting and Automation**:
  - The CLI supports automation and can be integrated into shell scripts.
  - Results can be piped to other commands or saved to files for further processing.

- **Integration with CI/CD**:
  - The CLI can be included in pipelines to validate terms or verify controlled vocabulary compliance automatically.
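
As one possible way to script the CLI, the sketch below wraps a batched `esgvoc get` call with Python's `subprocess` module. The helper names `build_get_command` and `run_get` are hypothetical, and the sketch makes no assumption about the CLI's exit codes or output format: it only returns whatever the command prints.

```python
import shutil
import subprocess
from typing import Optional


def build_get_command(queries: list[str]) -> list[str]:
    """Build the argument list for one batched `esgvoc get` call."""
    if not queries:
        raise ValueError("at least one query is required")
    return ["esgvoc", "get", *queries]


def run_get(queries: list[str]) -> Optional[str]:
    """Run the command and return its standard output, or None if the CLI is not installed."""
    if shutil.which("esgvoc") is None:
        return None
    result = subprocess.run(
        build_get_command(queries), capture_output=True, text=True, check=False
    )
    return result.stdout


command = build_get_command(
    ["cmip6plus:institution_id:ipsl", "cmip6plus:institution_id:llnl"]
)
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting issues, and the captured output can then be written to a file or handed to a later pipeline step.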

## Summary

The `esgf-vocab` CLI is a lightweight, flexible tool for interacting with controlled vocabularies. It complements the Python API by providing:
- A simple interface for quick queries and validations.
- Readable output for immediate consumption or integration.

For more advanced queries or functionality, the Python API is recommended.
82 changes: 82 additions & 0 deletions docs/source/user/introduction.md
@@ -0,0 +1,82 @@

# Introduction to ESGF-Vocab

`esgf-vocab` is a Python library designed to streamline and improve the management of controlled vocabularies used by the Earth System Grid Federation (ESGF) and related projects. By harmonizing data sources and providing both a Python API and a CLI for easy access, `esgf-vocab` resolves common issues like inconsistencies, errors, and inefficiencies associated with managing controlled vocabularies.

## Why ESGF-Vocab?

### Challenges with Traditional Methods
Previously, controlled vocabularies were stored in multiple locations and formats, requiring various software implementations to query and interpret data. This approach introduced challenges, including:

- Errors and inconsistencies across systems.
- Misuse of metadata and data.
- Difficulty in maintaining and updating vocabularies.

### The ESGF-Vocab Solution
`esgf-vocab` improves controlled vocabulary management through two main ideas:

1. **Harmonization through a Unified Source**
   A single, centralized repository (referred to as the "Universe CV") hosts all controlled vocabularies. Specialized vocabularies for specific projects reference the Universe CV via streamlined lists of IDs. This ensures consistency and eliminates duplication.

2. **A Controlled Vocabulary Library**
   `esgf-vocab` provides a dedicated service for interacting with controlled vocabularies. It enables developers, administrators, and software systems to access vocabularies seamlessly via:
   - A Python API for programmatic interaction.
   - A CLI powered by [Typer](https://typer.tiangolo.com/) for command-line use.

## Installation

You can install `esgf-vocab` using modern Python packaging tools or in a virtual environment. Below are the recommended methods:

### Using Rye (Preferred)

[Rye](https://rye-up.com/) is recommended for managing dependencies and isolating the library:

```bash
rye add esgvoc
```

This installs all dependencies. Cached repositories and databases are stored in the `.cache` directory alongside the `.venv` folder, which simplifies updates and uninstallation.

### Using pip in a Virtual Environment
Alternatively, you can use a virtual environment:

```bash
python -m venv myenv
source myenv/bin/activate
pip install esgvoc
```

## Fetching Vocabulary Data

Once installed, you can fetch controlled vocabulary data using the following command:

```bash
esgvoc install
```

This command performs the following actions:
- Clones the official repositories.
- Builds a cached SQLite database from the cloned data.

### Offline Use
If there is no internet access, `esgvoc install` will check the `.cache` directory for existing repositories. You can manually copy the repositories into `.cache` to use the library offline.
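
Before working offline, it can help to check what is already present in the cache. The sketch below lists directories under a cache folder that look like git clones. The function name `find_cached_repositories` and the rule "any directory containing a `.git` folder" are assumptions for illustration; the actual layout inside `.cache` is not described here, so the example builds its own temporary directory.

```python
import tempfile
from pathlib import Path


def find_cached_repositories(cache_dir: Path) -> list[str]:
    """Return the names of directories under cache_dir that contain a .git folder."""
    if not cache_dir.is_dir():
        return []
    return sorted(p.parent.name for p in cache_dir.rglob(".git") if p.is_dir())


# Demonstration on a throwaway directory tree (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / ".cache"
    (cache / "repos" / "WCRP-universe" / ".git").mkdir(parents=True)
    (cache / "repos" / "not-a-clone").mkdir()
    print(find_cached_repositories(cache))
```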

## Official Controlled Vocabulary Repositories

`esgf-vocab` primarily uses the following repositories for controlled vocabulary data:

- **Universe CV**: [GitHub Repository](https://github.com/WCRP-CMIP/WCRP-universe/tree/esgvoc)
- **CMIP6 CVs**: [GitHub Repository](https://github.com/WCRP-CMIP/CMIP6_CVs/tree/esgvoc)
- **CMIP6Plus CVs**: [GitHub Repository](https://github.com/WCRP-CMIP/CMIP6Plus_CVs/tree/esgvoc)

### Flexibility for Other Repositories
While designed for these official repositories, `esgf-vocab` can use other repositories if they are structured correctly.

## Requirements

- **Python Version**: 3.12 (the package metadata declares `requires-python = ">= 3.12, <3.13"`).
- **No Additional System Dependencies**: Interaction with the library is entirely Python-based, with no external SQLite dependencies.

---

This introduction covers the general purpose and installation of `esgf-vocab`. In the next sections, we will dive deeper into its functionality, including the Python API and CLI usage.
File renamed without changes.
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -16,6 +16,8 @@ dependencies = [
"requests>=2.32.3",
"toml>=0.10.2",
"typer>=0.15.0",
"sphinx-tabs>=3.4.7",
"sphinx-copybutton>=0.5.2",
]
readme = "README.md"
requires-python = ">= 3.12, <3.13"