# Python API

## Overview

The `esgf-vocab` Python API provides a read-only interface for querying controlled vocabularies. Users can retrieve, search, and validate vocabulary terms programmatically.

## Key Features

The API offers three main types of functions:

1. **Retrieve Functions (`get_*`)**
   - Fetch collections, terms, or data descriptors.
   - Examples:
     - Retrieve all data descriptors in the Universe CV.
     - Retrieve all terms within a specific collection or project.

2. **Search Functions (`find_*`)**
   - Search for terms based on input strings.
   - Examples:
     - Search for a specific term in a data descriptor.
     - Search for terms within a project or collection.

3. **Validation Functions (`valid_*`)**
   - Check whether an input string complies with the controlled vocabulary rules.
   - Examples:
     - Check whether a term is valid within a collection.
     - Check whether a term is valid within a project.

## Example Usage

Below are some examples of how to use the API. For complete documentation, refer to the [API Reference](#) or the [Notebook Guide](https://esgf.github.io/esgf-vocab/guides/basics_esgvoc.html).

```python
import esgvoc.api as ev

# Retrieve Functions
ev.get_all_data_descriptors_in_universe()
ev.get_all_terms_in_data_descriptor(data_descriptor_id="activity")

ev.get_all_projects()
ev.get_all_collections_in_project(project_id="cmip6plus")
ev.get_all_terms_in_collection(project_id="cmip6plus", collection_id="activity_id")

# Search Functions
ev.find_terms_in_data_descriptor(data_descriptor_id="activity", term_id="aerchemmip")
ev.find_terms_in_universe(term_id="aerchemmip")
ev.find_terms_in_collection(project_id="cmip6plus", collection_id="activity_id", term_id="cmip")

# Validation Functions
ev.valid_term_in_collection(value="ipsl", project_id="cmip6plus", collection_id="institution_id")
ev.valid_term_in_project(value="some_term", project_id="cmip6plus")
```

## Structured Data with Pydantic Models

A key benefit of this library is that the returned terms are Pydantic objects. This provides several advantages:

### Structured Data
Each term is encapsulated in a well-defined Pydantic model, so the data is structured and adheres to a defined schema.

### Ease of Integration
Because Pydantic objects are plain Python objects that many frameworks understand, the terms can be integrated directly into third-party software, such as:
- **Web Frameworks**: Using terms directly in APIs or web applications (e.g., FastAPI, Django).
- **Data Pipelines**: Injecting validated terms into ETL workflows or analytics systems.
- **Configuration Management**: Mapping terms into application configurations or schemas.

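
To illustrate why structured term objects are easy to integrate, here is a minimal sketch that uses a standard-library dataclass as a stand-in for the Pydantic models returned by the API. The `InstitutionTerm` class and its fields (`id`, `drs_name`) are hypothetical. They only mirror attributes mentioned in this documentation, and the real models may define more fields and validate input in ways a dataclass does not.

```python
import json
from dataclasses import asdict, dataclass


# Hypothetical stand-in for an esgvoc Pydantic term model; the field names are assumptions.
@dataclass(frozen=True)
class InstitutionTerm:
    id: str        # term ID, e.g. "ipsl"
    drs_name: str  # DRS representation, e.g. "IPSL"


def term_to_json(term: InstitutionTerm) -> str:
    """Serialize a term, e.g. to return it from a web API endpoint."""
    return json.dumps(asdict(term), sort_keys=True)


def build_drs_lookup(terms: list[InstitutionTerm]) -> dict[str, str]:
    """Map term IDs to DRS names, e.g. for use in a data pipeline or config."""
    return {term.id: term.drs_name for term in terms}


if __name__ == "__main__":
    ipsl = InstitutionTerm(id="ipsl", drs_name="IPSL")
    print(term_to_json(ipsl))  # {"drs_name": "IPSL", "id": "ipsl"}
    print(build_drs_lookup([ipsl]))  # {'ipsl': 'IPSL'}
```

With the library's actual Pydantic models (assuming Pydantic v2), `term.model_dump()` and `term.model_dump_json()` would take the place of `asdict` and `json.dumps`, and field types would also be validated when the object is constructed.
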
## Notes

- **Read-Only Access**: The API does not allow modification of the controlled vocabularies. Changes must be made in the respective GitHub repositories.
- **Tabulated Examples**: Below is a summary of API functionality with example commands:

`````{note}
````{tabs}
```{tab} Retrieve
ev.get_all_data_descriptors_in_universe()
ev.get_all_projects()
```
```{tab} Find
ev.find_terms_in_universe(term_id="aerchemmip")
ev.find_terms_in_collection(project_id="cmip6plus", collection_id="activity_id", term_id="cmip")
```
```{tab} Validate
ev.valid_term_in_collection(value="ipsl", project_id="cmip6plus", collection_id="institution_id")
```
````
`````
# Database and Functionality

## Overview of Database Management

The `esgvoc install` command manages synchronization between the controlled vocabulary repositories, the local file system, and the SQLite database. After it runs, the database reflects the latest state of the repositories.

### Synchronization Workflow
For each configured controlled vocabulary repository, the synchronization process involves the following steps:

1. **Version Check**
   - The software determines the current version of each controlled vocabulary by examining the latest commit in three locations:
     - The remote GitHub repository (if internet access is available).
     - The locally cloned repository (if it exists).
     - The cached SQLite database (if it exists).

2. **Database Update Process**
   - The goal of the synchronization process is to ensure the database reflects the most recent version of the controlled vocabulary.
   - If the local repository does not exist or is outdated, the software clones or updates the repository from GitHub.
   - The SQLite database is rebuilt or updated as necessary to match the most recent version of the controlled vocabulary.

By automating this process, `esgf-vocab` gives users access to the latest available controlled vocabularies without having to clone repositories or rebuild databases by hand.

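
The decision logic of the version check can be sketched in a few lines. This is a hypothetical illustration, not the actual `esgvoc install` implementation. The function name `plan_sync`, the action labels, and the rule that the remote commit is the target when reachable (and the local clone otherwise) are assumptions made for this example.

```python
from typing import Optional


def plan_sync(remote: Optional[str], local: Optional[str], db: Optional[str]) -> list[str]:
    """Decide which synchronization actions are needed for one CV repository.

    Each argument is the latest commit hash found in that location, or None
    when it is unavailable (no internet access, no clone, or no database yet).
    """
    # Target the remote version when reachable; otherwise fall back to the local clone.
    target = remote if remote is not None else local
    if target is None:
        raise RuntimeError("no remote or local repository available")

    actions: list[str] = []
    if local is None:
        actions.append("clone")       # no local copy yet
    elif local != target:
        actions.append("pull")        # local copy is behind the remote
    if db != target:
        actions.append("rebuild_db")  # database missing or built from an older commit
    return actions


if __name__ == "__main__":
    print(plan_sync("def", "abc", "abc"))  # ['pull', 'rebuild_db']
    print(plan_sync(None, "abc", "abc"))   # [] (offline, already up to date)
```

In the first call, the clone is behind the remote, so it is updated and the database is rebuilt. In the second, there is no network access, and the database already matches the local clone, so nothing needs to be done.
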
### Viewing Synchronization Status
The `esgvoc status` command provides a detailed summary of the current synchronization state of each controlled vocabulary repository. Specifically, it shows:
- The latest commit version of the remote GitHub repository.
- The latest commit version of the local repository.
- The version of the cached SQLite database.

This information helps users determine whether any part of the system is outdated and requires synchronization.

---

These databases are intended to provide fast, efficient queries and are meant to be accessed only through the API or the CLI.
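
The sketch below illustrates why a local SQLite cache makes lookups fast. Terms are indexed by project, collection, and ID, so a query is a single indexed `SELECT`. The table name, columns, and helper functions are assumptions for illustration and do not reflect the actual `esgvoc` schema. As noted above, users are expected to go through the API or CLI rather than query the database directly.

```python
import sqlite3
from typing import Optional


def build_cache(rows: list[tuple[str, str, str, str]]) -> sqlite3.Connection:
    """Create an in-memory database of terms (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE terms ("
        " project TEXT, collection TEXT, id TEXT, drs_name TEXT,"
        " PRIMARY KEY (project, collection, id))"  # the primary key gets an index
    )
    conn.executemany("INSERT INTO terms VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def get_term(
    conn: sqlite3.Connection, project: str, collection: str, term_id: str
) -> Optional[tuple[str, str]]:
    """Exact-match lookup of one term, served by the primary-key index."""
    return conn.execute(
        "SELECT id, drs_name FROM terms WHERE project = ? AND collection = ? AND id = ?",
        (project, collection, term_id),
    ).fetchone()


if __name__ == "__main__":
    conn = build_cache([
        ("cmip6plus", "institution_id", "ipsl", "IPSL"),
        ("cmip6plus", "institution_id", "llnl", "LLNL"),
    ])
    print(get_term(conn, "cmip6plus", "institution_id", "ipsl"))  # ('ipsl', 'IPSL')
```
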
|
# Typer CLI

## Overview

The `esgf-vocab` CLI provides a command-line interface for querying and validating controlled vocabularies. It serves as an alternative to the Python API, offering a simple and readable way to access the library's functionality directly from the terminal.

## Installation and Access

- The CLI is included in the `esgf-vocab` Python library and becomes available once the library is installed.
- To view available commands and usage information, use the help command:

```bash
esgvoc --help
```

## Command Syntax

Queries use a colon-separated syntax of the form `project:collection:term`, with `universe` in place of a project to query the Universe CV. For example:

```bash
esgvoc get cmip6plus:institution_id:ipsl
```

### Understanding `::`
- An empty field has a special meaning. An empty first field stands for the Universe CV, and an empty subsequent field stands for "all". For example:
  - `esgvoc get ::` retrieves all data descriptors in the Universe CV.
  - `esgvoc get cmip6plus::` retrieves all collections for the project `cmip6plus`.

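
The colon-separated syntax can be modelled as a small parser. This is a hypothetical sketch of the rules described above, not the CLI's actual code. The `Query` tuple and its field names are illustrative. An empty first field is mapped to `"universe"`, and empty later fields are mapped to `None`, meaning "all".

```python
from typing import NamedTuple, Optional


class Query(NamedTuple):
    source: str                # "universe" or a project ID such as "cmip6plus"
    collection: Optional[str]  # data descriptor or collection ID; None means "all"
    term: Optional[str]        # term ID; None means "all"


def parse_query(text: str) -> Query:
    """Parse a 'source:collection:term' query using the empty-field conventions."""
    parts = text.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected three colon-separated fields, got {text!r}")
    source, collection, term = parts
    return Query(
        source=source or "universe",    # empty first field -> Universe CV
        collection=collection or None,  # empty -> all collections or data descriptors
        term=term or None,              # empty -> all terms
    )


if __name__ == "__main__":
    print(parse_query("::"))           # Query(source='universe', collection=None, term=None)
    print(parse_query("cmip6plus::"))  # Query(source='cmip6plus', collection=None, term=None)
```
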
### Case Sensitivity
- **Term IDs are case-sensitive.** For example:
  - `ipsl` (lowercase) refers to the term's ID.
  - `IPSL` (uppercase) refers to the term's `drs_name` (a standardized representation).

- **What can be queried**:
  - You can query by ID directly, for example `ipsl` or `cmip6plus`.
  - The CLI does not support queries on non-ID attributes such as `drs_name` (e.g., `IPSL`). These attributes are included in the returned data for information and validation, but cannot be used as query keys.

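
The distinction between querying by ID and validating against `drs_name` can be sketched with plain dictionaries. The term data and the helpers `get_by_id` and `is_valid_drs_name` are hypothetical. They only mirror the behaviour described above: `get` matches IDs exactly and case-sensitively, and validation checks against `drs_name`. Real validation may apply additional rules.

```python
from typing import Optional

# Hypothetical terms of the cmip6plus "institution_id" collection.
TERMS = {
    "ipsl": {"id": "ipsl", "drs_name": "IPSL"},
    "llnl": {"id": "llnl", "drs_name": "LLNL"},
}


def get_by_id(term_id: str) -> Optional[dict]:
    """Exact, case-sensitive lookup by term ID (the key used by `get` queries)."""
    return TERMS.get(term_id)


def is_valid_drs_name(value: str) -> bool:
    """Return True if the value matches the drs_name of some term."""
    return any(term["drs_name"] == value for term in TERMS.values())


if __name__ == "__main__":
    print(get_by_id("ipsl") is not None)  # True: "ipsl" is an ID
    print(get_by_id("IPSL") is not None)  # False: "IPSL" is a drs_name, not an ID
    print(is_valid_drs_name("IPSL"))      # True
```
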
## Basic Commands

### Querying Data

The `get` command retrieves data from the vocabulary database. Examples:

```bash
# Retrieve all data descriptors in the Universe CV
esgvoc get ::

# Retrieve all terms in the "institution" data descriptor
esgvoc get universe:institution:

# Retrieve a specific term in the "institution" data descriptor
esgvoc get universe:institution:ipsl

# Retrieve all collections in a project
esgvoc get cmip6plus::

# Retrieve a specific term in a collection
esgvoc get cmip6plus:institution_id:ipsl

# Retrieve multiple terms
esgvoc get cmip6plus:institution_id:ipsl cmip6plus:institution_id:llnl
```

### Validating Data

The `valid` command checks whether a string complies with the vocabulary rules of a term, such as its `drs_name`. Examples:

```bash
# Validate a term in the Universe CV
esgvoc valid IPSL ::

# Validate a term in a specific project
esgvoc valid IPSL cmip6plus::

# Validate a term in a specific collection
esgvoc valid IPSL cmip6plus:institution_id:

# Validate against a specific term
esgvoc valid IPSL cmip6plus:institution_id:ipsl
```

## Features and Limitations

### Filtering and Querying
- **Wildcard Patterns and Regular Expressions**: Not currently supported in the CLI, but available in the Python API.
- **Attribute-Based Filtering**: May be added in a future release if there is demand.

### Error Handling
- If a term does not exist or is invalid, the CLI reports a clear error or warning in the output table.

### Batch Operations
- The CLI supports batch validation or querying by specifying multiple terms in a single command.

## Advanced Use Cases

- **Scripting and Automation**:
  - The CLI can be integrated into shell scripts.
  - Results can be piped to other commands or saved to files for further processing.

- **Integration with CI/CD**:
  - The CLI can be included in pipelines to validate terms or verify controlled vocabulary compliance automatically.

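
As a scripting example, the sketch below builds a single batch `esgvoc get` invocation from Python and runs it with `subprocess` when the CLI is found on the `PATH`. The helper name `build_get_command` is hypothetical. The sketch relies only on the query syntax shown earlier and makes no assumption about the CLI's output format or exit codes.

```python
import shutil
import subprocess


def build_get_command(project: str, collection: str, term_ids: list[str]) -> list[str]:
    """Build one `esgvoc get` call that queries several terms at once."""
    queries = [f"{project}:{collection}:{term_id}" for term_id in term_ids]
    return ["esgvoc", "get", *queries]


if __name__ == "__main__":
    cmd = build_get_command("cmip6plus", "institution_id", ["ipsl", "llnl"])
    print(" ".join(cmd))  # esgvoc get cmip6plus:institution_id:ipsl cmip6plus:institution_id:llnl
    # Only run the command if the CLI is installed in this environment.
    if shutil.which("esgvoc"):
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout)
```

Passing the arguments as a list rather than a shell string avoids quoting issues when term IDs come from untrusted input.
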
## Summary

The `esgf-vocab` CLI is a lightweight, flexible tool for interacting with controlled vocabularies. It complements the Python API by providing:
- A simple interface for quick queries and validations.
- Readable output for immediate consumption or integration.

For more advanced queries or functionality, the Python API is recommended.
# Introduction to ESGF-Vocab

`esgf-vocab` is a Python library designed to streamline the management of controlled vocabularies used by the Earth System Grid Federation (ESGF) and related projects. By harmonizing data sources and providing both a Python API and a CLI, `esgf-vocab` addresses common problems such as inconsistencies, errors, and inefficiencies in managing controlled vocabularies.

## Why ESGF-Vocab?

### Challenges with Traditional Methods
Previously, controlled vocabularies were stored in multiple locations and formats, requiring various software implementations to query and interpret them. This approach introduced challenges, including:

- Errors and inconsistencies across systems.
- Misuse of metadata and data.
- Difficulty in maintaining and updating vocabularies.

### The ESGF-Vocab Solution
`esgf-vocab` improves controlled vocabulary management through two main ideas:

1. **Harmonization through a Unified Source**
   A single, centralized repository, referred to as the "Universe CV", hosts all controlled vocabularies. Project-specific vocabularies reference the Universe CV through lists of IDs. This ensures consistency and eliminates duplication.

2. **A Controlled Vocabulary Library**
   `esgf-vocab` provides a dedicated service for interacting with controlled vocabularies. It enables developers, administrators, and software systems to access vocabularies via:
   - A Python API for programmatic interaction.
   - A CLI powered by [Typer](https://typer.tiangolo.com/) for command-line use.

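
The idea of project vocabularies that reference the Universe CV by ID can be sketched as a lookup. The project stores only IDs, and the full term definitions come from the single Universe source. The dictionaries and the `resolve_collection` helper are hypothetical illustrations, not the repositories' actual file layout.

```python
# Hypothetical Universe CV: full term definitions, keyed by data descriptor, then term ID.
UNIVERSE = {
    "institution": {
        "ipsl": {"id": "ipsl", "drs_name": "IPSL"},
        "llnl": {"id": "llnl", "drs_name": "LLNL"},
        "mohc": {"id": "mohc", "drs_name": "MOHC"},
    }
}

# Hypothetical project collection: just a list of IDs pointing into the Universe CV.
PROJECT_COLLECTION = {"data_descriptor": "institution", "term_ids": ["ipsl", "llnl"]}


def resolve_collection(collection: dict, universe: dict) -> list[dict]:
    """Expand a project's list of IDs into full term definitions from the Universe CV."""
    terms = universe[collection["data_descriptor"]]
    missing = [tid for tid in collection["term_ids"] if tid not in terms]
    if missing:
        raise KeyError(f"IDs not found in the Universe CV: {missing}")
    return [terms[tid] for tid in collection["term_ids"]]


if __name__ == "__main__":
    for term in resolve_collection(PROJECT_COLLECTION, UNIVERSE):
        print(term["id"], term["drs_name"])
```

Because each term is defined only once, a correction made in the Universe CV applies to every project that references that term's ID.
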
## Installation

You can install `esgf-vocab` with a modern Python project manager or with pip in a virtual environment. Below are the recommended methods:

### Using Rye (Preferred)

[Rye](https://rye-up.com/) is recommended for managing dependencies and isolating the library:

```bash
rye add esgvoc
```

This installs all dependencies, and cached repositories and databases are stored in the `.cache` directory alongside the `.venv` folder. This approach simplifies updates and uninstallation.

### Using pip in a Virtual Environment
Alternatively, you can use a virtual environment:

```bash
python -m venv myenv
source myenv/bin/activate
pip install esgvoc
```

## Fetching Vocabulary Data

Once installed, you can fetch controlled vocabulary data using the following command:

```bash
esgvoc install
```

This command performs the following actions:
- Clones the official repositories.
- Builds a cached SQLite database from the cloned data.

### Offline Use
If there is no internet access, `esgvoc install` checks the `.cache` directory for existing repositories. You can manually copy the repositories into `.cache` to use the library offline.

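
Before working offline, you may want to check that the repositories are already present in the cache. The sketch below does this with `pathlib`. The cache location and the directory names, which are taken from the official repository names, are assumptions. The exact layout of `.cache` is not documented here, so adjust both to match your installation.

```python
from pathlib import Path

# Assumed directory names inside the cache; adjust to your actual layout.
EXPECTED_REPOS = ("WCRP-universe", "CMIP6_CVs", "CMIP6Plus_CVs")


def missing_cached_repos(cache_dir: Path, expected: tuple[str, ...] = EXPECTED_REPOS) -> list[str]:
    """Return the expected repository directories that are absent from cache_dir."""
    return [name for name in expected if not (cache_dir / name).is_dir()]


if __name__ == "__main__":
    cache = Path(".cache")  # assumed cache location, relative to the project directory
    missing = missing_cached_repos(cache)
    if missing:
        print("Copy these repositories into the cache before working offline:", missing)
    else:
        print("All expected repositories are cached.")
```
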
## Official Controlled Vocabulary Repositories

`esgf-vocab` primarily uses the following repositories for controlled vocabulary data:

- **Universe CV**: [GitHub Repository](https://github.com/WCRP-CMIP/WCRP-universe/tree/esgvoc)
- **CMIP6 CVs**: [GitHub Repository](https://github.com/WCRP-CMIP/CMIP6_CVs/tree/esgvoc)
- **CMIP6Plus CVs**: [GitHub Repository](https://github.com/WCRP-CMIP/CMIP6Plus_CVs/tree/esgvoc)

### Flexibility for Other Repositories
Although it is designed for these official repositories, `esgf-vocab` can also use other repositories, provided they follow the same structure.

## Requirements

- **Python Version**: 3.12 or higher.
- **No Additional System Dependencies**: The library is entirely Python-based and does not require an external SQLite installation.

---

This introduction covers the general purpose and installation of `esgf-vocab`. The next sections describe its functionality in more detail, including the Python API and CLI usage.