From 14e6994daf5fcd259f57a0668ac0b06bf936e352 Mon Sep 17 00:00:00 2001
From: zhu0619
Date: Fri, 2 Aug 2024 17:53:20 +0000
Subject: [PATCH] Deployed 3b8792a to main with MkDocs 1.6.0 and mike 2.1.2

---
 main/index.html               | 21 ++++++++++++++++++++-
 main/search/search_index.json |  2 +-
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/main/index.html b/main/index.html
index 03f1500..c111c4a 100644
--- a/main/index.html
+++ b/main/index.html
@@ -964,7 +964,26 @@

Introduction

Welcome to the Auroris - Simplifying Drug Discovery Data Curation


What is Auroris?

-

Auroris is a comprehensive Python library designed to assist researchers and scientists in managing, cleaning, and preparing data relevant to drug discovery. Our mission is to implement a range of techniques to handle, transform, filter, analyze, or visualize the diverse data types commonly encountered in drug discovery.

+

Auroris is a Python library designed to assist researchers and scientists in managing, cleaning, and preparing data relevant to drug discovery. Auroris will implement a range of techniques to handle, transform, filter, analyze, or visualize the diverse data types commonly encountered in drug discovery.

+

Currently, Auroris supports curation for small molecules, with plans to extend to other modalities in drug discovery. The curation module for small molecules includes:

+ +

Reproducibility and transparency are core to the mission of Polaris. That’s why with Auroris, you can also automatically generate detailed reports summarizing the changes that happened to a dataset during curation. Through an intuitive API, you can easily define complex curation workflows. Once defined, that workflow is serializable and thus reproducible so you can transparently share how you curated the dataset.

Where to next?


Quickstart

diff --git a/main/search/search_index.json b/main/search/search_index.json
index ae9879b..eb3c2db 100644
--- a/main/search/search_index.json
+++ b/main/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"index.html","title":"Introduction","text":"

Welcome to the Auroris - Simplifying Drug Discovery Data Curation

"},{"location":"index.html#what-is-auroris","title":"What is Auroris?","text":"

Auroris is a comprehensive Python library designed to assist researchers and scientists in managing, cleaning, and preparing data relevant to drug discovery. Our mission is to implement a range of techniques to handle, transform, filter, analyze, or visualize the diverse data types commonly encountered in drug discovery.

"},{"location":"index.html#where-to-next","title":"Where to next?","text":"

Quickstart

Dive deeper into the Auroris code and learn how to curate data for your ML-powered drug discovery program.

Let's get started

API Reference

Explore the technical documentation here to delve into the inner workings of the code. Gain insights into the intricate details of how different methods and classes function.

Let's get started

Community

We're excited to have you join us in revolutionizing drug discovery data curation! Explore Auroris and the broader Polaris ecosystem it is part of, provide feedback, share your use cases, and collaborate with us to enhance and expand the capabilities of Auroris for the benefit of the drug discovery community.

Let's get started

"},{"location":"api/actions.html","title":"Actions","text":""},{"location":"api/actions.html#auroris.curation.actions.BaseAction","title":"auroris.curation.actions.BaseAction","text":"

Bases: BaseModel, ABC

An action in the curation process.

The importance of reproducibility

One of the main goals in designing auroris is to make it easy to reproduce the curation process. Reproducibility is key to scientific research. This is why a BaseAction needs to be serializable and uniquely identified by a name.

Attributes:

Name Type Description name str

The name that uniquely identifies the action. This is used to serialize and deserialize the action.

prefix str

This prefix is used when an action adds columns to a dataset. If not set, it defaults to the name in uppercase.

"},{"location":"api/actions.html#auroris.curation.actions.StereoIsomerACDetection","title":"StereoIsomerACDetection","text":"

Bases: BaseAction

Automatic detection of activity shift between stereoisomers.

See auroris.curation.functional.detect_streoisomer_activity_cliff for the docs of the stereoisomer_id_col, y_cols and threshold attributes

Attributes:

Name Type Description mol_col Optional[str]

Column with the SMILES or RDKit Molecule objects. If specified, will be used to render an image for the activity cliffs.

"},{"location":"api/actions.html#auroris.curation.actions.Deduplication","title":"Deduplication","text":"

Bases: BaseAction

Automatic deduplication of a dataset.

See auroris.curation.functional.deduplicate for the docs of the deduplicate_on, y_cols, keep and method attributes

"},{"location":"api/actions.html#auroris.curation.actions.Discretization","title":"Discretization","text":"

Bases: BaseAction

Thresholding bioactivity columns to binary or multiclass labels.

See auroris.curation.functional.discretize for the docs of the thresholds, inplace, allow_nan and label_order attributes

Attributes:

Name Type Description input_column str

The column to discretize.

log_scale bool

Whether a visual depiction of the discretization should be on a log scale.

"},{"location":"api/actions.html#auroris.curation.actions.ContinuousDistributionVisualization","title":"ContinuousDistributionVisualization","text":"

Bases: BaseAction

Visualize one or more continuous distribution(s).

See auroris.visualization.visualize_continuous_distribution for the docs of the log_scale and bins attributes

Attributes:

Name Type Description y_cols List[str]

The columns whose distributions should be visualized.

"},{"location":"api/actions.html#auroris.curation.actions.MoleculeCuration","title":"MoleculeCuration","text":"

Bases: BaseAction

Automated molecule curation and chemistry space distribution.

See auroris.curation.functional.curate_molecules for the docs of the remove_stereo, fix_mol, count_stereoisomers, and count_stereocenters attributes

Attributes:

Name Type Description input_column str

The name of the column that has the molecules (either dm.Mol objects or SMILES).

X_col Optional[str]

Column with custom features for each of the molecules. If None, will use ECFP.

y_cols Optional[Union[str, List[str]]]

Column names for bioactivities, which will be used to colorcode the chemical space visualization.

"},{"location":"api/actions.html#auroris.curation.actions.OutlierDetection","title":"OutlierDetection","text":"

Bases: BaseAction

Automatic detection of outliers.

See auroris.curation.functional.detect_outliers for the docs of the method and kwargs attributes

Attributes:

Name Type Description columns List[str]

The columns for which to detect outliers.

"},{"location":"api/curator.html","title":"Curator","text":""},{"location":"api/curator.html#auroris.curation.Curator","title":"auroris.curation.Curator","text":"

Bases: BaseModel

A curator is a serializable collection of actions that are applied to a dataset.

Attributes:

Name Type Description steps List[BaseAction]

Ordered list of curation actions to apply to the dataset.

src_dataset_path Optional[str]

An optional path to load the source dataset from. Can be used to specify a reproducible workflow.

verbosity VerbosityLevel

Verbosity level for logging.

parallelized_kwargs dict

Keyword arguments to affect parallelization in the steps.
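To make the idea of a "serializable collection of actions" concrete, the following is a minimal standard-library sketch. It is not how Auroris implements `Curator` or `BaseAction` (those build on `BaseModel`); the names `ToyOutlierDetection`, `ToyDiscretization`, `ACTION_REGISTRY`, `workflow_to_json` and `workflow_from_json` are hypothetical and exist only for this illustration. It shows why each action needs a unique name: the name is the key that lets a saved workflow be mapped back to the right class.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical registry: maps each action's unique name to its class.
ACTION_REGISTRY = {}


def register(cls):
    """Register an action class under its unique name."""
    ACTION_REGISTRY[cls.name] = cls
    return cls


@register
@dataclass
class ToyOutlierDetection:
    name = "outlier_detection"  # plain class attribute, not a dataclass field
    columns: list = field(default_factory=list)
    method: str = "zscore"


@register
@dataclass
class ToyDiscretization:
    name = "discretize"
    input_column: str = ""
    thresholds: list = field(default_factory=list)


def workflow_to_json(steps):
    """Serialize an ordered list of actions, storing each action's name."""
    payload = [{"name": step.name, **asdict(step)} for step in steps]
    return json.dumps({"steps": payload}, sort_keys=True)


def workflow_from_json(text):
    """Rebuild the ordered list of actions by looking up each name."""
    steps = []
    for entry in json.loads(text)["steps"]:
        params = dict(entry)
        action_cls = ACTION_REGISTRY[params.pop("name")]
        steps.append(action_cls(**params))
    return steps


steps = [
    ToyOutlierDetection(columns=["SOL"]),
    ToyDiscretization(input_column="SOL", thresholds=[-3]),
]
restored = workflow_from_json(workflow_to_json(steps))
```

In this toy version, a round trip through JSON yields actions equal to the originals, which is the property that makes a curation workflow shareable and reproducible.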

"},{"location":"api/curator.html#auroris.curation.Curator.transform","title":"transform","text":"
transform(dataset: Optional[pd.DataFrame] = None) -> Tuple[pd.DataFrame, CurationReport]\n

Runs the curation process.

Parameters:

Name Type Description Default dataset Optional[DataFrame]

The dataset to be curated. If src_dataset_path is set, this parameter is ignored.

None

Returns:

Type Description Tuple[DataFrame, CurationReport]

A tuple of the curated dataset and a report summarizing the changes made.

"},{"location":"api/curator.html#auroris.curation.Curator.load_dataset","title":"load_dataset staticmethod","text":"
load_dataset(path: str)\n

Loads a dataset, to be curated, from a path.

File-format support

This currently only supports CSV and Parquet files and uses the default parameters for pd.read_csv and pd.read_parquet. If you need more flexibility, consider loading the data yourself and passing it directly to Curator.transform(dataset=...).

"},{"location":"api/curator.html#auroris.curation.Curator.from_json","title":"from_json classmethod","text":"
from_json(path: str)\n

Loads a curation workflow from a JSON file.

Parameters:

Name Type Description Default path str

The path to load from

required"},{"location":"api/curator.html#auroris.curation.Curator.to_json","title":"to_json","text":"
to_json(path: str)\n

Saves the curation workflow to a JSON file.

Parameters:

Name Type Description Default path str

The destination to save to.

required"},{"location":"api/functional.html","title":"Curation","text":""},{"location":"api/functional.html#auroris.curation.functional.detect_streoisomer_activity_cliff","title":"detect_streoisomer_activity_cliff","text":"
detect_streoisomer_activity_cliff(dataset: pd.DataFrame, stereoisomer_id_col: str, y_cols: List[str], threshold: float = 2.0, prefix: str = 'AC_') -> pd.DataFrame\n

Detect activity cliffs among stereoisomers, based either on the classification label or on a predefined threshold for continuous values.
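The threshold for this function is documented as a difference of z-scores between isomers. The sketch below illustrates that idea for a single continuous column using only the standard library. It is an assumption-laden illustration, not the Auroris implementation: the function name `flag_stereo_activity_cliffs`, the use of the population standard deviation, and the choice to compare the largest and smallest z-score within each group are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def flag_stereo_activity_cliffs(stereo_ids, values, threshold=2.0):
    """Flag rows whose stereoisomer group spans more than `threshold` z-score units.

    Assumption: z-scores are computed over the whole column with the
    population standard deviation.
    """
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # All values identical: no activity difference to detect.
        return [False] * len(values)

    zscores = [(v - mu) / sigma for v in values]

    # Collect the z-scores of all rows sharing a stereoisomer identifier.
    groups = defaultdict(list)
    for sid, z in zip(stereo_ids, zscores):
        groups[sid].append(z)

    # A group is a candidate cliff if its z-scores differ by more than the threshold.
    spread = {sid: max(zs) - min(zs) for sid, zs in groups.items()}
    return [spread[sid] > threshold for sid in stereo_ids]


flags = flag_stereo_activity_cliffs(
    ["a", "a", "b", "b", "c"],
    [1.0, 1.1, 0.0, 10.0, 1.0],
)
# flags -> [False, False, True, True, False]
```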

Parameters:

Name Type Description Default dataset DataFrame

Dataframe

required stereoisomer_id_col str

Column which identifies the stereoisomers

required y_cols List[str]

List of columns for bioactivities

required threshold float

Threshold to identify the activity cliff. Currently, the difference of z-scores between isomers is used for identification.

2.0 prefix str

Prefix for the added columns

'AC_'"},{"location":"api/functional.html#auroris.curation.functional.deduplicate","title":"deduplicate","text":"
deduplicate(dataset: pd.DataFrame, deduplicate_on: Optional[Union[str, List[str]]] = None, y_cols: Optional[Union[str, List[str]]] = None, keep: Literal['first', 'last'] = 'first', method: Literal['mean', 'median'] = 'median') -> pd.DataFrame\n

Deduplicate a dataframe.

If deduplicate_on specifies a subset of all columns in the dataset and y_cols specifies a set of non-overlapping columns, data will be grouped by deduplicate_on and the y_cols will be aggregated to a single value per group according to method.
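The grouping and aggregation described here can be sketched with the standard library alone. This is only an illustration of the idea on a list of dictionaries, not the Auroris implementation (which operates on a DataFrame); the function name `deduplicate_rows` is hypothetical, and the sketch keeps only the grouping and aggregated columns.

```python
from collections import defaultdict
from statistics import mean, median


def deduplicate_rows(rows, deduplicate_on, y_cols, method="median"):
    """Group rows by `deduplicate_on` and aggregate `y_cols` per group."""
    aggregate = {"mean": mean, "median": median}[method]

    # Rows with identical values in the `deduplicate_on` columns form one group.
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in deduplicate_on)
        groups[key].append(row)

    result = []
    for key, members in groups.items():
        merged = dict(zip(deduplicate_on, key))
        for col in y_cols:
            # Collapse the group's values into a single value per column.
            merged[col] = aggregate([member[col] for member in members])
        result.append(merged)
    return result


rows = [
    {"smiles": "CCO", "SOL": -1.0},
    {"smiles": "CCO", "SOL": -3.0},
    {"smiles": "CCC", "SOL": -2.0},
    {"smiles": "CCO", "SOL": -2.5},
]
deduplicated = deduplicate_rows(rows, deduplicate_on=["smiles"], y_cols=["SOL"])
# deduplicated -> [{"smiles": "CCO", "SOL": -2.5}, {"smiles": "CCC", "SOL": -2.0}]
```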

Parameters:

Name Type Description Default dataset DataFrame

The dataset to deduplicate.

required deduplicate_on Optional[Union[str, List[str]]]

A subset of the columns to deduplicate on (can be default).

None y_cols Optional[Union[str, List[str]]]

The columns to aggregate.

None keep Literal['first', 'last']

Whether to keep the first or last copy of the duplicates.

'first' method Literal['mean', 'median']

The method to aggregate the data.

'median'"},{"location":"api/functional.html#auroris.curation.functional.discretize","title":"discretize","text":"
discretize(X: np.ndarray, thresholds: Union[np.ndarray, list], inplace: bool = False, allow_nan: bool = True, label_order: Literal['ascending', 'descending'] = 'ascending') -> np.ndarray\n

Thresholding of array-like or scipy.sparse matrix into binary or multiclass labels.

Parameters:

Name Type Description Default X

The data to discretize, element by element. scipy.sparse matrices should be in CSR or CSC format to avoid an unnecessary copy.

required thresholds Union[ndarray, list]

Interval boundaries that include the right bin edge.

required inplace bool

Set to True to perform inplace discretization and avoid a copy (if the input is already a numpy array or a scipy.sparse CSR / CSC matrix and if axis is 1).

False allow_nan bool

Set to True to allow nans in the array for discretization. Otherwise, an error will be raised instead.

True label_order Literal['ascending', 'descending']

The continuous values are discretized to labels 0, 1, 2, ..., N with respect to the given threshold bins [threshold_1, threshold_2, ..., threshold_n]. When set to 'ascending', the class labels are in ascending order with the threshold bins, so that 0 represents the negative or lowest class, while 1, 2, 3 represent higher classes. When set to 'descending', the class labels are in descending order with the threshold bins. This is useful when the positive labels are on the left side of the provided threshold. E.g., for binarization with threshold [0.5] where the positive label is defined by X < 0.5, label_order should be 'descending'.

'ascending'

Returns:

Name Type Description X_tr ndarray

The transformed data.
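As a rough illustration of right-inclusive thresholds and of the two label orders, here is a standard-library sketch for a flat list of values. It is not the Auroris implementation; the function name `discretize_values` is hypothetical, and the treatment of values exactly on a threshold under 'descending' is an assumption of this sketch.

```python
import math
from bisect import bisect_left


def discretize_values(values, thresholds, label_order="ascending", allow_nan=True):
    """Map each value to a class label using right-inclusive threshold bins."""
    if label_order not in ("ascending", "descending"):
        raise ValueError("label_order must be 'ascending' or 'descending'")

    thresholds = sorted(thresholds)
    labels = []
    for x in values:
        if math.isnan(x):
            if not allow_nan:
                raise ValueError("NaN encountered while allow_nan=False")
            labels.append(math.nan)
            continue
        # Number of thresholds strictly below x, so a value equal to a
        # threshold falls into the lower bin (the right edge is included).
        label = bisect_left(thresholds, x)
        if label_order == "descending":
            # Reverse the label order: the lowest bin gets the highest label.
            label = len(thresholds) - label
        labels.append(label)
    return labels


ascending = discretize_values([-3.18, -3.0, -2.64], thresholds=[-3])
# ascending -> [0, 0, 1]
descending = discretize_values([0.2, 0.9], thresholds=[0.5], label_order="descending")
# descending -> [1, 0]
```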

"},{"location":"api/functional.html#auroris.curation.functional.curate_molecules","title":"curate_molecules","text":"
curate_molecules(mols: List[Union[str, dm.Mol]], progress: bool = True, remove_stereo: bool = False, fix_mol: bool = True, count_stereoisomers: bool = True, count_stereocenters: bool = True, **parallelized_kwargs) -> Tuple\n

Curate a list of molecules.

Parameters:

Name Type Description Default mols List[Union[str, Mol]]

List of molecules.

required progress bool

Whether to show curation progress.

True fix_mol bool

Whether to fix errors in the molecule.

True remove_stereo bool

Whether to remove stereochemistry information from the molecule.

False count_stereoisomers bool

Whether to count the number of stereoisomers of the molecule.

True count_stereocenters bool

Whether to count the number of stereocenters of the molecule.

True

Returns:

Name Type Description mol_dict Tuple

Dictionary of molecule and additional metadata

num_invalid Tuple

Number of invalid molecules

"},{"location":"api/functional.html#auroris.curation.functional.detect_outliers","title":"detect_outliers","text":"
detect_outliers(X: np.ndarray, method: OutlierDetectionMethod = 'zscore', **kwargs: Any)\n

Functional interface for detecting outliers

Parameters:

Name Type Description Default X ndarray

The observations that we want to classify as inliers or outliers.

required method OutlierDetectionMethod

The method to use for outlier detection.

'zscore' **kwargs Any

Keyword arguments for the outlier detection method.

{}"},{"location":"api/types.html","title":"Types","text":""},{"location":"api/types.html#auroris.types","title":"auroris.types","text":""},{"location":"api/types.html#auroris.types.VerbosityLevel","title":"VerbosityLevel","text":"

Bases: IntEnum

The different verbosity levels

"},{"location":"api/utils.html","title":"Utils","text":""},{"location":"api/utils.html#auroris.utils.is_regression","title":"is_regression","text":"
is_regression(values: np.ndarray) -> bool\n

Whether the input values are for regression

"},{"location":"api/utils.html#auroris.utils.fig2img","title":"fig2img","text":"
fig2img(fig: Figure) -> ImageType\n

Convert a Matplotlib figure to a PIL Image

"},{"location":"api/utils.html#auroris.utils.img2bytes","title":"img2bytes","text":"
img2bytes(image: ImageType)\n

Convert png image to bytes

"},{"location":"api/utils.html#auroris.utils.bytes2img","title":"bytes2img","text":"
bytes2img(image_bytes: ByteString)\n

Convert bytes to PIL image

"},{"location":"api/utils.html#auroris.utils.save_image","title":"save_image","text":"
save_image(image: ImageType, path: str)\n

Save an image to a fsspec-compatible path

"},{"location":"api/utils.html#auroris.utils.is_parquet_file","title":"is_parquet_file","text":"
is_parquet_file(path)\n

Verify parquet file without actually loading it.
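One common way to check for a Parquet file without reading its contents is to look at the file's magic bytes: a standard Parquet file starts and ends with the four bytes `PAR1`. The sketch below shows that check for a local path. Whether `is_parquet_file` works this way is not stated here, so treat this as an assumption; the function name `looks_like_parquet` is hypothetical.

```python
import os

PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path):
    """Return True if a local file carries the Parquet magic bytes at both ends."""
    # Header magic (4 bytes) + footer length (4 bytes) + footer magic (4 bytes).
    if os.path.getsize(path) < 12:
        return False
    with open(path, "rb") as handle:
        header = handle.read(4)
        handle.seek(-4, os.SEEK_END)
        footer = handle.read(4)
    return header == PARQUET_MAGIC and footer == PARQUET_MAGIC
```

This only inspects eight bytes, so it is cheap, but it does not validate the file's metadata or data pages.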

"},{"location":"api/visualization.html","title":"Visualization","text":""},{"location":"api/visualization.html#auroris.visualization.visualize_chemspace","title":"visualize_chemspace","text":"
visualize_chemspace(X: np.ndarray, y: Optional[Union[List[np.ndarray], np.ndarray]] = None, labels: Optional[List[str]] = None, n_cols: int = 2, fig_base_size: float = 8, w_h_ratio: float = 0.5, dpi: int = 150, seaborn_theme: Optional[str] = 'whitegrid', plot_kwargs: dict = None, umap_kwargs: dict = None)\n

Plot the coverage in chemical space. Also, color based on the target values.

Parameters:

Name Type Description Default X ndarray

Array of the molecular features.

required y Optional[Union[List[ndarray], ndarray]]

A list of arrays with the target values.

None labels Optional[List[str]]

Optional list of labels for each set of features.

None n_cols int

Number of columns in the subplots.

2 fig_base_size float

Base size of the plots.

8 w_h_ratio float

Width/height ratio.

0.5 dpi int

DPI value of the figure.

150 seaborn_theme Optional[str]

Seaborn theme.

'whitegrid' plot_kwargs dict

seaborn plot arguments.

None umap_kwargs dict

Keyword arguments for the UMAP algorithm.

None"},{"location":"api/visualization.html#auroris.visualization.visualize_continuous_distribution","title":"visualize_continuous_distribution","text":"
visualize_continuous_distribution(data: np.ndarray, log_scale: bool = False, bins: Optional[Sequence[float]] = None)\n

Plot the distribution of the values in data as a KDE curve, with colored sections under the curve.

Parameters:

Name Type Description Default data ndarray

A 1D numpy array with the values to plot the distribution for.

required log_scale bool

Whether to plot the x-axis in log scale.

False bins Optional[Sequence[float]]

The bin boundaries to color the area under the KDE curve.

None"},{"location":"api/visualization.html#auroris.visualization.visualize_distribution_with_outliers","title":"visualize_distribution_with_outliers","text":"
visualize_distribution_with_outliers(values: np.ndarray, is_outlier: Optional[List[bool]] = None, title: str = 'Probability Plot')\n

Visualize the distribution of the data and highlight the potential outliers.

Parameters:

Name Type Description Default values ndarray

Values for visualization.

required is_outlier Optional[List[bool]]

List of outlier flags.

None title str

Title of plot

'Probability Plot'"},{"location":"tutorials/getting_started.html","title":"Getting Started","text":"

In short

This tutorial gives an overview of the basic concepts in the `auroris` library.

On the nuances of curation

How to best curate a dataset is highly situation-dependent. The `auroris` library includes some useful tools, but blindly applying them won't necessarily lead to good datasets. To learn more, visit the Polaris Hub for extensive resources and documentation on dataset curation and more.

Data curation is concerned with analyzing and processing an existing dataset to maximize its quality. Within drug discovery, this can imply many things, such as filtering out outliers or flagging activity-cliffs. High-quality, well-curated datasets are the foundation upon which we can build realistic, impactful benchmarks for drug discovery. This notebook demonstrates how to curate your dataset with the Polaris data curation API for small molecules.

In [3]:
import datamol as dm\n
In [4]:
# Load your data set\n# See more details of the dataset at https://docs.datamol.io/stable/api/datamol.data.html\ndata = dm.data.solubility()\ndata.head(5)\n
Out[4]:
   mol                                             ID  NAME                SOL    SOL_classification  smiles      split
0  <rdkit.Chem.rdchem.Mol object at 0x173b7c2e0>   1   n-pentane           -3.18  (A) low             CCCCC       train
1  <rdkit.Chem.rdchem.Mol object at 0x173b7c430>   2   cyclopentane        -2.64  (B) medium          C1CCCC1     train
2  <rdkit.Chem.rdchem.Mol object at 0x173b7c4a0>   3   n-hexane            -3.84  (A) low             CCCCCC      train
3  <rdkit.Chem.rdchem.Mol object at 0x173b7c510>   4   2-methylpentane     -3.74  (A) low             CCCC(C)C    train
4  <rdkit.Chem.rdchem.Mol object at 0x173b7c580>   6   2,2-dimethylbutane  -3.55  (A) low             CCC(C)(C)C  train
In [5]:
from auroris.curation import Curator\nfrom auroris.curation.actions import MoleculeCuration, OutlierDetection, Discretization\n\n# Define the curation workflow\ncurator = Curator(\n    steps=[\n        MoleculeCuration(input_column=\"smiles\"),\n        OutlierDetection(method=\"zscore\", columns=[\"SOL\"]),\n        Discretization(input_column=\"SOL\", thresholds=[-3]),\n    ],\n    parallelized_kwargs={\"n_jobs\": -1},\n)\n\n# Run the curation\ndataset, report = curator(data)\n
2024-08-02 12:26:54.316 | INFO     | auroris.curation._curator:transform:106 - Performing step: mol_curation\n2024-08-02 12:27:12.343 | INFO     | auroris.curation._curator:transform:106 - Performing step: outlier_detection\n2024-08-02 12:27:12.400 | INFO     | auroris.curation._curator:transform:106 - Performing step: discretize\n

The report can be exported (\"broadcast\") to a variety of formats. For now, let's simply log it to the CLI.

In [6]:
from auroris.report.broadcaster import LoggerBroadcaster\n\nbroadcaster = LoggerBroadcaster(report)\nbroadcaster.broadcast()\n
===== Curation Report =====\nTime: 2024-08-02 12:26:54\nVersion: 0.1.4.dev0+g7127343.d20240707\n===== mol_curation =====\n[LOG]: Couldn't preprocess 18 / 1282 molecules.\n[LOG]: New column added: MOL_smiles\n[LOG]: New column added: MOL_molhash_id\n[LOG]: New column added: MOL_molhash_id_no_stereo\n[LOG]: New column added: MOL_num_stereoisomers\n[LOG]: New column added: MOL_num_undefined_stereoisomers\n[LOG]: New column added: MOL_num_defined_stereo_center\n[LOG]: New column added: MOL_num_undefined_stereo_center\n[LOG]: New column added: MOL_num_stereo_center\n[LOG]: New column added: MOL_undefined_E_D\n[LOG]: New column added: MOL_undefined_E/Z\n[LOG]: Default `ecfp` fingerprint is used to visualize the chemical space.\n[LOG]: Molecules with undefined stereocenter detected: 253.\n[IMG]: Dimensions 1200 x 600\n[IMG]: Dimensions 1200 x 2400\n===== outlier_detection =====\n[LOG]: New column added: OUTLIER_SOL\n[LOG]: Found 7 potential outliers with respect to the SOL column for review.\n[IMG]: Dimensions 1200 x 600\n===== discretize =====\n[LOG]: New column added: CLS_SOL\n[IMG]: Dimensions 1200 x 600\n===== Curation Report END =====\n

We can see that there are also images in the report! More advanced broadcasters, such as the HTMLBroadcaster, will display these.

In [7]:
from auroris.report.broadcaster import HTMLBroadcaster\nimport tempfile\n\n# mkdtemp() creates a directory that persists until it is removed explicitly\ntemp_dir = tempfile.mkdtemp()\n\nbroadcaster = HTMLBroadcaster(report=report, destination=temp_dir, embed_images=True)\nbroadcaster.broadcast()\n
Out[7]:
'/var/folders/_7/ffxc1f251dbb5msn977xl4sm0000gr/T/tmps2tt3jrb/index.html'

One can review the above HTML report with embedded visualizations and share it with collaborators.

Let's also look at a single row of the new curated dataset!

In [8]:
dataset.iloc[0]\n
Out[8]:
mol                                <rdkit.Chem.rdchem.Mol object at 0x173b7c2e0>\nID                                                                             1\nNAME                                                                   n-pentane\nSOL                                                                        -3.18\nSOL_classification                                                       (A) low\nsmiles                                                                     CCCCC\nsplit                                                                      train\nMOL_smiles                                                                 CCCCC\nMOL_molhash_id                          3cb2e0cf1b50d8f954891abc5dcce90d543cd3d7\nMOL_molhash_id_no_stereo                36551d628217a351e720cdbe676fca3067730a91\nMOL_num_stereoisomers                                                        1.0\nMOL_num_undefined_stereoisomers                                              1.0\nMOL_num_defined_stereo_center                                                0.0\nMOL_num_undefined_stereo_center                                              0.0\nMOL_num_stereo_center                                                        0.0\nMOL_undefined_E_D                                                          False\nMOL_undefined_E/Z                                                              0\nOUTLIER_SOL                                                                False\nCLS_SOL                                                                      0.0\nName: 0, dtype: object
In [9]:
from auroris.curation.functional import detect_outliers\nfrom auroris.visualization import visualize_distribution_with_outliers\n\ny = dataset[\"SOL\"].values\nis_outlier = detect_outliers(y, method=\"zscore\")\nvisualize_distribution_with_outliers(y, is_outlier);\n

Depending on the type of bioactivity and its distribution, the above plot helps to highlight data points that are potential outliers (data outside the acceptable range) or strong signals.

Reviewing these data points, and removing them if they are truly outliers, can be beneficial for QSAR modeling.
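For intuition about what a z-score based check does, here is a minimal standard-library sketch. It is not the Auroris implementation of `detect_outliers`; the function name `zscore_outlier_flags`, the cutoff of 3.0, and the use of the population standard deviation are assumptions made for this illustration.

```python
from statistics import mean, pstdev


def zscore_outlier_flags(values, threshold=3.0):
    """Flag values whose absolute z-score exceeds `threshold`."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # All values identical: nothing can be an outlier.
        return [False] * len(values)
    return [abs(v - mu) / sigma > threshold for v in values]


# Twenty identical measurements plus one extreme value.
flags = zscore_outlier_flags([0.0] * 20 + [100.0])
# Only the last value is flagged.
```

A flagged value is only a candidate for review. Whether it is a measurement error or a genuine strong signal still needs domain judgment.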

The End.

"},{"location":"tutorials/getting_started.html#curating-a-toy-dataset","title":"Curating a toy dataset\u00b6","text":"

Let's learn about the basic concepts of the auroris library by curating a toy dataset. For the sake of simplicity, we will use the solubility dataset from Datamol. It is worth noting that this dataset is only meant to be used as a toy dataset for pedagogic and testing purposes. It is not a dataset for benchmarking, analysis or model training. Curation can only take us so far. For impactful benchmarks, we rely on high-quality data sources to begin with.

"},{"location":"tutorials/getting_started.html#using-the-curator-api","title":"Using the Curator API\u00b6","text":"

The recommended way to specify curation workflows is through the Curator API:

Let's define a simple workflow with three steps:

  1. Curate the chemical structures
  2. Detect outliers
  3. Bin the regression column
"},{"location":"tutorials/getting_started.html#using-the-functional-api","title":"Using the functional API\u00b6","text":"

auroris provides a functional API to easily and quickly run some curation steps. Let's look at an outlier detection example.

"}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"index.html","title":"Introduction","text":"

Welcome to the Auroris - Simplifying Drug Discovery Data Curation

"},{"location":"index.html#what-is-auroris","title":"What is Auroris?","text":"

Auroris is a Python library designed to assist researchers and scientists in managing, cleaning, and preparing data relevant to drug discovery. Auroris will implement a range of techniques to handle, transform, filter, analyze, or visualize the diverse data types commonly encountered in drug discovery.

Currently, Auroris supports curation for small molecules, with plans to extend to other modalities in drug discovery. The curation module for small molecules includes:

Reproducibility and transparency are core to the mission of Polaris. That\u2019s why with Auroris, you can also automatically generate detailed reports summarizing the changes that happened to a dataset during curation. Through an intuitive API, you can easily define complex curation workflows. Once defined, that workflow is serializable and thus reproducible so you can transparently share how you curated the dataset.

"},{"location":"index.html#where-to-next","title":"Where to next?","text":"

Quickstart

Dive deeper into the Auroris code and learn how to curate data for your ML-powered drug discovery program.

Let's get started

API Reference

Explore the technical documentation here to delve into the inner workings of the code. Gain insights into the intricate details of how different methods and classes function.

Let's get started

Community

We're excited to have you join us in revolutionizing drug discovery data curation! Explore Auroris and the broader Polaris ecosystem it is part of, provide feedback, share your use cases, and collaborate with us to enhance and expand the capabilities of Auroris for the benefit of the drug discovery community.

Let's get started

"},{"location":"api/actions.html","title":"Actions","text":""},{"location":"api/actions.html#auroris.curation.actions.BaseAction","title":"auroris.curation.actions.BaseAction","text":"

Bases: BaseModel, ABC

An action in the curation process.

The importance of reproducibility

One of the main goals in designing auroris is to make it easy to reproduce the curation process. Reproducibility is key to scientific research. This is why a BaseAction needs to be serializable and uniquely identified by a name.

Attributes:

Name Type Description name str

The name that uniquely identifies the action. This is used to serialize and deserialize the action.

prefix str

This prefix is used when an action adds columns to a dataset. If not set, it defaults to the name in uppercase.

"},{"location":"api/actions.html#auroris.curation.actions.StereoIsomerACDetection","title":"StereoIsomerACDetection","text":"

Bases: BaseAction

Automatic detection of activity shift between stereoisomers.

See auroris.curation.functional.detect_streoisomer_activity_cliff for the docs of the stereoisomer_id_col, y_cols and threshold attributes

Attributes:

Name Type Description mol_col Optional[str]

Column with the SMILES or RDKit Molecule objects. If specified, will be used to render an image for the activity cliffs.

"},{"location":"api/actions.html#auroris.curation.actions.Deduplication","title":"Deduplication","text":"

Bases: BaseAction

Automatic deduplication of a dataset.

See auroris.curation.functional.deduplicate for the docs of the deduplicate_on, y_cols, keep and method attributes

"},{"location":"api/actions.html#auroris.curation.actions.Discretization","title":"Discretization","text":"

Bases: BaseAction

Thresholding bioactivity columns to binary or multiclass labels.

See auroris.curation.functional.discretize for the docs of the thresholds, inplace, allow_nan and label_order attributes

Attributes:

Name Type Description input_column str

The column to discretize.

log_scale bool

Whether a visual depiction of the discretization should be on a log scale.

"},{"location":"api/actions.html#auroris.curation.actions.ContinuousDistributionVisualization","title":"ContinuousDistributionVisualization","text":"

Bases: BaseAction

Visualize one or more continuous distribution(s).

See auroris.visualization.visualize_continuous_distribution for the docs of the log_scale and bins attributes

Attributes:

Name Type Description y_cols List[str]

The columns whose distributions should be visualized.

"},{"location":"api/actions.html#auroris.curation.actions.MoleculeCuration","title":"MoleculeCuration","text":"

Bases: BaseAction

Automated molecule curation and chemistry space distribution.

See auroris.curation.functional.curate_molecules for the docs of the remove_stereo, fix_mol, count_stereoisomers, and count_stereocenters attributes

Attributes:

Name Type Description input_column str

The name of the column that has the molecules (either dm.Mol objects or SMILES).

X_col Optional[str]

Column with custom features for each of the molecules. If None, will use ECFP.

y_cols Optional[Union[str, List[str]]]

Column names for bioactivities, which will be used to colorcode the chemical space visualization.

"},{"location":"api/actions.html#auroris.curation.actions.OutlierDetection","title":"OutlierDetection","text":"

Bases: BaseAction

Automatic detection of outliers.

See auroris.curation.functional.detect_outliers for the docs of the method and kwargs attributes

Attributes:

Name Type Description columns List[str]

The columns for which to detect outliers.

"},{"location":"api/curator.html","title":"Curator","text":""},{"location":"api/curator.html#auroris.curation.Curator","title":"auroris.curation.Curator","text":"

Bases: BaseModel

A curator is a serializable collection of actions that are applied to a dataset.

Attributes:

Name Type Description steps List[BaseAction]

Ordered list of curation actions to apply to the dataset.

src_dataset_path Optional[str]

An optional path to load the source dataset from. Can be used to specify a reproducible workflow.

verbosity VerbosityLevel

Verbosity level for logging.

parallelized_kwargs dict

Keyword arguments to affect parallelization in the steps.

"},{"location":"api/curator.html#auroris.curation.Curator.transform","title":"transform","text":"
transform(dataset: Optional[pd.DataFrame] = None) -> Tuple[pd.DataFrame, CurationReport]\n

Runs the curation process.

Parameters:

Name Type Description Default dataset Optional[DataFrame]

The dataset to be curated. If src_dataset_path is set, this parameter is ignored.

None

Returns:

Type Description Tuple[DataFrame, CurationReport]

A tuple of the curated dataset and a report summarizing the changes made.

"},{"location":"api/curator.html#auroris.curation.Curator.load_dataset","title":"load_dataset staticmethod","text":"
load_dataset(path: str)\n

Loads a dataset, to be curated, from a path.

File-format support

This currently only supports CSV and Parquet files and uses the default parameters for pd.read_csv and pd.read_parquet. If you need more flexibility, consider loading the data yourself and passing it directly to Curator.transform(dataset=...).

"},{"location":"api/curator.html#auroris.curation.Curator.from_json","title":"from_json classmethod","text":"
from_json(path: str)\n

Loads a curation workflow from a JSON file.

Parameters:

Name Type Description Default path str

The path to load from

required"},{"location":"api/curator.html#auroris.curation.Curator.to_json","title":"to_json","text":"
to_json(path: str)\n

Saves the curation workflow to a JSON file.

Parameters:

Name Type Description Default path str

The destination to save to.

required"},{"location":"api/functional.html","title":"Curation","text":""},{"location":"api/functional.html#auroris.curation.functional.detect_streoisomer_activity_cliff","title":"detect_streoisomer_activity_cliff","text":"
detect_streoisomer_activity_cliff(dataset: pd.DataFrame, stereoisomer_id_col: str, y_cols: List[str], threshold: float = 2.0, prefix: str = 'AC_') -> pd.DataFrame\n

Detect activity cliff among stereoisomers based on classification label or pre-defined threshold for continuous values.
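
As a rough illustration of the z-score criterion described for the threshold parameter, the sketch below flags groups of stereoisomers whose z-scored activities differ by more than a threshold. It is a plain-Python sketch under stated assumptions, not the auroris implementation: the name `stereo_cliff_sketch`, the use of the population standard deviation, and the max-minus-min comparison within each group are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def stereo_cliff_sketch(ids, activities, threshold=2.0):
    # Hypothetical helper: returns one boolean flag per input row.
    mu = mean(activities)
    sigma = pstdev(activities)
    if sigma == 0:
        # No spread in the activities, so no cliff can be detected.
        return [False] * len(ids)

    # Assumption: z-scores are computed over the whole column.
    zscores = [(value - mu) / sigma for value in activities]

    # Collect the z-scores of each stereoisomer group.
    groups = {}
    for group_id, z in zip(ids, zscores):
        groups.setdefault(group_id, []).append(z)

    # Assumption: a group is a cliff when its z-score range exceeds the threshold.
    cliff_ids = {
        group_id
        for group_id, zs in groups.items()
        if len(zs) > 1 and max(zs) - min(zs) > threshold
    }
    return [group_id in cliff_ids for group_id in ids]


flags = stereo_cliff_sketch(['a', 'a', 'b', 'b', 'c'], [5.0, 9.0, 6.0, 6.2, 7.0])
print(flags)
```

In this toy input only group a has two isomers with clearly different activities, so only its rows are flagged.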

Parameters:

Name Type Description Default dataset DataFrame

Dataframe

required stereoisomer_id_col str

Column which identifies the stereoisomers

required y_cols List[str]

List of columns for bioactivities

required threshold float

Threshold to identify the activity cliff. Currently, the difference in z-scores between isomers is used for identification.

2.0 prefix str

Prefix for the added columns

'AC_'"},{"location":"api/functional.html#auroris.curation.functional.deduplicate","title":"deduplicate","text":"
deduplicate(dataset: pd.DataFrame, deduplicate_on: Optional[Union[str, List[str]]] = None, y_cols: Optional[Union[str, List[str]]] = None, keep: Literal['first', 'last'] = 'first', method: Literal['mean', 'median'] = 'median') -> pd.DataFrame\n

Deduplicate a dataframe.

If deduplicate_on specifies a subset of all columns in the dataset and y_cols specifies a set of non-overlapping columns, data will be grouped by deduplicate_on and the y_cols will be aggregated to a single value per group according to method.
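
The grouping and aggregation described above can be sketched in plain Python. This is only an illustration of the idea, assuming rows are represented as dictionaries; `deduplicate_sketch` is a hypothetical helper and not the library's pandas-based implementation.

```python
from statistics import mean, median


def deduplicate_sketch(rows, deduplicate_on, y_cols, method='median'):
    # Pick the aggregation function by name, mirroring the method argument.
    aggregate = {'mean': mean, 'median': median}[method]

    # Group rows that share the same values in the deduplicate_on columns.
    groups = {}
    for row in rows:
        key = tuple(row[col] for col in deduplicate_on)
        groups.setdefault(key, []).append(row)

    # Collapse every group to a single row with aggregated y columns.
    result = []
    for key, members in groups.items():
        merged = dict(zip(deduplicate_on, key))
        for col in y_cols:
            merged[col] = aggregate([member[col] for member in members])
        result.append(merged)
    return result


rows = [
    {'smiles': 'CCO', 'SOL': 1.0},
    {'smiles': 'CCO', 'SOL': 3.0},
    {'smiles': 'CCC', 'SOL': 2.0},
]
print(deduplicate_sketch(rows, ['smiles'], ['SOL']))
```

The two CCO rows collapse into one row whose SOL value is the median of 1.0 and 3.0.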

Parameters:

Name Type Description Default dataset DataFrame

The dataset to deduplicate.

required deduplicate_on Optional[Union[str, List[str]]]

A subset of the columns to deduplicate on (can be default).

None y_cols Optional[Union[str, List[str]]]

The columns to aggregate.

None keep Literal['first', 'last']

Whether to keep the first or last copy of the duplicates.

'first' method Literal['mean', 'median']

The method to aggregate the data.

'median'"},{"location":"api/functional.html#auroris.curation.functional.discretize","title":"discretize","text":"
discretize(X: np.ndarray, thresholds: Union[np.ndarray, list], inplace: bool = False, allow_nan: bool = True, label_order: Literal['ascending', 'descending'] = 'ascending') -> np.ndarray\n

Thresholding of array-like or scipy.sparse matrix into binary or multiclass labels.

Parameters:

Name Type Description Default X

The data to discretize, element by element. scipy.sparse matrices should be in CSR or CSC format to avoid an un-necessary copy.

required thresholds Union[ndarray, list]

Interval boundaries that include the right bin edge.

required inplace bool

Set to True to perform inplace discretization and avoid a copy (if the input is already a numpy array or a scipy.sparse CSR / CSC matrix and if axis is 1).

False allow_nan bool

Set to True to allow nans in the array for discretization. Otherwise, an error will be raised instead.

True label_order Literal['ascending', 'descending']

The continuous values are discretized to labels 0, 1, 2, ..., N with respect to the given threshold bins [threshold_1, threshold_2, ..., threshold_n]. When set to 'ascending', the class labels increase with the threshold bins: 0 represents the negative or lowest class, while 1, 2, 3 are the higher classes. When set to 'descending', the class labels decrease with the threshold bins. This is useful when the positive labels are on the left side of the provided threshold. E.g., for binarization with threshold [0.5] where the positive label is defined by X < 0.5, label_order should be 'descending'.

'ascending'

Returns:

Name Type Description X_tr ndarray

The transformed data.
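
To make the thresholds and label_order semantics more concrete, here is a minimal stdlib sketch. It assumes that each bin includes its right edge (so a value equal to a threshold gets the lower ascending label) and that 'descending' simply reverses the label numbering; `discretize_sketch` is a hypothetical name and the real function may differ in details such as array handling.

```python
import math
from bisect import bisect_left


def discretize_sketch(values, thresholds, label_order='ascending', allow_nan=True):
    thresholds = sorted(thresholds)
    labels = []
    for value in values:
        if math.isnan(value):
            if not allow_nan:
                raise ValueError('NaN found while allow_nan is False')
            # NaN stays NaN so that missing measurements remain visible.
            labels.append(math.nan)
            continue
        # Count the thresholds strictly below the value (right edge inclusive).
        label = bisect_left(thresholds, value)
        if label_order == 'descending':
            # Assumption: descending order reverses the label numbering.
            label = len(thresholds) - label
        labels.append(label)
    return labels


print(discretize_sketch([0.2, 0.5, 0.9], [0.5]))  # [0, 0, 1]
print(discretize_sketch([0.2, 0.9], [0.5], label_order='descending'))  # [1, 0]
```

With 'descending', values below the threshold receive the positive label, which matches the X < 0.5 case described for label_order.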

"},{"location":"api/functional.html#auroris.curation.functional.curate_molecules","title":"curate_molecules","text":"
curate_molecules(mols: List[Union[str, dm.Mol]], progress: bool = True, remove_stereo: bool = False, fix_mol: bool = True, count_stereoisomers: bool = True, count_stereocenters: bool = True, **parallelized_kwargs) -> Tuple\n

Curate a list of molecules.

Parameters:

Name Type Description Default mols List[Union[str, Mol]]

List of molecules.

required progress bool

Whether to show curation progress.

True fix_mol bool

Whether to fix errors in the molecule.

True remove_stereo bool

Whether to remove stereochemistry information from the molecule.

False count_stereoisomers bool

Whether to count the number of stereoisomers of the molecule.

True count_stereocenters bool

Whether to count the number of stereocenters of the molecule.

True

Returns:

Name Type Description mol_dict Tuple

Dictionary of the molecules and additional metadata

num_invalid Tuple

Number of invalid molecules

"},{"location":"api/functional.html#auroris.curation.functional.detect_outliers","title":"detect_outliers","text":"
detect_outliers(X: np.ndarray, method: OutlierDetectionMethod = 'zscore', **kwargs: Any)\n

Functional interface for detecting outliers

Parameters:

Name Type Description Default X ndarray

The observations that we want to classify as inliers or outliers.

required method OutlierDetectionMethod

The method to use for outlier detection.

'zscore' **kwargs Any

Keyword arguments for the outlier detection method.

{}"},{"location":"api/types.html","title":"Types","text":""},{"location":"api/types.html#auroris.types","title":"auroris.types","text":""},{"location":"api/types.html#auroris.types.VerbosityLevel","title":"VerbosityLevel","text":"

Bases: IntEnum

The different verbosity levels

"},{"location":"api/utils.html","title":"Utils","text":""},{"location":"api/utils.html#auroris.utils.is_regression","title":"is_regression","text":"
is_regression(values: np.ndarray) -> bool\n

Whether the input values are for regression

"},{"location":"api/utils.html#auroris.utils.fig2img","title":"fig2img","text":"
fig2img(fig: Figure) -> ImageType\n

Convert a Matplotlib figure to a PIL Image

"},{"location":"api/utils.html#auroris.utils.img2bytes","title":"img2bytes","text":"
img2bytes(image: ImageType)\n

Convert png image to bytes

"},{"location":"api/utils.html#auroris.utils.bytes2img","title":"bytes2img","text":"
bytes2img(image_bytes: ByteString)\n

Convert bytes to PIL image

"},{"location":"api/utils.html#auroris.utils.save_image","title":"save_image","text":"
save_image(image: ImageType, path: str)\n

Save an image to a fsspec-compatible path

"},{"location":"api/utils.html#auroris.utils.is_parquet_file","title":"is_parquet_file","text":"
is_parquet_file(path)\n

Verify parquet file without actually loading it.
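
One way to check a file without loading it is to look at the Parquet magic bytes: the format places the four bytes PAR1 at both the start and the end of a file. The sketch below uses only the standard library. Whether `is_parquet_file` works this way is not stated here, so treat `looks_like_parquet` as a hypothetical stand-in rather than the library's method.

```python
import os
import tempfile

PARQUET_MAGIC = b'PAR1'


def looks_like_parquet(path):
    # Hypothetical helper: inspects only the first and last four bytes.
    with open(path, 'rb') as handle:
        handle.seek(0, os.SEEK_END)
        size = handle.tell()
        # Leading magic, footer length and trailing magic need 12 bytes.
        if size < 12:
            return False
        handle.seek(0)
        head = handle.read(4)
        handle.seek(-4, os.SEEK_END)
        tail = handle.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


with tempfile.TemporaryDirectory() as tmp:
    # A stand-in file with the right magic bytes (not a valid Parquet file).
    fake_parquet = os.path.join(tmp, 'data.parquet')
    with open(fake_parquet, 'wb') as handle:
        handle.write(PARQUET_MAGIC + bytes(16) + PARQUET_MAGIC)

    csv_file = os.path.join(tmp, 'data.csv')
    with open(csv_file, 'wb') as handle:
        handle.write(b'col_a,col_b,col_c,col_d')

    print(looks_like_parquet(fake_parquet), looks_like_parquet(csv_file))
```

A magic-byte check is cheap but shallow: it does not validate the footer metadata, so a corrupted file can still pass.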

"},{"location":"api/visualization.html","title":"Visualization","text":""},{"location":"api/visualization.html#auroris.visualization.visualize_chemspace","title":"visualize_chemspace","text":"
visualize_chemspace(X: np.ndarray, y: Optional[Union[List[np.ndarray], np.ndarray]] = None, labels: Optional[List[str]] = None, n_cols: int = 2, fig_base_size: float = 8, w_h_ratio: float = 0.5, dpi: int = 150, seaborn_theme: Optional[str] = 'whitegrid', plot_kwargs: dict = None, umap_kwargs: dict = None)\n

Plot the coverage in chemical space. Also, color based on the target values.

Parameters:

Name Type Description Default X ndarray

Array of the molecular features.

required y Optional[Union[List[ndarray], ndarray]]

A list of arrays with the target values.

None labels Optional[List[str]]

Optional list of labels for each set of features.

None n_cols int

Number of columns in the subplots.

2 fig_base_size float

Base size of the plots.

8 w_h_ratio float

Width/height ratio.

0.5 dpi int

DPI value of the figure.

150 seaborn_theme Optional[str]

Seaborn theme.

'whitegrid' plot_kwargs dict

seaborn plot arguments.

None umap_kwargs dict

Keyword arguments for the UMAP algorithm.

None"},{"location":"api/visualization.html#auroris.visualization.visualize_continuous_distribution","title":"visualize_continuous_distribution","text":"
visualize_continuous_distribution(data: np.ndarray, log_scale: bool = False, bins: Optional[Sequence[float]] = None)\n

Plots a KDE of the distribution of the values in data, with colored sections under the KDE curve.

Parameters:

Name Type Description Default data ndarray

A 1D numpy array with the values to plot the distribution for.

required log_scale bool

Whether to plot the x-axis in log scale.

False bins Optional[Sequence[float]]

The bin boundaries to color the area under the KDE curve.

None"},{"location":"api/visualization.html#auroris.visualization.visualize_distribution_with_outliers","title":"visualize_distribution_with_outliers","text":"
visualize_distribution_with_outliers(values: np.ndarray, is_outlier: Optional[List[bool]] = None, title: str = 'Probability Plot')\n

Visualize the distribution of the data and highlight the potential outliers.

Parameters:

Name Type Description Default values ndarray

Values for visualization.

required is_outlier Optional[List[bool]]

List of outlier flags.

None title str

Title of plot

'Probability Plot'"},{"location":"tutorials/getting_started.html","title":"Getting Started","text":"

In short

This tutorial gives an overview of the basic concepts in the `auroris` library.

On the nuances of curation

How to best curate a dataset is highly situation-dependent. The `auroris` library includes some useful tools, but blindly applying them won't necessarily lead to good datasets. To learn more, visit the Polaris Hub for extensive resources and documentation on dataset curation and more.

Data curation is concerned with analyzing and processing an existing dataset to maximize its quality. Within drug discovery, this can imply many things, such as filtering out outliers or flagging activity-cliffs. High-quality, well-curated datasets are the foundation upon which we can build realistic, impactful benchmarks for drug discovery. This notebook demonstrates how to curate your dataset with the Polaris data curation API for small molecules.

import datamol as dm\n
# Load your data set\n# See more details of the dataset at https://docs.datamol.io/stable/api/datamol.data.html\ndata = dm.data.solubility()\ndata.head(5)\n
   mol                                            ID  NAME                SOL    SOL_classification  smiles      split
0  <rdkit.Chem.rdchem.Mol object at 0x173b7c2e0>  1   n-pentane           -3.18  (A) low             CCCCC       train
1  <rdkit.Chem.rdchem.Mol object at 0x173b7c430>  2   cyclopentane        -2.64  (B) medium          C1CCCC1     train
2  <rdkit.Chem.rdchem.Mol object at 0x173b7c4a0>  3   n-hexane            -3.84  (A) low             CCCCCC      train
3  <rdkit.Chem.rdchem.Mol object at 0x173b7c510>  4   2-methylpentane     -3.74  (A) low             CCCC(C)C    train
4  <rdkit.Chem.rdchem.Mol object at 0x173b7c580>  6   2,2-dimethylbutane  -3.55  (A) low             CCC(C)(C)C  train
from auroris.curation import Curator\nfrom auroris.curation.actions import MoleculeCuration, OutlierDetection, Discretization\n\n# Define the curation workflow\ncurator = Curator(\n    steps=[\n        MoleculeCuration(input_column=\"smiles\"),\n        OutlierDetection(method=\"zscore\", columns=[\"SOL\"]),\n        Discretization(input_column=\"SOL\", thresholds=[-3]),\n    ],\n    parallelized_kwargs={\"n_jobs\": -1},\n)\n\n# Run the curation\ndataset, report = curator(data)\n
2024-08-02 12:26:54.316 | INFO     | auroris.curation._curator:transform:106 - Performing step: mol_curation\n2024-08-02 12:27:12.343 | INFO     | auroris.curation._curator:transform:106 - Performing step: outlier_detection\n2024-08-02 12:27:12.400 | INFO     | auroris.curation._curator:transform:106 - Performing step: discretize\n

The report can be exported (\"broadcast\") to a variety of formats. For now, let's simply log it to the CLI.

from auroris.report.broadcaster import LoggerBroadcaster\n\nbroadcaster = LoggerBroadcaster(report)\nbroadcaster.broadcast()\n
===== Curation Report =====\nTime: 2024-08-02 12:26:54\nVersion: 0.1.4.dev0+g7127343.d20240707\n===== mol_curation =====\n[LOG]: Couldn't preprocess 18 / 1282 molecules.\n[LOG]: New column added: MOL_smiles\n[LOG]: New column added: MOL_molhash_id\n[LOG]: New column added: MOL_molhash_id_no_stereo\n[LOG]: New column added: MOL_num_stereoisomers\n[LOG]: New column added: MOL_num_undefined_stereoisomers\n[LOG]: New column added: MOL_num_defined_stereo_center\n[LOG]: New column added: MOL_num_undefined_stereo_center\n[LOG]: New column added: MOL_num_stereo_center\n[LOG]: New column added: MOL_undefined_E_D\n[LOG]: New column added: MOL_undefined_E/Z\n[LOG]: Default `ecfp` fingerprint is used to visualize the chemical space.\n[LOG]: Molecules with undefined stereocenter detected: 253.\n[IMG]: Dimensions 1200 x 600\n[IMG]: Dimensions 1200 x 2400\n===== outlier_detection =====\n[LOG]: New column added: OUTLIER_SOL\n[LOG]: Found 7 potential outliers with respect to the SOL column for review.\n[IMG]: Dimensions 1200 x 600\n===== discretize =====\n[LOG]: New column added: CLS_SOL\n[IMG]: Dimensions 1200 x 600\n===== Curation Report END =====\n

We can see that there are also images in the report! More advanced broadcasters, such as the HTMLBroadcaster, will display these.

from auroris.report.broadcaster import HTMLBroadcaster\nimport tempfile\n\n# mkdtemp creates a directory that persists until it is removed explicitly\ntemp_dir = tempfile.mkdtemp()\n\nbroadcaster = HTMLBroadcaster(report=report, destination=temp_dir, embed_images=True)\nbroadcaster.broadcast()\n
'/var/folders/_7/ffxc1f251dbb5msn977xl4sm0000gr/T/tmps2tt3jrb/index.html'

One can review the above HTML report with embedded visualizations and share it with collaborators.

Let's also look at a single row of the new curated dataset!

dataset.iloc[0]\n
mol                                <rdkit.Chem.rdchem.Mol object at 0x173b7c2e0>\nID                                                                             1\nNAME                                                                   n-pentane\nSOL                                                                        -3.18\nSOL_classification                                                       (A) low\nsmiles                                                                     CCCCC\nsplit                                                                      train\nMOL_smiles                                                                 CCCCC\nMOL_molhash_id                          3cb2e0cf1b50d8f954891abc5dcce90d543cd3d7\nMOL_molhash_id_no_stereo                36551d628217a351e720cdbe676fca3067730a91\nMOL_num_stereoisomers                                                        1.0\nMOL_num_undefined_stereoisomers                                              1.0\nMOL_num_defined_stereo_center                                                0.0\nMOL_num_undefined_stereo_center                                              0.0\nMOL_num_stereo_center                                                        0.0\nMOL_undefined_E_D                                                          False\nMOL_undefined_E/Z                                                              0\nOUTLIER_SOL                                                                False\nCLS_SOL                                                                      0.0\nName: 0, dtype: object
from auroris.curation.functional import detect_outliers\nfrom auroris.visualization import visualize_distribution_with_outliers\n\ny = dataset[\"SOL\"].values\nis_outlier = detect_outliers(y, method=\"zscore\")\nvisualize_distribution_with_outliers(y, is_outlier);\n

Depending on the type of bioactivity and its distribution, the above plot helps to highlight data points that are potential outliers (data outside the acceptable range) or strong signals.

Reviewing these data points, and removing them if they are truly outliers, can be beneficial for QSAR modeling.
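
For readers who want to see what a z-score criterion amounts to, here is a minimal stdlib sketch. The cutoff of 3.0, the population standard deviation and the name `zscore_outliers` are assumptions made for illustration; the defaults used by `detect_outliers` are not stated in this tutorial and may differ.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    # Hypothetical helper: flags values far from the mean in units of sigma.
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing can be an outlier.
        return [False] * len(values)
    return [abs(value - mu) / sigma > threshold for value in values]


# Ten typical solubility-like values and one extreme measurement.
values = [-3.2, -3.0, -2.9, -3.1, -3.3, -2.8, -3.0, -3.1, -2.9, -3.2, 4.0]
flags = zscore_outliers(values)
print([value for value, flag in zip(values, flags) if flag])
```

Only the extreme value is flagged here. As noted above, a flagged point is a candidate for review, not automatically an error.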

The End.

"},{"location":"tutorials/getting_started.html#curating-a-toy-dataset","title":"Curating a toy dataset\u00b6","text":"

Let's learn about the basic concepts of the auroris library by curating a toy dataset. For the sake of simplicity, we will use the solubility dataset from Datamol. It is worth noting that this dataset is only meant to be used as a toy dataset for pedagogic and testing purposes. It is not a dataset for benchmarking, analysis or model training. Curation can only take us so far. For impactful benchmarks, we rely on high-quality data sources to begin with.

"},{"location":"tutorials/getting_started.html#using-the-curator-api","title":"Using the Curator API\u00b6","text":"

The recommended way to specify curation workflows is through the Curator API:

Let's define a simple workflow with three steps:

  1. Curate the chemical structures
  2. Detect outliers
  3. Bin the regression column
"},{"location":"tutorials/getting_started.html#using-the-functional-api","title":"Using the functional API\u00b6","text":"

auroris provides a functional API to easily and quickly run some curation steps. Let's look at an outlier detection example.

"}]} \ No newline at end of file