This is the repository for the paper “PIPES: A Meta-dataset of Machine Learning Pipelines”, submitted to IJCNN 2025.
This is an API built with FastAPI that allows you to retrieve PIPES metadata. The API provides endpoints to fetch datasets and retrieve algorithm-specific data.
- Retrieves data from files based on the dataset ID and algorithm name.
- Automatically handles missing and infinite values in the dataset.
- Returns clean JSON responses.
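
The last two points are related: JSON has no representation for NaN or infinity, so such values have to be replaced before a response is serialized. The following is a minimal, standard-library-only sketch of that idea. The helper name `clean_rows` is hypothetical and the column names are placeholders; the repository's own handling, which may rely on Pandas and NumPy from the requirements, could differ.

```python
import json
import math


def clean_rows(rows):
    """Replace NaN and +/-infinity with None so the rows serialize to valid JSON.

    Hypothetical helper for illustration; not taken from the PIPES code base.
    """
    cleaned = []
    for row in rows:
        cleaned.append({
            key: None if isinstance(value, float) and not math.isfinite(value) else value
            for key, value in row.items()
        })
    return cleaned


# Placeholder rows containing one infinite and one missing (NaN) value.
rows = [
    {"column1": 0.91, "column2": float("inf")},
    {"column1": float("nan"), "column2": 12.5},
]

# allow_nan=False makes json.dumps raise if a non-finite float slips through.
payload = json.dumps(clean_rows(rows), allow_nan=False)
print(payload)
```

Mapping non-finite values to `null` keeps the response parseable by strict JSON clients, at the cost of no longer distinguishing a missing value from an infinite one.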
## Requirements
- Python 3.8 or higher
- FastAPI
- Pandas
- NumPy
- Uvicorn (to run the server)
- Clone the repository:

      git clone <repository-url>
      cd <repository-folder>

- Create a virtual environment and activate it:

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

- Set up the `datasets` directory: make sure a folder named `datasets` exists in the root directory. Within this folder, organize subfolders by dataset ID, each containing CSV files.

  Example structure:

      datasets/
        1/
          algorithm1_data.csv
          algorithm2_data.csv
        2/
          algorithm1_data.csv

- Run the API server:

      uvicorn main:app --reload

- Access the API documentation:
  - Local Swagger UI: http://127.0.0.1:8000/docs
  - Hosted Swagger UI: https://pipes-production.up.railway.app/docs
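
To try the server locally before the real PIPES files are in place, the example `datasets/` layout from the setup steps can be generated with a short script. This is only a sketch: the function name `create_example_datasets` is hypothetical, and the CSV contents are placeholders, not PIPES data.

```python
from pathlib import Path

# Folder and file names follow the example structure in the setup steps.
LAYOUT = {
    "1": ["algorithm1_data.csv", "algorithm2_data.csv"],
    "2": ["algorithm1_data.csv"],
}

# Placeholder contents only; real files hold the PIPES metadata.
PLACEHOLDER_CSV = "column1,column2\nvalue1,value2\nvalue3,value4\n"


def create_example_datasets(root="datasets"):
    """Create the example folder tree and return the CSV paths it contains."""
    paths = []
    for dataset_id, filenames in LAYOUT.items():
        folder = Path(root) / dataset_id
        folder.mkdir(parents=True, exist_ok=True)
        for name in filenames:
            path = folder / name
            if not path.exists():  # never overwrite existing data
                path.write_text(PLACEHOLDER_CSV, encoding="utf-8")
            paths.append(path)
    return paths
```

Calling `create_example_datasets()` from the repository root produces the three files shown in the example structure and leaves any existing files untouched.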
- URL: `/data/{dataset_id}/{algorithm_name}`
- Method: `GET`
- Description: Fetches data from a CSV file that matches the given dataset ID and algorithm name.

| Parameter | Type | Description |
|---|---|---|
| `dataset_id` | `int` | The ID of the dataset folder. |
| `algorithm_name` | `str` | The name of the algorithm to search for. |

| Status code | Description | Example |
| ----------- | ----------------------------- | ------------------------------------------------------ |
| 200 | Success, returns the dataset. | `[ {"column1": "value1", "column2": "value2"} ]` |
| 404 | Dataset or file not found. | `{ "detail": "Dataset ID 1 not found." }` |
| 500 | Error processing file. | `{ "detail": "Error reading file: file is corrupt." }` |
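
The 404 cases in the table correspond to two lookups: first the dataset folder, then a CSV file for the algorithm. The sketch below illustrates that logic with the standard library only. It assumes a file matches when its name starts with the algorithm name followed by an underscore (as in `algorithm1_data.csv`); the names `DataNotFound` and `find_csv` and the second error message are hypothetical, and the actual matching rule in `main.py` may differ.

```python
from pathlib import Path


class DataNotFound(Exception):
    """Lookup failure that the API would report as HTTP 404 (hypothetical name)."""


def find_csv(root, dataset_id, algorithm_name):
    """Return the CSV file for an algorithm inside <root>/<dataset_id>/.

    Assumption: files are named '<algorithm_name>_<suffix>.csv'.
    """
    folder = Path(root) / str(dataset_id)
    if not folder.is_dir():
        raise DataNotFound(f"Dataset ID {dataset_id} not found.")
    # Sort so the result is deterministic if several files match.
    for path in sorted(folder.glob("*.csv")):
        if path.name.startswith(f"{algorithm_name}_"):
            return path
    # Illustrative message; the real API's wording is not documented here.
    raise DataNotFound(
        f"No file for algorithm '{algorithm_name}' in dataset {dataset_id}."
    )
```

In a FastAPI handler, such an exception would typically be converted into `HTTPException(status_code=404, detail=str(error))`, while errors raised while reading the file would become the 500 response.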

Example request:

    GET /data/1/algorithm1

Example response:

    [
      {
        "column1": "value1",
        "column2": "value2"
      },
      {
        "column1": "value3",
        "column2": "value4"
      }
    ]

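
The same request can be issued from Python. The following client is a minimal sketch using only the standard library; `build_url` and `fetch_rows` are hypothetical helper names, and the base URL assumes the local server started with `uvicorn main:app --reload`.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # assumed local development server


def build_url(dataset_id, algorithm_name, base_url=BASE_URL):
    """Build the URL for GET /data/{dataset_id}/{algorithm_name}."""
    # quote() escapes characters that are not safe inside a path segment.
    return f"{base_url}/data/{int(dataset_id)}/{quote(algorithm_name, safe='')}"


def fetch_rows(dataset_id, algorithm_name, base_url=BASE_URL):
    """Return the decoded JSON body, a list of row dictionaries.

    urlopen raises urllib.error.HTTPError for the 404 and 500 responses.
    """
    with urlopen(build_url(dataset_id, algorithm_name, base_url)) as response:
        return json.load(response)
```

With the server running, `fetch_rows(1, "algorithm1")` should return the list shown in the example response; passing `base_url="https://pipes-production.up.railway.app"` targets the hosted instance instead.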
To do: add a POST endpoint to send data.