Synthetic Dataset Generator is a Jupyter Notebook that creates customizable synthetic datasets of query-response pairs using the DataLLM API. The generated data can be used for machine learning, statistical analysis, or testing.
- Generate synthetic data using the DataLLM API.
- User-friendly prompts for customized queries and responses.
- Progress tracking during dataset creation with tqdm.
- CSV output with optional timestamped filenames.
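The progress-tracking and timestamped-output features can be illustrated with a minimal sketch. This is not the notebook's own code: the function names `timestamped_filename` and `write_dataset`, the filename pattern, and the use of the standard-library `csv` module (instead of pandas) are assumptions made so the example runs on its own. It falls back to a plain loop if tqdm is not installed.

```python
import csv
from datetime import datetime
from pathlib import Path

try:
    from tqdm import tqdm  # progress bar; part of the project's dependencies
except ImportError:
    def tqdm(iterable, **kwargs):
        # Fallback: iterate without a progress bar when tqdm is missing.
        return iterable


def timestamped_filename(prefix="synthetic_dataset", now=None):
    """Build a name like synthetic_dataset_20240131_154500.csv (hypothetical pattern)."""
    now = now or datetime.now()
    return f"{prefix}_{now:%Y%m%d_%H%M%S}.csv"


def write_dataset(pairs, path):
    """Write (query, response) pairs to a CSV file with a 1-based ID column."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["ID", "Input Query", "Output Response"])
        rows = enumerate(tqdm(pairs, desc="Writing rows"), start=1)
        for row_id, (query, response) in rows:
            writer.writerow([row_id, query, response])
    return Path(path)
```

Calling `write_dataset(pairs, timestamped_filename())` gives each run its own output file, so earlier datasets are not overwritten.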
Before running the notebook, install its dependencies:
pip install pandas tqdm datallm
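To confirm the installation succeeded, a short check like the one below can be run in a Python session. It is only a sketch: the helper name `missing_modules` is hypothetical, and it assumes each package is imported under the same name it is installed with.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["pandas", "tqdm", "datallm"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```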
- Clone this repository:
git clone https://github.com/VoxDroid/Synthetic-Dataset-Generator-DataLLM.git
- Navigate to the project directory:
cd Synthetic-Dataset-Generator-DataLLM
- Open the Jupyter Notebook:
jupyter notebook Synthetic_Dataset_Generator_DataLLM.ipynb
- Follow the instructions in the notebook to generate your dataset.
| ID | Input Query | Output Response |
|---|---|---|
| 1 | What are the best programming languages? | It depends on your goals, but Python is widely used! |
| 2 | How do I start learning machine learning? | Begin with Python and familiarize yourself with libraries like TensorFlow. |
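A generated file can be sanity-checked before it is used downstream. The sketch below assumes the CSV uses the three column names from the table above; the helper `load_pairs` is hypothetical and uses only the standard library. Note that responses containing commas must be quoted in the CSV, which `csv` handles automatically when reading and writing.

```python
import csv
import io

# Column names taken from the sample table; the real file may differ.
EXPECTED_COLUMNS = ["ID", "Input Query", "Output Response"]


def load_pairs(handle):
    """Read rows from an open CSV file, rejecting wrong headers or empty fields."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = []
    for row in reader:
        for column in ("Input Query", "Output Response"):
            if not (row.get(column) or "").strip():
                raise ValueError(f"empty {column!r} in row {row.get('ID')}")
        rows.append(row)
    return rows


# In-memory sample standing in for a generated file.
sample = io.StringIO(
    "ID,Input Query,Output Response\n"
    '1,What are the best programming languages?,'
    '"It depends on your goals, but Python is widely used!"\n'
)
pairs = load_pairs(sample)
print(len(pairs), "row(s) loaded")
```

For a file on disk, pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.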
This project is licensed under the MIT License - see the LICENSE file for details.
For any inquiries, feel free to reach out to:
- GitHub Profile - @VoxDroid