VoxDroid/Synthetic-Dataset-Generator-DataLLM

🎉 Dataset Generator Notebook 🎉


📚 Description

Synthetic Dataset Generator is a Jupyter Notebook for creating synthetic datasets for machine learning, statistical analysis, or testing. It uses the DataLLM API to generate query-response pairs, and its prompts let you tailor the queries and responses to different fields and projects.

🚀 Features

  • Generate synthetic data using the DataLLM API.
  • User-friendly prompts for customized queries and responses.
  • Progress tracking during dataset creation with tqdm.
  • Dynamic CSV output with options for timestamped filenames.
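
The progress-tracking and timestamped-filename features can be illustrated with a minimal sketch. This is not the notebook's actual code: `fake_response`, `build_filename`, and `generate_dataset` are hypothetical names, the DataLLM call is replaced by a stub so the example runs offline, and the standard-library `csv` module is used for writing so the sketch has no hard dependencies.

```python
import csv
from datetime import datetime
from pathlib import Path

try:
    from tqdm import tqdm  # progress bar, listed in the requirements
except ImportError:
    # Fallback so the sketch still runs without tqdm installed.
    def tqdm(iterable, **kwargs):
        return iterable

COLUMNS = ["ID", "Input Query", "Output Response"]


def fake_response(query):
    """Hypothetical stand-in for a DataLLM API call (assumption, not the real API)."""
    return f"Placeholder response for: {query}"


def build_filename(prefix="dataset", timestamped=True, now=None):
    """Return 'prefix.csv', or 'prefix_YYYYMMDD_HHMMSS.csv' when timestamped."""
    if not timestamped:
        return f"{prefix}.csv"
    now = now or datetime.now()
    return f"{prefix}_{now:%Y%m%d_%H%M%S}.csv"


def generate_dataset(queries, out_dir=".", timestamped=True):
    """Build one row per query, showing progress, and write the rows to a CSV file."""
    rows = []
    for i, query in enumerate(tqdm(queries, desc="Generating"), start=1):
        rows.append({
            "ID": i,
            "Input Query": query,
            "Output Response": fake_response(query),
        })
    path = Path(out_dir) / build_filename(timestamped=timestamped)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return path
```

Calling `generate_dataset(["What is overfitting?"])` would write a single-row CSV in the current directory and return its path. A timestamp in the filename keeps repeated runs from overwriting each other.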

📦 Requirements

To run this notebook, ensure you have the following dependencies installed:

pip install pandas tqdm datallm

📖 Usage

  1. Clone this repository:
    git clone https://github.com/VoxDroid/Synthetic-Dataset-Generator-DataLLM.git
  2. Navigate to the project directory:
    cd Synthetic-Dataset-Generator-DataLLM
  3. Open the Jupyter Notebook:
    jupyter notebook Synthetic_Dataset_Generator_DataLLM.ipynb
  4. Follow the instructions in the notebook to generate your dataset.

🛠️ Example of Dataset Structure

| ID | Input Query | Output Response |
|----|-------------|-----------------|
| 1 | What are the best programming languages? | It depends on your goals, but Python is widely used! |
| 2 | How do I start learning machine learning? | Begin with Python and familiarize yourself with libraries like TensorFlow. |
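
As a rough sketch of how a generated file with this structure might be loaded and checked with pandas: the column names come from the example above, while `load_dataset` and the inline CSV text are illustrative assumptions rather than part of the notebook.

```python
import io

import pandas as pd

EXPECTED_COLUMNS = ["ID", "Input Query", "Output Response"]

# Inline copy of the example rows; a real run would read the generated CSV file.
CSV_TEXT = """ID,Input Query,Output Response
1,What are the best programming languages?,"It depends on your goals, but Python is widely used!"
2,How do I start learning machine learning?,Begin with Python and familiarize yourself with libraries like TensorFlow.
"""


def load_dataset(source):
    """Read a CSV (path or file-like object) and verify the expected columns exist."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return df


df = load_dataset(io.StringIO(CSV_TEXT))
print(df.head())
```

Responses that contain commas are quoted in the CSV text, which is why the first row's response is wrapped in double quotes.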

🎨 Visual Representation

[Image: Dataset Sample]

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

📞 Contact

For any inquiries, feel free to reach out to: