Welcome to the Detection of Large Language Model (LLM) Generated Text project! This project aims to distinguish between human-written and LLM-generated text, addressing concerns related to misinformation, plagiarism, and ethics. With the rise of LLMs such as GPT-4, it has become crucial to develop robust detection mechanisms to ensure the integrity and reliability of text-based communication.
The need for detecting LLM-generated text arises from several factors:
- Misinformation Control: 🚫 LLMs can spread false information rapidly. Detection helps in identifying and mitigating the impact of misinformation.
- Plagiarism Prevention: 📝 Identifying LLM-generated content assists in preventing academic and content plagiarism, maintaining integrity in research and publications.
- Ethical Considerations: ⚖️ Understanding the origin of text content is crucial for maintaining ethical standards, especially in sensitive areas like news reporting and legal documentation.
- Trust and Transparency: 🔍 Detection fosters trust by ensuring transparency about the source of text content, enhancing credibility in communication channels.
In this project, we leverage natural language processing (NLP) techniques to build an effective detection system. The process includes:
- Data Collection: Gathering a diverse dataset of human and LLM-generated texts.
- Feature Engineering: Extracting relevant features that can distinguish between human and LLM-generated texts.
- Model Selection: Choosing appropriate machine learning models for classification.
- Evaluation: Assessing model performance using metrics like accuracy, precision, recall, and F1-score.
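To make the feature-engineering step concrete, here is a minimal, stdlib-only sketch of a stylometric feature extractor. The specific features shown (average sentence length, type-token ratio, punctuation ratio) are illustrative choices, not necessarily the features used in this project:

```python
import re
from collections import Counter

def extract_features(text):
    """Compute simple stylometric features of the kind that can help
    separate human-written from LLM-generated text (illustrative set)."""
    # Split into sentences on terminal punctuation
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased word tokens
    words = re.findall(r"[A-Za-z']+", text.lower())
    counts = Counter(words)
    num_words = len(words)
    return {
        # Average sentence length in words
        "avg_sentence_len": num_words / max(len(sentences), 1),
        # Type-token ratio: a rough measure of vocabulary richness
        "type_token_ratio": len(counts) / max(num_words, 1),
        # Fraction of characters that are punctuation
        "punct_ratio": sum(c in ",.;:!?'\"-" for c in text) / max(len(text), 1),
    }

sample = "This is a short example. It has two sentences."
features = extract_features(sample)
```

In a full pipeline, feature vectors like these (or learned embeddings) would be fed to the classifier selected in the model-selection step.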
The project has been successfully deployed! Check it out here:
Explore the detailed project notebook on Kaggle:
Dive into the full story behind this project on Medium:
Follow these steps to set up and run the project locally:
- Clone the repository:

  ```bash
  git clone -b Deployment https://github.com/Harshithvarma007/LLM_Text_Detection.git
  cd LLM_Text_Detection
  ```

- Set up a virtual environment and install dependencies:

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt
  ```

- Run the Streamlit app:

  ```bash
  streamlit run app.py
  ```
To train the model locally, follow these steps:
```bash
git clone -b main https://github.com/Harshithvarma007/LLM_Text_Detection.git
cd LLM_Text_Detection
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
Run `train.py`. This script will train the detection model using the configured dataset and save the trained model weights:

```bash
python train.py
```
Start the Streamlit app to interact with the trained model:
```bash
streamlit run app.py
```
Once the Streamlit app is running, open your web browser and visit http://localhost:8501. If the model has not already been trained with `train.py`, use the "Train Model" button within the app to initiate training.
We welcome contributions! Please feel free to submit issues, fork the repository, and send pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.
Feel free to reach out for any queries or collaboration opportunities!
Harshith Varma
Thank you for checking out this project! 🙌 We hope you find it useful and informative. Happy coding! 💻🎉