
Detection of Large Language Model (LLM) Generated Text 📝🤖

🌟 Introduction

Welcome to the Detection of Large Language Model (LLM) Generated Text project! This project aims to distinguish between human-written and LLM-generated text, addressing concerns around misinformation, plagiarism, and ethics. With the rise of LLMs such as GPT-4, it has become crucial to develop robust detection mechanisms that help ensure the integrity and reliability of text-based communication.

📚 Project Overview

🔍 Need for Detection of LLM Generated Text

The need for detecting LLM-generated text arises from several factors:

  • Misinformation Control: 🚫 LLMs can spread false information rapidly. Detection helps in identifying and mitigating the impact of misinformation.
  • Plagiarism Prevention: 📝 Identifying LLM-generated content assists in preventing academic and content plagiarism, maintaining integrity in research and publications.
  • Ethical Considerations: ⚖️ Understanding the origin of text content is crucial for maintaining ethical standards, especially in sensitive areas like news reporting and legal documentation.
  • Trust and Transparency: 🔍 Detection fosters trust by ensuring transparency about the source of text content, enhancing credibility in communication channels.

⚙️ Methodology

In this project, we leverage natural language processing (NLP) techniques to build an effective detection system. The process includes the following steps; a minimal sketch of the pipeline follows the list:

  1. Data Collection: Gathering a diverse dataset of human and LLM-generated texts.
  2. Feature Engineering: Extracting relevant features that can distinguish between human and LLM-generated texts.
  3. Model Selection: Choosing appropriate machine learning models for classification.
  4. Evaluation: Assessing model performance using metrics like accuracy, precision, recall, and F1-score.
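
To make these steps concrete, here is a minimal sketch of such a pipeline using scikit-learn. The dataset path, the text/label column names, and the choice of TF-IDF features with logistic regression are illustrative assumptions, not necessarily the configuration used in this repository.

    # Minimal detection pipeline sketch (paths, columns, and model choice are assumptions).
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, precision_recall_fscore_support
    from sklearn.model_selection import train_test_split

    # 1. Data collection: a CSV with a text column and a 0/1 label
    #    (0 = human-written, 1 = LLM-generated).
    df = pd.read_csv("data/train_essays.csv")  # hypothetical path
    X_train, X_test, y_train, y_test = train_test_split(
        df["text"], df["label"], test_size=0.2, random_state=42
    )

    # 2. Feature engineering: TF-IDF over word unigrams and bigrams.
    vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=50000)
    X_train_vec = vectorizer.fit_transform(X_train)
    X_test_vec = vectorizer.transform(X_test)

    # 3. Model selection: a simple linear classifier as a baseline.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train_vec, y_train)

    # 4. Evaluation: accuracy, precision, recall, and F1-score.
    preds = clf.predict(X_test_vec)
    precision, recall, f1, _ = precision_recall_fscore_support(y_test, preds, average="binary")
    print(f"accuracy={accuracy_score(y_test, preds):.3f} "
          f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")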

🚀 Project Deployment

The project has been successfully deployed! Check it out here: Deployed Project

📊 Project Notebook

Explore the detailed project notebook on Kaggle: Kaggle

📖 Read the Blog

Dive into the full story behind this project on Medium: Medium

🛠️ Setup and Installation

Follow these steps to set up and run the project locally:

  1. Clone the repository:

    git clone -b Deployment https://github.com/Harshithvarma007/LLM_Text_Detection.git
    cd LLM_Text_Detection
  2. Set up a virtual environment and install dependencies:

    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
  3. Run the Streamlit app:

    streamlit run app.py

🚀 Training the Model Locally

To train the model locally, follow these steps:

Step 1: Clone the Repository

git clone -b main https://github.com/Harshithvarma007/LLM_Text_Detection.git
cd LLM_Text_Detection

Step 2: Set Up Virtual Environment and Install Dependencies

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Step 3: Run the Training Script

python train.py

This script will train the detection model using the configured dataset and save the trained model weights.

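For orientation, the core of a training script like train.py might look roughly like the sketch below. The dataset path, column names, model choice, and the use of joblib for saving the weights are assumptions for illustration only.

    # Rough outline of a training script that persists its weights
    # (dataset path, columns, and model choice are illustrative assumptions).
    import joblib
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    df = pd.read_csv("data/train_essays.csv")  # hypothetical dataset path

    # Bundle feature extraction and the classifier so a single artifact
    # can be reloaded later by the Streamlit app.
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    pipeline.fit(df["text"], df["label"])

    joblib.dump(pipeline, "model.joblib")  # saved model weights
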
Step 4: Run the Application

Start the Streamlit app to interact with the trained model:

streamlit run app.py

Step 5: Navigate the App

Once the Streamlit app is running, open your web browser and visit http://localhost:8501. If the model has not already been trained via train.py, use the "Train Model" button in the app to initiate training.
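
As a rough picture of what such an app does, a minimal Streamlit front-end for this kind of detector could look like the sketch below. The model file name, load path, and widget layout are hypothetical and do not mirror app.py exactly.

    # Minimal Streamlit front-end sketch (model path and widget layout are
    # hypothetical, not a copy of the repository's app.py).
    import joblib
    import streamlit as st

    st.title("LLM-Generated Text Detection")

    @st.cache_resource
    def load_model(path: str = "model.joblib"):
        # Load previously saved model weights; the path is an assumption.
        return joblib.load(path)

    model = load_model()
    text = st.text_area("Paste the text you want to check:")

    if st.button("Detect") and text.strip():
        # predict_proba returns [[P(human), P(LLM)]] for a single input.
        prob = model.predict_proba([text])[0][1]
        st.write(f"Estimated probability of LLM-generated text: {prob:.2%}")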

🤝 Contributing

We welcome contributions! Please feel free to submit issues, fork the repository, and send pull requests.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📬 Contact

Feel free to reach out for any queries or collaboration opportunities!

Harshith Varma


Thank you for checking out this project! 🙌 We hope you find it useful and informative. Happy coding! 💻🎉
