API for feature clustering, generating hierarchical feature organization with feature family clustering.

nlp4se/FeaClustRE

FeaClustRE – A Feature Clustering and Analysis Visualization Tool


Introduction

FeaClustRE (Feature Clustering and Analysis Visualization Tool) is a tool for analyzing and clustering features and visualizing the resulting hierarchical structure, using NLP techniques and large language models (LLMs). It provides hierarchical clustering, dendrogram visualizations, and evaluations to help users explore complex lists of features.

This tool uses Meta's LLaMA model for feature embedding and Hugging Face's Transformers for feature family clustering.
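
The README does not spell out the clustering algorithm. As a rough, hypothetical illustration of grouping features by embedding similarity, the sketch below runs average-linkage agglomerative clustering over cosine similarity in plain Python. The toy 3-dimensional vectors stand in for LLaMA embeddings, and `agglomerate` and the 0.8 threshold are assumptions, not FeaClustRE's actual affinity strategy.

```python
import math
from itertools import combinations


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def agglomerate(embeddings: dict[str, list[float]], threshold: float):
    """Average-linkage agglomerative clustering over cosine similarity.

    Starts with one cluster per feature and repeatedly merges the pair of
    clusters with the highest mean pairwise similarity, stopping when no pair
    reaches `threshold`. Returns the final clusters and the merge history
    (the information a dendrogram draws).
    """
    clusters = [[name] for name in embeddings]
    merges = []
    while len(clusters) > 1:
        best, best_pair = -math.inf, None
        for i, j in combinations(range(len(clusters)), 2):
            sims = [cosine_similarity(embeddings[a], embeddings[b])
                    for a in clusters[i] for b in clusters[j]]
            score = sum(sims) / len(sims)
            if score > best:
                best, best_pair = score, (i, j)
        if best < threshold:
            break
        i, j = best_pair
        merges.append((clusters[i], clusters[j], round(best, 3)))
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is still valid afterwards
        clusters[i] = merged
    return [sorted(c) for c in clusters], merges


# Toy 3-d vectors standing in for LLaMA embeddings (hypothetical values).
features = {
    "send message": [0.9, 0.1, 0.0],
    "send photo": [0.8, 0.2, 0.1],
    "share location": [0.1, 0.9, 0.2],
    "share contact": [0.2, 0.8, 0.1],
}
clusters, merges = agglomerate(features, threshold=0.8)
print(clusters)  # [['send message', 'send photo'], ['share contact', 'share location']]
```

Raising the threshold yields smaller, tighter feature families; lowering it merges more aggressively. Average linkage is only one option; single or complete linkage would produce a differently shaped hierarchy.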

With a Flask-based backend API, a CLI client, and visualization tools, FeaClustRE supports both interactive analysis and automated batch processing.

This tool is part of the RE-Miner Ecosystem, which can be explored in the GESSI-NLP4SE repository.

Key Features

  • Custom Clustering Algorithm – Uses an affinity-based clustering approach developed for this project to automatically group similar features.
  • Dendrogram Visualization – Generates hierarchical visualizations to explore feature relationships.
  • Preprocessing Pipelines – Provides data cleaning and transformation utilities.
  • API and CLI Support – Run analysis through API endpoints or via local CLI commands.
  • Hugging Face Model Integration – Supports Meta LLaMA for embedding-based clustering (gated model; requires approved access).
  • Docker Support – Easily deployable using Docker and Docker Compose.
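
The README does not list the preprocessing steps. The stdlib-only sketch below shows typical cleaning for short feature labels before embedding: Unicode normalization, lowercasing, punctuation removal, whitespace collapsing, and order-preserving deduplication. The function names and the choice of steps are assumptions; the project's preprocessing_service.py may work differently (for example, by using the spaCy model it installs).

```python
import re
import unicodedata


def normalize_feature(label: str) -> str:
    """Strip accents, lowercase, replace punctuation with spaces, collapse whitespace."""
    text = unicodedata.normalize("NFKD", label)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def preprocess(labels: list[str]) -> list[str]:
    """Normalize labels, dropping empty results and duplicates (first occurrence wins)."""
    seen: set[str] = set()
    result: list[str] = []
    for label in labels:
        norm = normalize_feature(label)
        if norm and norm not in seen:
            seen.add(norm)
            result.append(norm)
    return result


# Hypothetical raw feature labels, e.g. extracted from app reviews.
cleaned = preprocess(["Send Message!", "send   message", "Share-Location", "Café mode", "???"])
print(cleaned)  # ['send message', 'share location', 'cafe mode']
```

Deduplicating after normalization matters for clustering: near-identical labels would otherwise appear as separate leaves with almost identical embeddings.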

🎥 Demo & Screenshots

(Coming Soon)


🔑 Hugging Face Token Authentication & LLaMA Access

This project uses Meta's LLaMA model, which is gated and requires manual approval from Hugging Face.

How to Get Access to LLaMA

  1. Visit the LLaMA 3.2-3B model page on Hugging Face.
  2. Click Request Access and follow the instructions.
  3. Wait for Hugging Face to approve your request.

Using Your Hugging Face Token

To authenticate, you must set your Hugging Face token before running the project.

Set the Token in .env

Create a .env file in the project root (it is ignored by Git, so it is not included in the repository) and add:

HUGGING_FACE_HUB_TOKEN=your_huggingface_token
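
As a hedged illustration of what setting the token achieves, the sketch below loads .env into the process environment and fails fast when HUGGING_FACE_HUB_TOKEN is missing. The helper names are hypothetical, and the parser is a deliberately minimal stand-in for a library such as python-dotenv: it handles neither quoting nor ${VAR} references like the one used in the installation steps.

```python
import os
from pathlib import Path
from typing import Iterable, Mapping, Optional


def parse_env_lines(lines: Iterable[str]) -> dict:
    """Parse KEY=VALUE lines, skipping blanks, '#' comments, and lines without '='."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def load_env_file(path: str = ".env") -> None:
    """Copy values from a .env file into os.environ without overriding existing variables."""
    env_path = Path(path)
    if env_path.is_file():
        lines = env_path.read_text(encoding="utf-8").splitlines()
        for key, value in parse_env_lines(lines).items():
            os.environ.setdefault(key, value)


def require_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Hugging Face token, raising a clear error if it is missing or empty."""
    env = os.environ if env is None else env
    token = env.get("HUGGING_FACE_HUB_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HUGGING_FACE_HUB_TOKEN is not set; add it to the .env file")
    return token
```

Calling load_env_file() and then require_hf_token() at startup surfaces a missing token immediately, rather than as a later authentication failure when the gated model is downloaded.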

🛠 Installation

Local Installation

  1. Install the dependencies:
pipenv install
  2. Download the required spaCy model. This needs spaCy, which step 1 installs, so run it inside the Pipenv environment:
pipenv run python -m spacy download en_core_web_sm
  3. Set your HUGGING_FACE_HUB_TOKEN in the .env file:
HUGGING_FACE_HUB_TOKEN=${HUGGINGFACE_TOKEN}
  4. Run the API on port 3008:
pipenv run flask run --port=3008

Docker Installation

  1. Build the Docker image and run it as a container, exposing port 3008:
docker build -t release . && docker run -p 3008:3008 --name feaclustre release

📂 Project Structure

The following is the structure of the FeaClustRE project:

FeaClustRE/
│── .github/                     # GitHub Actions & CI/CD workflows
│── backend/                     # Backend services and clustering algorithms
│   │── data-preprocessing/      # Scripts for processing raw data
│   │── Affinity_strategy.py     # Strategy for affinity clustering
│   │── Context.py               # Clustering context that runs the selected strategy
│   │── dendogram_controller.py  # Handles dendrogram API calls
│   │── dendogram_service.py     # Service for generating dendrograms
│   │── graph_controller.py      # Graph visualization API
│   │── graph_service.py         # Graph computation logic
│   │── preprocessing_service.py # Handles feature preprocessing
│   │── tf_idf_utils.py          # Utilities for TF-IDF calculations
│   │── utils.py                 # General utility functions
│   │── visualization_service.py # Generates visualizations for clusters
│── cli-client/                  # Command-line interface for clustering
│   │── scripts/                 # Helper scripts
│   │── dendogram_generation.py  # CLI tool for dendrogram generation
│   │── dynamic_visualizator.py  # CLI tool for dynamic visualization
│   │── requester.py             # Request handler for API calls
│   │── visualizator.py          # CLI tool for visualization
│── data/                        # Data storage directory
│── .env                         # Environment variables (ignored in Git)
│── .gitattributes               # Git attributes
│── .gitignore                   # Git ignore file
│── docker-compose.yml           # Docker Compose configuration
│── Dockerfile                   # Docker build configuration
│── Pipfile                      # Pipenv dependencies
│── Pipfile.lock                 # Locked dependencies
│── README.md                    # Project documentation
│── wsgi.py                      # Entry point for the Flask application
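
The pairing of Affinity_strategy.py with Context.py suggests, judging from the file names alone, the Strategy design pattern: a context object delegates clustering to an interchangeable algorithm. A minimal sketch of that pattern follows. ClusteringStrategy, FirstWordStrategy, and ClusteringContext are hypothetical names, and the toy first-word grouping merely stands in for the real affinity-based algorithm.

```python
from abc import ABC, abstractmethod


class ClusteringStrategy(ABC):
    """Common interface for interchangeable clustering algorithms (hypothetical)."""

    @abstractmethod
    def cluster(self, features: list[str]) -> list[list[str]]:
        """Group feature labels into clusters."""


class FirstWordStrategy(ClusteringStrategy):
    """Toy strategy: features sharing their first word end up together."""

    def cluster(self, features: list[str]) -> list[list[str]]:
        groups: dict[str, list[str]] = {}
        for feature in features:
            words = feature.split()
            key = words[0] if words else ""
            groups.setdefault(key, []).append(feature)
        return list(groups.values())  # dicts preserve insertion order


class ClusteringContext:
    """Holds the current strategy and delegates clustering to it."""

    def __init__(self, strategy: ClusteringStrategy) -> None:
        self.strategy = strategy

    def set_strategy(self, strategy: ClusteringStrategy) -> None:
        """Swap the algorithm at runtime without changing callers."""
        self.strategy = strategy

    def run(self, features: list[str]) -> list[list[str]]:
        return self.strategy.cluster(features)


context = ClusteringContext(FirstWordStrategy())
groups = context.run(["send message", "send photo", "share location"])
print(groups)  # [['send message', 'send photo'], ['share location']]
```

The benefit of this design is that controllers and services depend only on the context, so a new clustering algorithm can be added as another strategy class without touching the API layer.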
