FeaClustRE (Feature Clustering and Analysis Visualization Tool) is an advanced tool designed to analyze, cluster, and visualize structured hierarchical features using NLP and LLM models and techniques. It provides hierarchical clustering, dendrogram visualizations, and evaluations to help to explore complex lists of features.
This tool uses Meta's LLaMA model for feature embedding and Hugging Face's Transformers for feature family clustering.
With a flexible backend API, a CLI client, and visualization tools, FeaClustRE supports both interactive analysis and automated batch processing.
This tool is part of the RE-Miner Ecosystem, which can be explored in the GESSI-NLP4SE repository.
- Custom Clustering Algorithm – Uses a hand-made affinity-based clustering approach to automatically group similar features.
- Dendrogram Visualization – Generates hierarchical visualizations to explore feature relationships.
- Preprocessing Pipelines – Provides data cleaning and transformation utilities.
- API and CLI Support – Run analysis through API endpoints or via local CLI commands.
- Hugging Face Model Integration – Supports Meta LLaMA for embedding-based clustering (requires access).
- Docker Support – Easily deployable using Docker and Docker Compose.
- Demo & Screenshots
- Hugging Face Token Authentication & LLaMA Access
- Installation
- Project Structure
- Running Preprocessing Scripts
(Coming Soon)
This project uses Meta's LLaMA model, which is gated and requires manual approval from Hugging Face.
- Visit the LLaMA Model 3.2-3B Page.
- Click Request Access and follow the instructions.
- Wait for Hugging Face to approve your request.
To authenticate, you must set your Hugging Face token before running the project.
In the .env
file in the project root, add:
HUGGING_FACE_HUB_TOKEN=your_huggingface_token
- Before using, install the required spaCy model:
python -m spacy download en_core_web_sm
- Set your
HUGGING_FACE_HUB_TOKEN
in the .env file
HUGGING_FACE_HUB_TOKEN=${HUGGINGFACE_TOKEN}
- Install dependencies
pipenv install
- Execute API
flask run --port=3008
- Build and run the Docker Image
docker build -t release . && docker run -p 3008:3008 --name feaclustre release
The following is the structure of the FeaClustRE project:
FeaClustRE/
│── .github/ # GitHub Actions & CI/CD workflows
│── backend/ # Backend services and clustering algorithms
│ │── data-preprocessing/ # Scripts for processing raw data
│ │── Affinity_strategy.py # Strategy for affinity clustering
│ │── Context.py # Context manager for clustering
│ │── dendogram_controller.py # Handles dendrogram API calls
│ │── dendogram_service.py # Service for generating dendrograms
│ │── graph_controller.py # Graph visualization API
│ │── graph_service.py # Graph computation logic
│ │── preprocessing_service.py # Handles feature preprocessing
│ │── tf_idf_utils.py # Utilities for TF-IDF calculations
│ │── utils.py # General utility functions
│ │── visualization_service.py # Generates visualizations for clusters
│── cli-client/ # Command-line interface for clustering
│ │── scripts/ # Helper scripts
│ │── dendogram_generation.py # CLI tool for dendrogram generation
│ │── dynamic_visualizator.py # CLI tool for dynamic visualization
│ │── requester.py # Request handler for API calls
│ │── visualizator.py # CLI tool for visualization
│── data/ # Data storage directory
│── .env # Environment variables (ignored in Git)
│── .gitattributes # Git attributes
│── .gitignore # Git ignore file
│── docker-compose.yml # Docker Compose configuration
│── Dockerfile # Docker build configuration
│── Pipfile # Pipenv dependencies
│── Pipfile.lock # Locked dependencies
│── README.md # Project documentation
│── wsgi.py # Entry point for the Flask application