
Real-Time Recruitment System with AI and Data Analytics

Overview

In this project, we simulate job offers and CVs using Llama to generate synthetic data, which is sent to Kafka for real-time processing. Ray is then used to classify the incoming streams as CVs, job offers, or other data. Spark processes the classified data, extracting information in JSON format that is saved in Delta tables for further analysis in Databricks.

From Databricks, the data is moved to Snowflake for storage and transformation with dbt. Pinecone and Redis are used for data indexing and caching, respectively, while a Flask server is set up for machine learning model deployment. Additionally, the system is integrated with a chatbot based on FAISS.

We also have a segmentation process, where data flows from Postgres to Spark and finally to Tableau for visualization.
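
As a rough sketch of this segmentation leg, the snippet below reads candidate rows from Postgres with Spark's JDBC source, clusters them with KMeans, and writes the labeled result out where Tableau (or an intermediate store) can pick it up. The connection details, table name, and feature columns are illustrative assumptions, not the project's actual schema.

```python
# Hypothetical Postgres -> Spark -> Tableau segmentation step.
# Connection details, table name, and feature columns are assumptions.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("candidate-segmentation").getOrCreate()

# Read candidate profiles from Postgres via JDBC (credentials are placeholders).
candidates = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://localhost:5432/recruitment")
    .option("dbtable", "candidates")
    .option("user", "postgres")
    .option("password", "********")
    .load()
)

# Assemble assumed numeric features and cluster the candidates into segments.
features = VectorAssembler(
    inputCols=["years_experience", "num_skills", "num_languages"],
    outputCol="features",
).transform(candidates)
model = KMeans(k=4, featuresCol="features", predictionCol="segment").fit(features)
segmented = model.transform(features)

# Persist the segments for downstream visualization in Tableau.
segmented.drop("features").write.mode("overwrite").parquet("/data/segments/candidates")
```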

Dataset Simulation

  • Job Offers: Leveraged web scraping on platforms like LinkedIn and other job boards to collect data on job postings. This includes job titles, companies, locations, descriptions, requirements, and more.

  • CV Data: Incorporated a pre-existing database of CVs containing information such as names, skills, professional experiences, educational backgrounds, and languages.

  • Offer Simulation: Generated simulated job offers using Llama (via the Groq API), enriching the dataset with synthetic postings tailored to various industries and locations.
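
A minimal sketch of the offer simulation step, assuming Llama is called through the Groq Python client and the result is pushed to Kafka with kafka-python, is shown below. The topic name, prompt, and model id are illustrative assumptions.

```python
# Illustrative sketch: generate a synthetic job offer with Llama (via the Groq API)
# and publish it to a Kafka topic. Topic name, prompt, and model id are assumptions.
import json
from groq import Groq
from kafka import KafkaProducer

groq_client = Groq(api_key="YOUR_GROQ_API_KEY")  # placeholder credential
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Ask the model for a single job offer as JSON.
response = groq_client.chat.completions.create(
    model="llama3-70b-8192",
    messages=[{
        "role": "user",
        "content": "Generate one realistic job offer as JSON with fields: "
                   "title, company, location, description, requirements.",
    }],
)

# Parse the model output and send it to the (hypothetical) job_offers topic.
offer = json.loads(response.choices[0].message.content)
producer.send("job_offers", value=offer)
producer.flush()
```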

Processing Pipeline

Tools & Technologies

  • Data Simulation:

    • Llama: Used to generate synthetic job offers and CVs.
  • Message Queue:

    • Kafka: Sends simulated job offers and CVs to the processing pipeline.
  • Classification:

    • Ray: Classifies job offers, CVs, and other data.
  • Data Processing:

    • Apache Spark: Parses the streamed messages and extracts structured information in JSON format (see the streaming sketch at the end of this section).
    • Databricks: Used for further processing and analytics.
  • Data Storage:

    • Delta Lake: Used for efficient data storage and versioning.
    • Snowflake: Cloud data warehouse for storage and querying.
  • Data Transformation:

    • dbt: Used for data transformations in Snowflake.
  • Data Indexing & Caching:

    • Pinecone: Data indexing and similarity search.
    • Redis: Used for caching and quick access.
  • Machine Learning Model Deployment and User Interface:

    • Flask: Provides the system's user interface, where users can upload a CV and receive job offer recommendations, or upload a job offer and receive candidate recommendations. It also allows users to interact with the CVs of recommended candidates through a chatbot.
  • Data Segmentation:

    • Postgres: Used for storing data before segmentation.
    • Spark: Handles data segmentation of candidate profiles.
    • Tableau: Visualizes the segmented data and the insights drawn from the data warehouse.
    • Apache NiFi: Orchestrates and automates the segmentation process, ensuring smooth data flow and transformation.
  • Chatbot Integration:

    • FAISS: A vector similarity-search library used to index candidate CVs so that recruiters can discuss the CVs recommended by the system (see the retrieval sketch after this list).
  • Programming Language:

    • Python: The primary language used for the development and deployment of the system.
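
As referenced in the FAISS bullet above, here is a rough sketch of the retrieval step that could sit behind the CV chatbot: CV text chunks are embedded, indexed with FAISS, and the passages closest to the recruiter's question are retrieved as context. The embedding model and chunk contents are assumptions for illustration.

```python
# Hypothetical retrieval step behind the CV chatbot: index CV text chunks in FAISS
# and fetch the passages most relevant to a recruiter's question.
# The embedding model and the chunk contents are assumptions.
import faiss
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

cv_chunks = [
    "5 years of experience as a data engineer (Spark, Kafka, Airflow).",
    "MSc in Computer Science, fluent in French and English.",
    "Led the migration of a reporting stack to Snowflake and dbt.",
]

# Build a simple L2 index over the chunk embeddings.
embeddings = encoder.encode(cv_chunks).astype("float32")
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# Retrieve the chunks closest to the recruiter's question; these would then be
# passed to the LLM as context for the chatbot's answer.
question = encoder.encode(["Does the candidate know Spark?"]).astype("float32")
_, ids = index.search(question, 2)
context = [cv_chunks[i] for i in ids[0]]
print(context)
```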

Feel free to explore and analyze the data simulation and processing pipeline to uncover valuable insights into job offers, candidate profiles, and real-time data interactions. If you have any questions or need further information, refer to the provided documentation or contact the project contributors.
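
For a concrete view of the Kafka → Spark → Delta leg of the pipeline described above, here is a minimal Structured Streaming sketch. The topic name, message schema, and storage paths are assumptions, and the job would additionally need the Kafka and Delta Lake Spark packages on its classpath.

```python
# Illustrative Structured Streaming job: read simulated offers/CVs from Kafka,
# parse the JSON payload, and append it to a Delta table.
# Topic name, schema, and storage paths are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StringType

spark = SparkSession.builder.appName("recruitment-stream").getOrCreate()

# Assumed shape of a simulated job offer message.
schema = (
    StructType()
    .add("title", StringType())
    .add("company", StringType())
    .add("location", StringType())
    .add("description", StringType())
)

# Read from Kafka and turn the raw value bytes into structured columns.
stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "job_offers")  # hypothetical topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("offer"))
    .select("offer.*")
)

# Append the parsed records to a Delta table for analysis in Databricks.
(
    stream.writeStream.format("delta")
    .option("checkpointLocation", "/delta/checkpoints/job_offers")
    .start("/delta/tables/job_offers")
)
```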

Architecture

Final Result and User Interface Images

  1. CV Upload and Recommendations Interface:
    This interface allows users to upload their CV and receive a list of recommended job offers.

    (Screenshots: CV upload and job-offer recommendation views; see the endpoint sketch after this list.)

  2. Job Offer Upload and Candidate Recommendations:
    In this part of the interface, users can upload a job offer and receive a list of recommended candidates.

    (Screenshots: job-offer upload and candidate recommendation views.)

  3. Chat with Candidate CVs:
    Here, users can interact with the CVs of recommended candidates through a chatbot to discuss details.

    (Screenshots: chatbot conversation over recommended candidates' CVs.)

  4. Candidate Segmentation Dashboard:
    This dashboard visualizes the segmentation of candidates for better insights into their profiles.

    (Screenshot: candidate segmentation dashboard.)
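
A minimal sketch of what the CV-upload endpoint behind the first interface could look like is shown below, covering the recommendation direction from CV to job offers. The route name, embedding model, Pinecone index name, and metadata fields are assumptions rather than the project's actual implementation.

```python
# Hypothetical Flask endpoint: accept an uploaded CV, embed its text, and query
# Pinecone for the most similar job offers. The route, index name, embedding
# model, and metadata fields are assumptions for illustration.
from flask import Flask, request, jsonify
from sentence_transformers import SentenceTransformer
from pinecone import Pinecone

app = Flask(__name__)
encoder = SentenceTransformer("all-MiniLM-L6-v2")                       # assumed model
index = Pinecone(api_key="YOUR_PINECONE_API_KEY").Index("job-offers")  # assumed index

@app.route("/recommend-offers", methods=["POST"])
def recommend_offers():
    # Assume the CV has already been converted to plain text before upload.
    cv_text = request.files["cv"].read().decode("utf-8")
    embedding = encoder.encode(cv_text).tolist()

    # Retrieve the closest job-offer vectors and return their metadata.
    results = index.query(vector=embedding, top_k=5, include_metadata=True)
    return jsonify([match.metadata for match in results.matches])

if __name__ == "__main__":
    app.run(debug=True)
```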

Dashboards Created by Analyzing Our Data Warehouse

This project includes several interactive dashboards created by analyzing data extracted from our Data Warehouse. Each dashboard presents key insights based on the collected and processed data.

Dashboard 1: Job offer distribution by city

Dashboard 1 Image

Dashboard 2: Most active companies and candidate distribution by gender and years of experience

Dashboard 2 Image

Dashboard 3: Analysis of candidates by skill and years of experience

Dashboard 3 Image

Dashboard 4: Analysis of salaries and offer counts by contract type

Dashboard 4 Image

Dashboard 5: Education comparison between candidates and offers

Dashboard 5 Image

Project Flow

  • Setup Free Azure account & Azure Keyvault - Setup
  • SSH into VM (kafka-vm)
    • Setup Kafka Server - Setup

    • Setup Spark streaming job - Setup

  • Setup Snowflake Warehouse - Setup
  • Setup Databricks Workspace & CDC (Change Data Capture) job - Setup

Error Resolution

While processing data in a Spark environment with Delta Lake, an error related to thread management was encountered. A detailed solution has been documented in the following file:
  • Resolving the Error with DeltaTable and Azure Blob File System - Setup

Notes and Suggestions

Integration of Ray for Stream Classification (Offer/CV or Other)

We have integrated Ray into the architecture for stream classification, aiming to differentiate between offer/CV streams and other types. However, this approach has not been fully implemented at this stage.

For a production-ready solution that connects the system to a real job portal, a dedicated component responsible for stream classification is essential. This component would ensure that incoming streams are processed and classified accurately, enabling smooth interaction with external systems. A rough sketch of such a component follows.
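
As a hedged illustration of what such a dedicated classification component might look like, the sketch below uses a Ray actor to label incoming messages as offer, CV, or other. The keyword heuristic stands in for whatever model the production component would actually use.

```python
# Illustrative Ray-based stream classifier: a Ray actor labels each incoming
# message as "offer", "cv", or "other". The keyword heuristic is a placeholder
# for a real classification model.
import ray

ray.init(ignore_reinit_error=True)

@ray.remote
class StreamClassifier:
    def classify(self, message: dict) -> str:
        text = (message.get("text") or "").lower()
        if "we are hiring" in text or "job offer" in text:
            return "offer"
        if "work experience" in text or "education" in text:
            return "cv"
        return "other"

classifier = StreamClassifier.remote()

# In the real pipeline these messages would come from Kafka; here they are inlined.
messages = [
    {"text": "We are hiring a data engineer in Casablanca"},
    {"text": "Work experience: 3 years as a backend developer"},
]
labels = ray.get([classifier.classify.remote(m) for m in messages])
print(labels)  # e.g. ['offer', 'cv']
```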
