
Real-Time Recruitment System with AI and Data Analytics

Overview

In this project, we simulate job offers and CVs using Llama to generate synthetic data, which is sent to Kafka for real-time processing. Ray is then used to classify the incoming streams as CVs, job offers, or other data. Spark processes the classified data, extracting information in JSON format that is saved in Delta tables for further analysis in Databricks.

From Databricks, the data is moved to Snowflake for storage and transformation with dbt. Pinecone and Redis are used for data indexing and caching, respectively, while a Flask server is set up for machine learning model deployment. Additionally, the system is integrated with a chatbot based on FAISS.

We also have a segmentation process, where data flows from Postgres to Spark and finally to Tableau for visualization.
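
As a rough sketch of this segmentation leg, the snippet below reads candidate rows from Postgres with Spark's JDBC source, clusters them with KMeans, and writes the labeled result out where Tableau (or an intermediate store) can pick it up. The connection details, table name, and feature columns are illustrative assumptions, not the project's actual schema.

```python
# Hypothetical Postgres -> Spark -> Tableau segmentation step.
# Connection details, table name, and feature columns are assumptions.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("candidate-segmentation").getOrCreate()

# Read candidate profiles from Postgres via JDBC (credentials are placeholders).
candidates = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://localhost:5432/recruitment")
    .option("dbtable", "candidates")
    .option("user", "postgres")
    .option("password", "********")
    .load()
)

# Assemble assumed numeric features and cluster the candidates into segments.
features = VectorAssembler(
    inputCols=["years_experience", "num_skills", "num_languages"],
    outputCol="features",
).transform(candidates)
model = KMeans(k=4, featuresCol="features", predictionCol="segment").fit(features)
segmented = model.transform(features)

# Persist the segments for downstream visualization in Tableau.
segmented.drop("features").write.mode("overwrite").parquet("/data/segments/candidates")
```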

Dataset Simulation

  • Job Offers: Leveraged web scraping on platforms like LinkedIn and other job boards to collect data on job postings. This includes job titles, companies, locations, descriptions, requirements, and more.

  • CV Data: Incorporated a pre-existing database of CVs containing information such as names, skills, professional experiences, educational backgrounds, and languages.

  • Offer Simulation: Generated simulated job offers using Llama (via the Groq API), enriching the dataset with synthetic postings tailored to various industries and locations.
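
A minimal sketch of the offer simulation step, assuming Llama is called through the Groq Python client and the result is pushed to Kafka with kafka-python, is shown below. The topic name, prompt, and model id are illustrative assumptions.

```python
# Illustrative sketch: generate a synthetic job offer with Llama (via the Groq API)
# and publish it to a Kafka topic. Topic name, prompt, and model id are assumptions.
import json
from groq import Groq
from kafka import KafkaProducer

groq_client = Groq(api_key="YOUR_GROQ_API_KEY")  # placeholder credential
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Ask the model for a single job offer as JSON.
response = groq_client.chat.completions.create(
    model="llama3-70b-8192",
    messages=[{
        "role": "user",
        "content": "Generate one realistic job offer as JSON with fields: "
                   "title, company, location, description, requirements.",
    }],
)

# Parse the model output and send it to the (hypothetical) job_offers topic.
offer = json.loads(response.choices[0].message.content)
producer.send("job_offers", value=offer)
producer.flush()
```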

Processing Pipeline

Tools & Technologies

  • Data Simulation:

    • Llama: Used to generate synthetic job offers and CVs.
  • Message Queue:

    • Kafka: Sends simulated job offers and CVs to the processing pipeline.
  • Classification:

    • Ray: Classifies job offers, CVs, and other data.
  • Data Processing:

    • Apache Spark: Parses the streamed messages and extracts structured information in JSON format (see the streaming sketch at the end of this section).
    • Databricks: Used for further processing and analytics.
  • Data Storage:

    • Delta Lake: Used for efficient data storage and versioning.
    • Snowflake: Cloud data warehouse for storage and querying.
  • Data Transformation:

    • dbt: Used for data transformations in Snowflake.
  • Data Indexing & Caching:

    • Pinecone: Data indexing and similarity search.
    • Redis: Used for caching and quick access.
  • Machine Learning Model Deployment and User Interface:

    • Flask: Provides the system's user interface, where users can upload a CV and receive job offer recommendations, or upload a job offer and receive candidate recommendations. It also allows users to interact with the CVs of recommended candidates through a chatbot.
  • Data Segmentation:

    • Postgres: Used for storing data before segmentation.
    • Spark: Handles data segmentation of candidate profiles.
    • Tableau: Visualizes the segmented data and the insights drawn from the data warehouse.
    • Apache NiFi: Orchestrates and automates the segmentation process, ensuring smooth data flow and transformation.
  • Chatbot Integration:

    • FAISS: A vector similarity-search library used to index candidate CVs so that recruiters can discuss the CVs recommended by the system (see the retrieval sketch after this list).
  • Programming Language:

    • Python: The primary language used for the development and deployment of the system.
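
As referenced in the FAISS bullet above, here is a rough sketch of the retrieval step that could sit behind the CV chatbot: CV text chunks are embedded, indexed with FAISS, and the passages closest to the recruiter's question are retrieved as context. The embedding model and chunk contents are assumptions for illustration.

```python
# Hypothetical retrieval step behind the CV chatbot: index CV text chunks in FAISS
# and fetch the passages most relevant to a recruiter's question.
# The embedding model and the chunk contents are assumptions.
import faiss
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

cv_chunks = [
    "5 years of experience as a data engineer (Spark, Kafka, Airflow).",
    "MSc in Computer Science, fluent in French and English.",
    "Led the migration of a reporting stack to Snowflake and dbt.",
]

# Build a simple L2 index over the chunk embeddings.
embeddings = encoder.encode(cv_chunks).astype("float32")
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# Retrieve the chunks closest to the recruiter's question; these would then be
# passed to the LLM as context for the chatbot's answer.
question = encoder.encode(["Does the candidate know Spark?"]).astype("float32")
_, ids = index.search(question, 2)
context = [cv_chunks[i] for i in ids[0]]
print(context)
```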

Feel free to explore and analyze the data simulation and processing pipeline to uncover valuable insights into job offers, candidate profiles, and real-time data interactions. If you have any questions or need further information, refer to the provided documentation or contact the project contributors.
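
For a concrete view of the Kafka → Spark → Delta leg of the pipeline described above, here is a minimal Structured Streaming sketch. The topic name, message schema, and storage paths are assumptions, and the job would additionally need the Kafka and Delta Lake Spark packages on its classpath.

```python
# Illustrative Structured Streaming job: read simulated offers/CVs from Kafka,
# parse the JSON payload, and append it to a Delta table.
# Topic name, schema, and storage paths are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StringType

spark = SparkSession.builder.appName("recruitment-stream").getOrCreate()

# Assumed shape of a simulated job offer message.
schema = (
    StructType()
    .add("title", StringType())
    .add("company", StringType())
    .add("location", StringType())
    .add("description", StringType())
)

# Read from Kafka and turn the raw value bytes into structured columns.
stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "job_offers")  # hypothetical topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("offer"))
    .select("offer.*")
)

# Append the parsed records to a Delta table for analysis in Databricks.
(
    stream.writeStream.format("delta")
    .option("checkpointLocation", "/delta/checkpoints/job_offers")
    .start("/delta/tables/job_offers")
)
```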

Architecture

Final Result and User Interface Images

  1. CV Upload and Recommendations Interface:
    This interface allows users to upload their CV and receive a list of recommended job offers.

    (Screenshots: CV upload and job-offer recommendation views; see the endpoint sketch after this list.)

  2. Job Offer Upload and Candidate Recommendations:
    In this part of the interface, users can upload a job offer and receive a list of recommended candidates.

    (Screenshots: job-offer upload and candidate recommendation views.)

  3. Chat with Candidate CVs:
    Here, users can interact with the CVs of recommended candidates through a chatbot to discuss details.

    (Screenshots: chatbot conversation over recommended candidates' CVs.)

  4. Candidate Segmentation Dashboard:
    This dashboard visualizes the segmentation of candidates for better insights into their profiles.

    (Screenshot: candidate segmentation dashboard.)
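
A minimal sketch of what the CV-upload endpoint behind the first interface could look like is shown below, covering the recommendation direction from CV to job offers. The route name, embedding model, Pinecone index name, and metadata fields are assumptions rather than the project's actual implementation.

```python
# Hypothetical Flask endpoint: accept an uploaded CV, embed its text, and query
# Pinecone for the most similar job offers. The route, index name, embedding
# model, and metadata fields are assumptions for illustration.
from flask import Flask, request, jsonify
from sentence_transformers import SentenceTransformer
from pinecone import Pinecone

app = Flask(__name__)
encoder = SentenceTransformer("all-MiniLM-L6-v2")                       # assumed model
index = Pinecone(api_key="YOUR_PINECONE_API_KEY").Index("job-offers")  # assumed index

@app.route("/recommend-offers", methods=["POST"])
def recommend_offers():
    # Assume the CV has already been converted to plain text before upload.
    cv_text = request.files["cv"].read().decode("utf-8")
    embedding = encoder.encode(cv_text).tolist()

    # Retrieve the closest job-offer vectors and return their metadata.
    results = index.query(vector=embedding, top_k=5, include_metadata=True)
    return jsonify([match.metadata for match in results.matches])

if __name__ == "__main__":
    app.run(debug=True)
```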

Dashboards Created by Analyzing Our Data Warehouse

This project includes several interactive dashboards created by analyzing data extracted from our Data Warehouse. Each dashboard presents key insights based on the collected and processed data.

Dashboard 1: Job offer distribution by city

Dashboard 1 Image

Dashboard 2: Most active companies and candidate distribution by gender and years of experience

Dashboard 2 Image

Dashboard 3: Analysis of candidates by skill and years of experience

Dashboard 3 Image

Dashboard 4: Analysis of salaries and offer counts by contract type

Dashboard 4 Image

Dashboard 5: Education comparison between candidates and offers

Dashboard 5 Image

Project Flow

  • Setup Free Azure account & Azure Keyvault - Setup
  • SSH into VM (kafka-vm)
    • Setup Kafka Server - Setup

    • Setup Spark streaming job - Setup

  • Setup Snowflake Warehouse - Setup
  • Setup Databricks Workspace & CDC (Change Data Capture) job - Setup

Error Resolution

While processing data in a Spark environment with Delta Lake, an error related to thread management was encountered. A detailed solution has been documented in the following file:
  • Resolving the Error with DeltaTable and Azure Blob File System - Setup

Notes and Suggestions

Integration of Ray for Stream Classification (Offer/CV or Other)

We have integrated Ray into the architecture for stream classification, aiming to differentiate between offer/CV streams and other types. However, this approach has not been fully implemented at this stage.

For a production-ready solution that connects the system to a real job portal, a dedicated component responsible for stream classification is essential. This component would ensure that incoming streams are processed and classified accurately, enabling smooth interaction with external systems. A rough sketch of such a component follows.
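
As a hedged illustration of what such a dedicated classification component might look like, the sketch below uses a Ray actor to label incoming messages as offer, CV, or other. The keyword heuristic stands in for whatever model the production component would actually use.

```python
# Illustrative Ray-based stream classifier: a Ray actor labels each incoming
# message as "offer", "cv", or "other". The keyword heuristic is a placeholder
# for a real classification model.
import ray

ray.init(ignore_reinit_error=True)

@ray.remote
class StreamClassifier:
    def classify(self, message: dict) -> str:
        text = (message.get("text") or "").lower()
        if "we are hiring" in text or "job offer" in text:
            return "offer"
        if "work experience" in text or "education" in text:
            return "cv"
        return "other"

classifier = StreamClassifier.remote()

# In the real pipeline these messages would come from Kafka; here they are inlined.
messages = [
    {"text": "We are hiring a data engineer in Casablanca"},
    {"text": "Work experience: 3 years as a backend developer"},
]
labels = ray.get([classifier.classify.remote(m) for m in messages])
print(labels)  # e.g. ['offer', 'cv']
```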
