
0.05 (Alpha)

Pre-release
@DavidOsipov released this on 20 Feb 2025, 16:01
Commit: cfa5d23

Changelog

Version 0.05 (Alpha) - 20/02/2025

This release includes significant improvements to the keyword extraction and analysis pipeline, focusing on memory management, configuration handling, and error handling. However, Known Issues #1 and #2 from the previous version are still unresolved in this release.

Major Changes:

  • Enhanced Memory Management:

    • Implemented chunking of job descriptions to process large datasets without exceeding memory limits. The chunk size is dynamically calculated based on available memory.
    • Added memory usage checks to proactively clear caches when memory usage is high.
    • Introduced options for configuring memory usage to help prevent out-of-memory errors.
  • Improved Configuration:

    • Added section extraction: sections of a job description are now extracted based on their headings (e.g., "Responsibilities", "Requirements").
    • Introduced fallback to a basic spaCy model with a sentencizer if the specified model cannot be loaded or downloaded.
    • Enhanced validation of input job descriptions, including length checks and invalid character removal.
    • Added a text_encoding configuration option to handle different character encodings gracefully.
  • More Robust Error Handling:

    • Implemented a retry mechanism for job analysis to handle transient errors.
    • Added strict mode to control whether exceptions are raised or gracefully handled.
    • Improved error handling during spaCy model loading and downloading.
  • Keyword Extraction Improvements:

    • Enhanced keyword extraction with semantic filtering based on context to improve accuracy.
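The retry mechanism and strict mode described above can be illustrated with a minimal sketch. This is not the project's actual implementation: the function name run_with_retry, its parameters (max_retries, base_delay, strict, transient) and the choice of OSError as the "transient" error type are hypothetical assumptions. Transient errors are retried with exponential backoff; once retries are exhausted, strict mode re-raises the last error, while non-strict mode handles it gracefully by returning None.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_retry(
    task: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.5,
    strict: bool = False,
    transient: Tuple[Type[BaseException], ...] = (OSError,),
) -> Optional[T]:
    """Run `task`, retrying transient errors (hypothetical API, not the project's).

    Errors not listed in `transient` propagate immediately.
    After the final failed attempt, strict mode re-raises the error;
    non-strict mode swallows it and returns None.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    last_error: Optional[BaseException] = None
    for attempt in range(max_retries + 1):
        try:
            return task()
        except transient as exc:
            last_error = exc
            if attempt < max_retries:
                # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                time.sleep(base_delay * (2 ** attempt))
    if strict:
        # At least one attempt ran, so last_error is set here.
        raise last_error
    return None
```

With strict=False a persistently failing job analysis yields None so a batch can continue; with strict=True the caller sees the original exception.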

New Features:

  • Dynamic Chunking: The system now automatically chunks large job description sets into smaller, manageable pieces.
  • Fallback spaCy Model: Added a fallback mechanism that loads a simpler spaCy model if the configured model fails to load.
  • Configuration Option: Added a text_encoding option to handle job descriptions in non-UTF-8 encodings.
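Dynamic chunking can be sketched as follows. These notes do not document the project's actual formula, so the helper names compute_chunk_size and chunked, the memory_fraction budget and the min_size/max_size clamping bounds are hypothetical assumptions. The idea is to reserve a fraction of the currently available memory, divide it by an estimated per-description size, clamp the result, and then process the job descriptions in slices of that size.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def compute_chunk_size(
    available_bytes: int,
    avg_item_bytes: int,
    memory_fraction: float = 0.25,
    min_size: int = 1,
    max_size: int = 1000,
) -> int:
    """Hypothetical helper: how many items fit in a fraction of available memory."""
    if avg_item_bytes <= 0:
        raise ValueError("avg_item_bytes must be positive")
    # Only a fraction of free memory is budgeted, leaving headroom for models/caches.
    budget = int(available_bytes * memory_fraction)
    size = budget // avg_item_bytes
    # Clamp so chunks are never empty and never unboundedly large.
    return max(min_size, min(max_size, size))


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

In practice, available_bytes could be read from psutil (listed under Dependencies) via psutil.virtual_memory().available, and avg_item_bytes estimated from a sample of job descriptions; recomputing the chunk size between chunks lets it adapt as memory is freed or consumed.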

Code Quality:

  • Improved code structure and documentation.
  • Added a comprehensive test suite to ensure code reliability.
  • Added type hints, making the code easier to read and maintain.

Known Issues:

  1. [Critical, Unresolved] Keywords are displayed incorrectly in the final Excel output.

  2. [Critical, Unresolved] The unit tests are not thorough and contain various mistakes; they fail on every run.

Dependencies:

  • nltk
  • numpy
  • pandas
  • spacy
  • scikit-learn
  • pyyaml
  • psutil
  • hashlib (part of the Python standard library; no separate installation required)

Future Improvements:

  • [High Priority] Fix the incorrectly displayed keywords in the Excel output.
  • [High Priority] Fix the unit tests so that they cover all critical parts of the code, including the Excel output.
  • Further optimization of keyword extraction and scoring.
  • Improved handling of rare or domain-specific keywords.
  • Enhanced user interface.
  • Integration with other ATS systems.
  • More sophisticated synonym generation techniques.

Full Changelog: 0.01...0.05
