Release 0.05 (Alpha) · DavidOsipov/Keywords4CV

Changelog

Version 0.05 (Alpha) - 20/02/2025

This release includes significant improvements to the keyword extraction and analysis pipeline, focusing on enhanced memory management, improved configuration handling, and more robust error handling. Note that Known Issues #1 and #2 from the previous version persist in this release and remain unresolved.

Major Changes:

Enhanced Memory Management:
- Implemented chunking of job descriptions to process large datasets without exceeding memory limits. The chunk size is dynamically calculated based on available memory.
- Added memory usage checks to proactively clear caches when memory usage is high.
- Introduced options for managing and configuring memory usage to prevent out-of-memory errors.
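
As a rough illustration of the chunking idea above, the sketch below derives a chunk size from an available-memory figure and splits a dictionary of job descriptions accordingly. The function names, the `memory_fraction` parameter, and the size bounds are hypothetical and are not taken from the Keywords4CV source.

```python
from typing import Dict, Iterator


def compute_chunk_size(
    available_bytes: int,
    avg_item_bytes: int,
    memory_fraction: float = 0.25,  # assumed share of free memory to use
    min_size: int = 1,
    max_size: int = 1000,
) -> int:
    """Return how many job descriptions fit in a fraction of available memory."""
    if avg_item_bytes <= 0:
        return min_size
    budget = int(available_bytes * memory_fraction)
    # Clamp the result so a chunk is never empty and never unreasonably large.
    return max(min_size, min(max_size, budget // avg_item_bytes))


def iter_chunks(jobs: Dict[str, str], chunk_size: int) -> Iterator[Dict[str, str]]:
    """Yield successive dictionaries holding at most chunk_size job descriptions."""
    items = list(jobs.items())
    for start in range(0, len(items), chunk_size):
        yield dict(items[start:start + chunk_size])
```

In practice the `available_bytes` argument could be read from `psutil.virtual_memory().available`, since psutil is listed under Dependencies; how the project actually measures memory is not stated in these notes.
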
Improved Configuration:
- Added section extraction capabilities: Extracts sections from job descriptions based on section headings (e.g., "Responsibilities," "Requirements").
- Introduced fallback to a basic spaCy model with a sentencizer if the specified model cannot be loaded or downloaded.
- Enhanced validation of input job descriptions, including length checks and invalid character removal.
- Added a text_encoding configuration option to handle different character encodings gracefully.
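
Heading-based section extraction, as described above, might look roughly like the following sketch. It assumes each heading sits on its own line, optionally followed by a colon, and collects any text before the first heading under a "General" key. The function name and that default key are assumptions made for illustration only.

```python
from typing import Dict, Iterable, List


def extract_sections(text: str, headings: Iterable[str]) -> Dict[str, str]:
    """Split a job description into sections keyed by the given headings."""
    # Map lowercase heading text to its canonical spelling for matching.
    names = {h.lower(): h for h in headings}
    sections: Dict[str, List[str]] = {"General": []}
    current = "General"
    for line in text.splitlines():
        key = line.strip().rstrip(":").strip().lower()
        if key in names:
            # A heading line starts a new section and is not kept as content.
            current = names[key]
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}
```

For example, calling `extract_sections(description, ["Responsibilities", "Requirements"])` returns one entry per heading found, plus the "General" entry for any preamble.
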
More Robust Error Handling:
- Implemented a retry mechanism for job analysis to handle transient errors.
- Added strict mode to control whether exceptions are raised or gracefully handled.
- Improved error handling during spaCy model loading and downloading.
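
The interaction between the retry mechanism and strict mode can be sketched as below. This is a minimal illustration, not the project's code: `run_with_retry`, `max_retries`, `delay_seconds`, and the choice to return `None` in non-strict mode are all assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retry(
    task: Callable[[], T],
    max_retries: int = 2,
    delay_seconds: float = 0.0,
    strict: bool = True,
) -> Optional[T]:
    """Run task, retrying on failure; re-raise or return None on final failure."""
    attempts = max_retries + 1
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == attempts:
                if strict:
                    raise  # strict mode: surface the last error to the caller
                return None  # non-strict mode: fail gracefully
            time.sleep(delay_seconds)  # brief pause before the next attempt
    return None
```

With `strict=True` the caller sees the final exception after all retries are exhausted; with `strict=False` the analysis step can be skipped and the rest of the run can continue.
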
Keyword Extraction Improvements:
- Enhanced keyword extraction with semantic filtering based on context to improve accuracy.

New Features:

Dynamic Chunking: The system now automatically chunks large job description sets into smaller, manageable pieces.
Fallback spaCy Model: Added a fallback mechanism that loads a simpler spaCy model if the configured model fails to load.
Configuration Option: Added a text_encoding option to handle job descriptions in non-UTF-8 encodings.
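
One way a text_encoding option could be applied when reading raw job description bytes is sketched below. The helper name and the fallback behaviour (replacing undecodable bytes, and falling back to UTF-8 when the encoding name is unknown) are assumptions, since these notes do not describe how the option is used internally.

```python
def decode_job_text(raw: bytes, text_encoding: str = "utf-8") -> str:
    """Decode raw bytes with the configured encoding, degrading gracefully."""
    try:
        return raw.decode(text_encoding)
    except UnicodeDecodeError:
        # Keep going, substituting U+FFFD for bytes that cannot be decoded.
        return raw.decode(text_encoding, errors="replace")
    except LookupError:
        # Unknown encoding name in the configuration: fall back to UTF-8.
        return raw.decode("utf-8", errors="replace")
```

For instance, a file saved as Latin-1 could be read in binary mode and passed through `decode_job_text(data, "latin-1")`.
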

Code Quality:

Improved code structure and documentation.
Added a test suite intended to ensure code reliability (it currently fails; see Known Issues).
Added type hints, making the code easier to read and maintain.

Known Issues:

[Critical, Unresolved] The final Excel output displays keywords incorrectly.
[Critical, Unresolved] The unit test suite has not been thoroughly tested and contains various mistakes; it fails on every run.

Dependencies:

nltk
numpy
pandas
spacy
scikit-learn
pyyaml
psutil
hashlib (part of the Python standard library; no separate installation needed)

Future Improvements:

[High Priority] Fix the incorrectly displayed keywords in the Excel output.
[High Priority] Fix the unit tests so that they cover all critical functionality, including the Excel output.
Further optimization of keyword extraction and scoring.
Improved handling of rare or domain-specific keywords.
Enhanced user interface.
Integration with applicant tracking systems (ATS).
More sophisticated synonym generation techniques.

Full Changelog: 0.01...0.05

What's Changed

Configure Renovate by @renovate in #1
Update actions/setup-python action to v5 by @renovate in #2
Added permissions by @DavidOsipov in #5
Enhance ATS keyword extraction and analysis by @DavidOsipov in #9
Update dependency scikit-learn to v1.5.0 [SECURITY] by @renovate in #10
Update dependency spacy to v3.8.4 by @renovate in #11

New Contributors

@renovate made their first contribution in #1
@DavidOsipov made their first contribution in #5
