- Python 3.8 or above
- pip (Python package manager)
- pandas
- scikit-learn

The system reads mentor data from a CSV file containing fields such as:
- subjects
- college
- languages_spoken
- mentorship_style
- available_times
- mode_of_mentorship
- years_of_experience

A custom profile string is generated for each mentor by concatenating key fields.

Weighting is applied to prioritize more relevant attributes:
- subjects × 3
- college × 2
- languages_spoken × 2
- Remaining fields × 1
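The mentor profile construction above can be sketched with pandas. This is a minimal sketch, not the project's code. It assumes that a weight of n is implemented by repeating the field's text n times, so the field contributes proportionally more term occurrences. The `name` column, the helper `build_mentor_profile`, and the sample rows are hypothetical.

```python
import io

import pandas as pd

# Weights from the list above; any field not listed here gets weight 1.
FIELD_WEIGHTS = {"subjects": 3, "college": 2, "languages_spoken": 2}

# Mentor columns that feed the profile string (from the CSV field list).
PROFILE_FIELDS = [
    "subjects",
    "college",
    "languages_spoken",
    "mentorship_style",
    "available_times",
    "mode_of_mentorship",
    "years_of_experience",
]


def build_mentor_profile(row: pd.Series) -> str:
    """Hypothetical helper: concatenate fields, repeating each by its weight."""
    parts = []
    for field in PROFILE_FIELDS:
        value = str(row[field]).strip()
        # Assumption: a weight of n means the field's text appears n times.
        parts.extend([value] * FIELD_WEIGHTS.get(field, 1))
    return " ".join(parts)


# Illustrative rows; a real run would call pd.read_csv on the mentor CSV file.
SAMPLE_CSV = """name,subjects,college,languages_spoken,mentorship_style,available_times,mode_of_mentorship,years_of_experience
Mentor A,Legal Reasoning,Example Law College,English Hindi,structured,evenings,online,4
Mentor B,Logical Reasoning,Sample Law School,English Telugu,flexible,weekends,offline,2
"""

# fillna("") keeps missing cells from turning into the literal string "nan".
mentors = pd.read_csv(io.StringIO(SAMPLE_CSV)).fillna("")
mentors["profile"] = mentors.apply(build_mentor_profile, axis=1)
print(mentors.loc[0, "profile"])
```

Repeating text is only one way to weight fields; scaling per-field vectors after vectorization would be an alternative with the same intent.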
The user (aspirant) provides preferences via command-line input for:
- preferred subjects
- target college
- preparation level
- preferred learning style
- languages spoken

A profile string is generated from these responses.

Note: No custom weights are currently applied to aspirant input fields.
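A minimal sketch of the aspirant side follows. It assumes the five answers are joined in a fixed order, each appearing once, which matches the note that no custom weights apply. The question keys, prompts, and function names are hypothetical.

```python
# Hypothetical keys and prompts for the five preferences listed above.
ASPIRANT_QUESTIONS = {
    "subjects": "Preferred subjects: ",
    "college": "Target college: ",
    "preparation_level": "Preparation level: ",
    "learning_style": "Preferred learning style: ",
    "languages": "Languages spoken: ",
}


def build_aspirant_profile(answers: dict) -> str:
    """Join the answers in question order; each field appears once (weight 1)."""
    return " ".join(answers[key].strip() for key in ASPIRANT_QUESTIONS)


def ask_aspirant() -> str:
    """Prompt on the command line for each preference, then build the profile."""
    answers = {key: input(prompt) for key, prompt in ASPIRANT_QUESTIONS.items()}
    return build_aspirant_profile(answers)


# Non-interactive example; the CLI flow would call ask_aspirant() instead.
example = build_aspirant_profile({
    "subjects": "Legal Reasoning",
    "college": "Example Law College",
    "preparation_level": "beginner",
    "learning_style": "structured",
    "languages": "Hindi",
})
print(example)
```

Keeping the string-building logic separate from `input()` makes it testable without a terminal.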
- All mentor profiles and the aspirant profile are converted into TF-IDF vectors.
- This creates a document-term matrix in which each term is weighted by how often it appears in a profile (term frequency) and how rare it is across all profiles (inverse document frequency).
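The vectorization step can be sketched with scikit-learn's `TfidfVectorizer` using default settings (lowercasing, smoothed IDF, L2-normalized rows). The profile strings are illustrative. Fitting a single vectorizer on the mentor and aspirant profiles together is an assumption about how the system builds its shared vocabulary.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Illustrative, already-weighted mentor profiles plus one aspirant profile.
mentor_profiles = [
    "legal reasoning legal reasoning legal reasoning english hindi online",
    "logical reasoning logical reasoning logical reasoning english telugu offline",
]
aspirant_profile = "legal reasoning hindi online"

# Assumption: one vectorizer fitted on all profiles gives a shared vocabulary.
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(mentor_profiles + [aspirant_profile])

print(matrix.shape)  # (3, 8): 3 profiles, 8 distinct terms
vocab = vectorizer.vocabulary_
# "reasoning" occurs in every profile, so it gets the lowest IDF weight;
# "telugu" occurs in only one profile, so it is weighted more heavily.
print(vectorizer.idf_[vocab["reasoning"]], vectorizer.idf_[vocab["telugu"]])
```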
- Cosine similarity is computed between the aspirant profile vector and each mentor vector.
- This measures how closely the aspirant's preferences align with each mentor's profile. Because TF-IDF weights are non-negative, scores range from 0 (no terms in common) to 1.
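A sketch of the similarity step, using scikit-learn's `cosine_similarity` and a manual NumPy cross-check of the formula cos(a, m) = (a · m) / (‖a‖ ‖m‖). The profiles are illustrative, and the row layout (mentors first, aspirant last) is an assumption.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative profiles: two mentors, then the aspirant.
mentor_profiles = [
    "legal reasoning legal reasoning legal reasoning english hindi online",
    "logical reasoning logical reasoning logical reasoning english telugu offline",
]
aspirant_profile = "legal reasoning hindi online"

matrix = TfidfVectorizer().fit_transform(mentor_profiles + [aspirant_profile])
n_mentors = len(mentor_profiles)
mentor_vectors = matrix[:n_mentors]   # one row per mentor
aspirant_vector = matrix[n_mentors:]  # a single 1 x vocabulary row

# One score per mentor.
scores = cosine_similarity(aspirant_vector, mentor_vectors)[0]
print(scores)

# The same formula by hand for the first mentor, as a cross-check.
a = aspirant_vector.toarray().ravel()
m = mentor_vectors[0:1].toarray().ravel()
manual = float(a @ m / (np.linalg.norm(a) * np.linalg.norm(m)))
print(manual)
```

The first mentor shares "legal", "hindi", and "online" with the aspirant, so it should score higher than the second, which shares only "reasoning".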
- Mentors are ranked by similarity score in descending order.
- The top 3 mentors with the highest cosine similarity values are selected and recommended to the user.
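The ranking step can be sketched with pandas. `top_mentors` is a hypothetical helper, and the names and scores are made-up inputs standing in for the output of the cosine step.

```python
import numpy as np
import pandas as pd


def top_mentors(mentors: pd.DataFrame, scores: np.ndarray, k: int = 3) -> pd.DataFrame:
    """Hypothetical helper: attach similarity scores and keep the k best mentors."""
    ranked = mentors.assign(similarity=scores)
    # nlargest orders rows by the column, highest first, and keeps the first k.
    return ranked.nlargest(k, "similarity")


# Illustrative mentors and similarity scores.
mentors = pd.DataFrame({"name": ["Mentor A", "Mentor B", "Mentor C", "Mentor D"]})
scores = np.array([0.12, 0.55, 0.30, 0.41])

top = top_mentors(mentors, scores)
print(top[["name", "similarity"]])
```

If fewer than `k` mentors exist, `nlargest` simply returns all of them.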
The current approach has the following limitations:
- No collaborative filtering: the system does not learn from other users' behavior.
- No semantic understanding: matching relies only on keywords; BERT or other sentence embeddings are not used.
- No handling of:
  - Synonyms (e.g., "legal GK" vs. "legal general knowledge")
  - Spelling mistakes or typos
- No personalization loop: there is no feedback-based learning or user rating system.
- Fixed weights: all mentors are evaluated with the same static profile-generation logic.
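The synonym and typo limitations follow directly from keyword-based TF-IDF matching. This illustrative snippet scores pairs of strings with default `TfidfVectorizer` settings; the helper `keyword_similarity` and the example strings are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def keyword_similarity(a: str, b: str) -> float:
    """TF-IDF cosine similarity between two strings, fitted on just those two."""
    matrix = TfidfVectorizer().fit_transform([a, b])
    return float(cosine_similarity(matrix[0:1], matrix[1:2])[0, 0])


# Synonym: "GK" and "general knowledge" share no tokens, so the score is 0.
print(keyword_similarity("GK", "general knowledge"))
# Typo: "constitusion" is a different token from "constitution", so again 0.
print(keyword_similarity("constitution", "constitusion"))
# Only the shared word "legal" produces any overlap here.
print(keyword_similarity("legal GK", "legal general knowledge"))
```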