By:
- Yme Boland
- Shane Siwpersad
- Eduard Köntöş
- Hugo Voorheijen
- Julius Bijkerk
This repository is part of a group project for the Logic and Language course at Utrecht University, taught by Dr. Lasha Abzianidze. We will be exploring the impact of different tokenization techniques on Natural Language Inference (NLI), using pre-trained models such as BERT, RoBERTa, and other (foundational) models.
- Investigate how different tokenization strategies affect NLI model performance.
- Use the SNLI dataset (and possibly other NLI datasets, such as MNLI) for testing and comparison.
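To illustrate why tokenization choice matters for NLI, here is a minimal, self-contained sketch (not code from this repository) contrasting plain whitespace splitting with a greedy longest-match subword segmentation, in the spirit of WordPiece. The vocabulary and sentence below are made up for demonstration; real tokenizers use learned vocabularies of tens of thousands of entries and continuation markers like `##`.

```python
def whitespace_tokenize(text):
    """Split on whitespace only; out-of-vocabulary words stay whole."""
    return text.lower().split()

def greedy_subword_tokenize(text, vocab):
    """Greedy longest-match subword segmentation (WordPiece-style sketch)."""
    tokens = []
    for word in text.lower().split():
        start = 0
        while start < len(word):
            end = len(word)
            # Shrink the window until the slice is a known vocabulary entry.
            while end > start and word[start:end] not in vocab:
                end -= 1
            if end == start:  # no vocabulary entry covers this character
                tokens.append("[UNK]")
                start += 1
            else:
                tokens.append(word[start:end])
                start = end
    return tokens

# Tiny hand-built vocabulary for illustration only.
vocab = {"the", "cat", "s", "sat", "on", "mat"}
premise = "the cats sat on the mat"

print(whitespace_tokenize(premise))
# ['the', 'cats', 'sat', 'on', 'the', 'mat']
print(greedy_subword_tokenize(premise, vocab))
# ['the', 'cat', 's', 'sat', 'on', 'the', 'mat']
```

The two strategies yield different sequence lengths and different token boundaries for the same sentence ("cats" stays whole vs. splits into "cat" + "s"), which is exactly the kind of variation whose effect on NLI model performance this project investigates.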
Main folder: LoLaTokenization
The Jupyter notebooks for the individual experiments can be downloaded from this folder.
Work on assigned tasks in personal branches, and push updates regularly so they can be reviewed in the weekly meetings.
This README will be updated as the project progresses to reflect more specific details and instructions.