Big data holds significant scientific potential, particularly in the fields of data mining, machine learning, and natural language processing (NLP).
The first stage of this project is data scraping: collecting the content of Arabic articles from press sites such as Al Jazeera and Hespress.
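As an illustration of the scraping stage, here is a minimal sketch that uses only the Python standard library. It assumes the article body sits in `<p>` elements, which is a simplification: each press site has its own markup, and the real scraper may rely on other tools. The names `fetch_html`, `ParagraphExtractor`, and `extract_article_text` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download a page and decode it as UTF-8 (assumed encoding)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


class ParagraphExtractor(HTMLParser):
    """Collects the text found inside <p> elements."""

    def __init__(self):
        super().__init__()
        self._in_p = False      # True while inside a <p> element
        self._chunks = []       # text pieces of the current paragraph
        self.paragraphs = []    # finished, whitespace-normalized paragraphs

    def _flush(self):
        # Join the pieces and collapse runs of whitespace into one space.
        text = " ".join("".join(self._chunks).split())
        if text:
            self.paragraphs.append(text)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()       # an unclosed <p> ends when a new one starts
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._chunks.append(data)


def extract_article_text(html):
    """Return the paragraph text of an HTML page, one paragraph per line."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)
```

A page would then be processed with `extract_article_text(fetch_html(url))`, where `url` is the address of a single article.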
The second stage clusters the collected articles by category. The goal is a system that takes Arabic press articles as text documents and groups them into the following categories: Politics, Culture, Sport, Tamazight, and Science-Technology.
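The clustering stage can be sketched as follows. This description does not say which features or algorithm the project uses, so the sketch assumes TF-IDF vectors and a simple spherical k-means written with the standard library only. All function names are hypothetical. The clusters it produces are unlabelled, so mapping them to the five category names would need an extra step.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into word tokens (Unicode-aware, so Arabic words match)."""
    return re.findall(r"\w+", text)


def tfidf_vectors(docs):
    """Return one L2-normalized TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF, so no term gets a zero or negative weight.
        vec = {t: c * (math.log((1 + n) / (1 + doc_freq[t])) + 1)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Dot product of two unit vectors, which equals their cosine similarity."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def mean_vector(vectors):
    """Sum the vectors and rescale the result to unit length."""
    total = Counter()
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def kmeans(vectors, k, iterations=20):
    """Spherical k-means with deterministic farthest-first initialization."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Pick the document least similar to every centroid chosen so far.
        farthest = min(vectors,
                       key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(farthest)
    labels = []
    for _ in range(iterations):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = mean_vector(members)
    return labels
```

With the scraped articles in a list `articles`, `kmeans(tfidf_vectors(articles), k=5)` returns one cluster index per article, with `k=5` chosen to match the five categories. A fuller pipeline would likely add Arabic-specific preprocessing such as stop-word removal and normalization, which this sketch leaves out.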
Collaborators:
- Mohamed Reda Chenna @vulca1n
- Ayoub Ezzidani @AyoubEzz99
- Anass Grini @Taylor-X01