Arabic-News-Categories-Clustering

Big data holds significant scientific potential, particularly in the fields of data mining, machine learning, and natural language processing (NLP).

The project first collects Arabic articles and their content from news sites such as Al Jazeera and Hespress (data scraping).
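As an illustration of the scraping step, here is a minimal sketch that extracts a title and body text from an article page using only the Python standard library. The page layout (one `<h1>` for the title, `<p>` tags for the body), the names `ArticleParser` and `parse_article`, and the sample HTML are all assumptions made for this example; the actual markup of Al Jazeera or Hespress pages and the code in the `scrapper` folder may differ.

```python
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collect the first <h1> as the title and every <p> as a body paragraph.

    Hypothetical layout: real news pages will need site-specific selectors.
    """

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._tag = None    # tag currently being captured ("h1" or "p")
        self._buffer = []   # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "p") and self._tag is None:
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # closing tag of a nested element such as <b>
        text = " ".join("".join(self._buffer).split())  # collapse whitespace
        if tag == "h1" and not self.title:
            self.title = text
        elif tag == "p" and text:
            self.paragraphs.append(text)
        self._tag = None


def parse_article(html):
    """Return a dict with the article title and its newline-joined paragraphs."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "text": "\n".join(parser.paragraphs)}


# Made-up sample page standing in for a downloaded article.
sample_html = """
<html><body>
  <h1>عنوان المقال</h1>
  <p>الفقرة الأولى من المقال.</p>
  <p>الفقرة <b>الثانية</b> من المقال.</p>
</body></html>
"""
article = parse_article(sample_html)
print(article["title"])
print(article["text"])
```

Downloading the pages is left out of the sketch; in practice the HTML would be fetched first (for example with `urllib.request.urlopen`) and then passed to `parse_article`.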

After data collection, the articles are clustered by category. The goal of the system is to group Arabic press articles, supplied as text documents, into clusters corresponding to the following categories: Politics, Culture, Sport, Tamazight, and Science-Technology.
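The clustering step could be sketched as follows. This README does not state which features or algorithm the notebook uses, so TF-IDF weighting and spherical k-means are assumptions chosen for illustration, as are the function names, the toy documents, and the one-word stop-word set. The sketch uses only the standard library to stay self-contained; a real pipeline would more likely rely on a library such as scikit-learn.

```python
import math
from collections import Counter


def tfidf_vectors(docs, stopwords=frozenset()):
    """Turn whitespace-tokenised documents into L2-normalised TF-IDF dicts."""
    tokenised = [[w for w in d.split() if w not in stopwords] for d in docs]
    df = Counter(w for toks in tokenised for w in set(toks))  # document frequency
    n = len(docs)
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vec = {w: c * (math.log((1 + n) / (1 + df[w])) + 1) for w, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({w: v / norm for w, v in vec.items()})
    return vectors


def cosine(a, b):
    """Dot product of two sparse vectors (the cosine, since both are unit length)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(w, 0.0) for w, v in a.items())


def kmeans(vectors, k, iterations=20):
    """Spherical k-means with a deterministic farthest-point initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the document least similar to every centroid chosen so far.
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        new_labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if not members:
                continue  # keep the previous centroid for an empty cluster
            total = Counter()
            for v in members:
                total.update(v)  # sum the member vectors component-wise
            norm = math.sqrt(sum(x * x for x in total.values())) or 1.0
            centroids[j] = {w: x / norm for w, x in total.items()}
    return labels


# Toy documents (made up): two about politics, two about sport.
docs = [
    "الحكومة في البرلمان الانتخابات",
    "البرلمان الانتخابات في الوزير",
    "المباراة في الفريق الهدف",
    "الفريق الهدف اللاعب",
]
labels = kmeans(tfidf_vectors(docs, stopwords={"في"}), k=2)
print(labels)  # [0, 0, 1, 1]
```

Cluster indices are arbitrary: the algorithm groups similar articles but does not name the groups, so mapping each cluster to a category such as Politics or Sport requires inspecting its members. For the five categories listed above, `k` would be set to 5.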

Collaborators:

Mohamed Reda Chenna @vulca1n
Ayoub Ezzidani @AyoubEzz99
Anass Grini @Taylor-X01

Repository contents:

data
raw_data
scrapper
.gitignore
Final notebook NLP -Chenna Grini Ezzidani.ipynb
LICENSE
README.md
Unsupervised Learning Project - Presentation.pdf
Unsupervised_Learning Project - Rapport.pdf
arabic_stopwords.txt
