MicrosoftEssexLDA

This is a joint project between LAMFO (University of Brasília) and UoE (University of Essex), in a collaboration fostered by Microsoft AI for Health, to analyse misinformation about Covid-19 in news outlets.

Pattern

Pattern for extracted data - Google Drive

Basic naming convention for columns: 'URL', 'Date', 'Source', 'Categories', 'Search Terms', 'Text', 'Author', 'Country'. Files may contain additional optional columns (such as an index), but only these columns are used for now.
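The column convention can be checked when loading a file. The sketch below is illustrative only and is not part of the project code: it assumes the data arrives as CSV text, uses only the Python standard library, and the names `REQUIRED_COLUMNS` and `read_required` are hypothetical. It fails if any required column is missing and drops optional extras such as an index column.

```python
import csv
import io

# The eight column names from the naming convention above.
REQUIRED_COLUMNS = ["URL", "Date", "Source", "Categories",
                    "Search Terms", "Text", "Author", "Country"]


def read_required(csv_text: str) -> list[dict]:
    """Parse CSV text and keep only the required columns, in convention order.

    Hypothetical helper: raises ValueError if any required column is absent.
    Optional extra columns (e.g. an index) are silently dropped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    return [{c: row[c] for c in REQUIRED_COLUMNS} for row in reader]
```

Rejecting a file outright when a column is missing keeps downstream code from silently working with incomplete data; whether to reject or fill with empty values instead is a design choice this sketch does not settle for the project.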

Naming convention for folders (in the MicrosoftEssexScrapers project): results/NAMEOFSOURCE

Naming convention for the collected file (note that there should be only one file per source): "articles.csv.zip" or "articles.zip", regardless of content (whether or not the items are actually articles).
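Taken together, the folder and file conventions let code locate each source's data from the source name alone. The following is a minimal sketch, not the project's loader. It assumes that each zip archive contains exactly one UTF-8 CSV file. The names `find_source_file` and `read_articles` are hypothetical. The sketch enforces the one-file-per-source rule by raising an error if both accepted file names are present.

```python
import csv
import io
import zipfile
from pathlib import Path

# The two file names allowed by the convention.
ACCEPTED_NAMES = ("articles.csv.zip", "articles.zip")


def find_source_file(results_dir: Path, source: str) -> Path:
    """Return results/NAMEOFSOURCE/<accepted name>, enforcing one file per source."""
    folder = results_dir / source
    found = [folder / name for name in ACCEPTED_NAMES if (folder / name).is_file()]
    if not found:
        raise FileNotFoundError(f"no articles file in {folder}")
    if len(found) > 1:
        raise ValueError(f"more than one articles file in {folder}: {found}")
    return found[0]


def read_articles(path: Path) -> list[dict]:
    """Read the single CSV inside the zip archive (assumed layout) as row dicts."""
    with zipfile.ZipFile(path) as zf:
        members = [n for n in zf.namelist() if n.lower().endswith(".csv")]
        if len(members) != 1:
            raise ValueError(f"expected one CSV in {path}, found {members}")
        with zf.open(members[0]) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))
```

For example, `read_articles(find_source_file(Path("results"), "NAMEOFSOURCE"))` would return the rows for one source. The result could then be passed through a column check before use.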

Other project links

Real time classification of text

Scrapers and data from news outlets

Heroku github project

Link to LDA training file

LAMFO project page

Project event with the University of Essex