EU Commission Grant Exploratory Research: Polish Discourse on Social Media on Air Pollution

Background

As the Citizen Science Research Intern at Caravan Studios, I was tasked with exploring the online discourse on air pollution in Poland, the EU country with the worst air pollution record. My aims were to 1) gain a temporal and spatial understanding of interest in air pollution in Poland and 2) get a sense of the popular discourse about it: how Poles expressed themselves about air pollution (whom they blamed, how they described the impact of pollution on daily life, etc.).

I investigated several online sources: Twitter, Facebook, Reddit, Wykop, NK.pl, and Google Trends. My outputs per source are summarized below:

Google Trends: I generated charts of search frequency over the last 5 years for a set of keywords relevant to air pollution. I was also able to compare the frequency of those searches between Poland as a whole and Lesser Poland, a southern region known to be heavily affected by smog.
Facebook: Due to changes in Facebook's API, I was unable to query for meaningful content. I considered web scraping, but Facebook's browsing features limit results to 5 posts over a narrow time period, so I dropped the site from my inquiry.
Reddit: I posted on r/Polska, the Polish subreddit, soliciting input about air pollution directly from Polish individuals, and gathered over half a dozen responses. The objective of my solicitation was to inform my research process and to use local testimonies to cross-check news stories and the data I subsequently gathered.
NK.pl: A Polish forum site, which I quickly found lacking in the content I was looking for. Furthermore, a poster on r/Polska kindly informed me that "NK.pl is dead".
Twitter Analytics: Using Twitter's API, I queried Tweets from the last 7 days (a limitation of the public-access API) that were geotagged to Poland and mentioned smog. Although the time window is quite limited, the API nonetheless provides a snapshot of the "current discourse". I generated both a time series and word clouds from the Tweets collected.
Wykop: The "Polish Reddit". Wykop was my richest source of information: I was able to scrape over 7,000 posts mentioning smog, dating as far back as 2012. I generated a time series of post frequency as well as a word cloud.
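
The Twitter and Wykop outputs both rest on the same two aggregations: counting posts per time bucket (the time series) and counting word occurrences (the input to a word cloud). Below is a minimal sketch of those two steps, assuming the posts have already been collected as (date, text) pairs. The sample posts, the `STOPWORDS` set, and the function names are illustrative assumptions, not taken from the notebooks.

```python
from collections import Counter
from datetime import date
import re

# Hypothetical sample data; real input would be the collected posts.
posts = [
    (date(2017, 1, 9), "Smog w Krakowie znowu bije rekordy"),
    (date(2017, 1, 15), "Kraków: smog i zakaz palenia węglem"),
    (date(2017, 2, 2), "Smog smog smog, nie da się oddychać"),
]

# Tiny illustrative stopword list (an assumption, far from complete).
STOPWORDS = {"i", "w", "się", "nie", "da", "na", "z"}


def posts_per_month(posts):
    """Count posts per (year, month) bucket, in chronological order."""
    counts = Counter((d.year, d.month) for d, _ in posts)
    return dict(sorted(counts.items()))


def word_frequencies(posts, exclude=frozenset()):
    """Count lowercase word tokens, skipping stopwords and excluded terms."""
    words = Counter()
    for _, text in posts:
        for token in re.findall(r"\w+", text.lower()):
            if token not in STOPWORDS and token not in exclude:
                words[token] += 1
    return words


monthly = posts_per_month(posts)
# Excluding the query term keeps it from dominating the word cloud.
freqs = word_frequencies(posts, exclude={"smog"})
```

The monthly counts can be passed to any plotting library, and the frequency table to a word cloud generator. Whether to exclude the search term itself is a design choice: every collected post contains it, so it carries little information about the discourse.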

Method: Scraping all posts mentioning 'smog' on Wykop.pl
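
The extraction step of such a scrape can be sketched as follows. This is a dependency-free illustration that uses Python's built-in `html.parser` in place of BeautifulSoup (which the notebook's filename refers to). The HTML fragment, the `entry` class name, and the `datetime` attribute are hypothetical stand-ins and do not reflect Wykop's actual markup.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration only, NOT Wykop's real page structure.
SAMPLE_HTML = """
<ul>
  <li class="entry"><time datetime="2012-11-03">3 Nov</time><p>Smog nad miastem</p></li>
  <li class="entry"><time datetime="2017-01-09">9 Jan</time><p>Znowu smog w Krakowie</p></li>
  <li class="ad"><p>Reklama</p></li>
</ul>
"""


class EntryParser(HTMLParser):
    """Collect the date and text of each <li class="entry"> element."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._current = None   # entry being built, or None outside an entry
        self._in_text = False  # True while inside an entry's <p>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "entry":
            self._current = {"date": None, "text": ""}
        elif self._current is not None and tag == "time":
            self._current["date"] = attrs.get("datetime")
        elif self._current is not None and tag == "p":
            self._in_text = True

    def handle_data(self, data):
        if self._in_text:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_text = False
        elif tag == "li" and self._current is not None:
            self.entries.append(self._current)
            self._current = None


parser = EntryParser()
parser.feed(SAMPLE_HTML)

# Keep only the entries that mention the search term.
smog_posts = [e for e in parser.entries if "smog" in e["text"].lower()]
```

A real scrape would fetch each results page over HTTP and feed the response body to the parser; the selectors would need to be adapted to whatever markup the site actually serves.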

HP-Nunes/caravanStudios_webScraping4LocalKnowledge
