WillCaton2350/Wikipedia-WebCrawler
About
Wikipedia web crawler written in Python and Scrapy. The ETL process involves several steps: Scrapy spiders extract specific data from multiple Wikipedia pages, Scrapy Items organize it into a structured format, and the results are exported as JSON for further analysis and loading into MySQL Workbench. The JSON dataset can also serve as a data source for an API, improving data accessibility.
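As a sketch of the extract step, a Scrapy spider can pair a `scrapy.Item` schema with CSS selectors and follow in-article links to crawl multiple pages. The spider name, start URL, and field names below are illustrative assumptions, not taken from this repository:

```python
import scrapy


class WikipediaItem(scrapy.Item):
    # Hypothetical fields; the real project defines its own Item schema.
    title = scrapy.Field()
    url = scrapy.Field()
    summary = scrapy.Field()


class WikipediaSpider(scrapy.Spider):
    name = "wikipedia"  # assumed spider name
    allowed_domains = ["en.wikipedia.org"]
    start_urls = ["https://en.wikipedia.org/wiki/Web_crawler"]

    def parse(self, response):
        # Populate the Item from the page: heading and first paragraph text.
        item = WikipediaItem()
        item["title"] = response.css("h1#firstHeading ::text").get()
        item["url"] = response.url
        item["summary"] = response.css("div.mw-parser-output > p::text").get()
        yield item

        # Follow in-article /wiki/ links so the crawl spans multiple pages.
        for href in response.css("a[href^='/wiki/']::attr(href)").getall():
            yield response.follow(href, callback=self.parse)
```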
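Saving the scraped Items as JSON can be done with Scrapy's built-in feed exports; the output file name below is an assumption:

```python
# settings.py — feed export configuration (Scrapy 2.1+ FEEDS syntax)
FEEDS = {
    "wikipedia.json": {
        "format": "json",
        "encoding": "utf8",
        "overwrite": True,
    },
}
```

Equivalently, the feed can be written from the command line with `scrapy crawl wikipedia -O wikipedia.json`.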
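For the load step, the JSON feed could then be inserted into MySQL and browsed in MySQL Workbench. A minimal sketch using mysql-connector-python; the table name, schema, and connection credentials are all placeholder assumptions (Workbench's own import wizard is an alternative):

```python
import json

import mysql.connector  # pip install mysql-connector-python

# Placeholder credentials; substitute your own MySQL connection details.
conn = mysql.connector.connect(
    host="localhost", user="root", password="***", database="wikipedia"
)
cur = conn.cursor()
cur.execute(
    """CREATE TABLE IF NOT EXISTS articles (
           id INT AUTO_INCREMENT PRIMARY KEY,
           title VARCHAR(255),
           url VARCHAR(512),
           summary TEXT
       )"""
)

# Scrapy's json feed format is a single JSON array of item objects.
with open("wikipedia.json", encoding="utf8") as f:
    for row in json.load(f):
        cur.execute(
            "INSERT INTO articles (title, url, summary) VALUES (%s, %s, %s)",
            (row.get("title"), row.get("url"), row.get("summary")),
        )

conn.commit()
cur.close()
conn.close()
```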