Unit 1: Scraping the Web with Scrapy

This unit covers the basics of web scraping with a special focus on data extraction with Scrapy.

Topics

  • The anatomy of a Scrapy Spider
  • Building a simple spider
  • Web scraping with Scrapy & CSS

Check out the slides for this unit

Sample Spiders

  1. Spider that saves two pages from quotes.toscrape.com to disk
  2. Spider that scrapes quotes.toscrape.com

Hands-on

1. Books spider

Build a spider for books.toscrape.com that extracts the title, rating, price, stock and category of each book from the URLs listed in this file (the file can be stored locally alongside your spider).

Check out the spider once you're done.

2. Reddit spider

Build a spider to extract title, link, username, user_url, score and time from each submission on the front page of Reddit's /r/programming and /r/python.

Check out the spider once you're done.

References