WebScrap with Scrapy

This project scrapes match data from 'https://www.goal.com/en-my/live-scores', but you can use it to scrape any site as long as you replace the URLs accordingly.

You can use the following command to open the Scrapy shell and start scraping straight away:

$ scrapy shell 'url_comes_here'

Type the following commands in the shell to extract data:

$ response.css('h3::text').getall()                     --- gets the text of every h3 tag in the page
$ response.css('.div_class_name a::text').get()         --- gets the text of the first <a> tag inside the given class
$ response.css('p::text').re(r'(\w+) any_word (\w+)')   --- returns the words on either side of any_word in every p tag

$ response.xpath('xpath_here').extract()                --- gets all elements matching the XPath

Type the following commands to get data from a specific div:

$ post = response.css('.div_class_name')[0]              --- stores the first matching div in the variable post
$ name = response.css('.div_class_name a::text').get()   --- stores the text of the first <a> tag in that div in the variable name

$ for post in response.css('.div_class_name'):
    title = post.css('.post-header h2 a::text').get()         --- gets the title of each matching div
    print(dict(title=title))                                  --- prints the data as a dictionary

$ quit()   --- quits the shell

Apart from the shell, you can also run the spider defined in scraper/scraper/spiders/properties_spider.py to scrape the data:

$ scrapy crawl posts

You can also save the data to a JSON file using the following command:

$ scrapy crawl posts -o posts.json

ayaan278/WebScrap
