Skip to content

This project contains files related to the crawler written to crawl Versus.com to find out the best headphones

Notifications You must be signed in to change notification settings

rkmittal/vs-crawler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

vs-crawler

This project contains files related to the crawler written to crawl Versus.com to find out the best headphones.

I set out to do some research on Versus.com to find some good headphones below $100. On the Versus page (http://versus.com/en/headphone/top), the headphones are listed with their points in increasing order of ranking. But the problem is that there is no price mentioned on this page. To find out, I started clicking on the first few links and was quickly tired of doing so. It took a long time to open each page and many of them were quite expensive. So I thought there must be a better way.

I thought, this is a job for PHANTOM (phantom.js that is). So I wrote a crawler that can crawl and print all the links from the top page. I fed the first five pages to this crawler which gave me the Top 250 links. I removed all the unnecessary stuff and left only the links on each line. Now this line was fed to a Java source file that acted as the driver to my phantom.js crawler that went to each link and figured out the price, whether it is noise canceling head-phone or not and also marked the headphones cheaper than $100 with a --> in the results.

About

This project contains files related to the crawler written to crawl Versus.com to find out the best headphones

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published