FOCRA

A visual, cloud-based web crawler built with Django 1.7, MongoDB 2.6.5, and Scrapy 0.24.4.

## Features

  • Visually create your own XPath template
  • Toggle CSS and JavaScript on and off
  • Pagination Crawl
  • Chain Crawler (created from sublinks of the initial crawler)
  • Pause/Resume Crawl
  • Show Hierarchy of Crawlers (how they are chained)
  • View Data in Pages
  • User Accounts
  • Export Data to Excel / CSV / JSON URL
  • Improve on Algorithms (Aggregation + Alignment)
  • Schedule Crawl Frequency
  • Crawl JavaScript Pages (data fetched from XHR requests)
  • Append Data (append the latest data to existing results)
  • Modify Field Names, Column Position and Template of the Crawler
  • Change Database Architecture (the current design uses one collection per crawler, which does not scale)
  • Monitor Performance of Crawler
  • Django Push Events (currently uses polling)
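
As a rough illustration of what an XPath template could look like, the sketch below maps field names to XPath expressions and applies them to a small sample page. The template layout, the `apply_template` helper, and the sample markup are all assumptions made for this example, not Focra's actual format. It also uses the limited XPath subset in Python's standard library rather than Scrapy selectors, so that it runs without extra dependencies.

```python
import xml.etree.ElementTree as ET

# Hypothetical template: one XPath that selects each row,
# plus one XPath per field, relative to the row element.
TEMPLATE = {
    "row": ".//div[@class='post']",
    "fields": {
        "author": "./span[@class='author']",
        "title": "./a[@class='title']",
    },
}

# Made-up, well-formed sample page standing in for a crawled forum listing.
SAMPLE_PAGE = """
<html><body>
  <div class="post"><span class="author">alice</span><a class="title" href="/t/1">First thread</a></div>
  <div class="post"><span class="author">bob</span><a class="title" href="/t/2">Second thread</a></div>
</body></html>
"""


def apply_template(page, template):
    """Return one dict per row, with a value (or None) for each template field."""
    root = ET.fromstring(page.strip())
    rows = []
    for node in root.findall(template["row"]):
        row = {}
        for name, xpath in template["fields"].items():
            match = node.find(xpath)
            has_text = match is not None and match.text
            row[name] = match.text.strip() if has_text else None
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in apply_template(SAMPLE_PAGE, TEMPLATE):
        print(row)
```

Real pages are rarely well-formed XML, so a production crawler would rely on an HTML-tolerant parser such as the selectors that ship with Scrapy.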

### Things to Note

  • Internet Explorer is not supported.
  • Requests are sent with a download delay of 2 seconds to avoid overloading the servers being crawled.
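
In a Scrapy project, a delay like this is usually set with the `DOWNLOAD_DELAY` setting. The fragment below sketches what such a `settings.py` entry could look like. Whether Focra configures it exactly this way is an assumption.

```python
# settings.py (sketch; assumed, not copied from the Focra source)

# Wait 2 seconds between consecutive requests to the same website.
DOWNLOAD_DELAY = 2
```

By default Scrapy randomizes the actual wait to between 0.5 and 1.5 times `DOWNLOAD_DELAY`; this behavior is controlled by the `RANDOMIZE_DOWNLOAD_DELAY` setting.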

About

A Visual Cloud Web Forum Crawler/Scraper
