Home
Welcome to the BarkingOwl wiki!
###About###
BarkingOwl is a library of tools for finding documents of different types on websites (such as PDF, DOC, XLS, TXT, and HTML). The library is made up of two primary parts: the Scraper and the Dispatcher.
BarkingOwl uses libmagic to determine file types.
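To illustrate what content-based file typing means, here is a minimal sketch that matches the leading bytes of a download against a few well-known signatures. It is a simplified stand-in for libmagic, not BarkingOwl's code: the `SIGNATURES` table, the `guess_type` function, and the returned labels are all hypothetical names chosen for this example.

```python
# Simplified stand-in for libmagic: identify a file by its leading bytes.
# SIGNATURES and guess_type() are illustrative and not part of BarkingOwl.

SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    # OLE2 compound file header, the container used by legacy DOC and XLS files.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "application/x-ole-storage"),
    # ZIP local file header; DOCX and XLSX files are ZIP archives.
    (b"PK\x03\x04", "application/zip"),
]


def guess_type(data: bytes) -> str:
    """Return an illustrative MIME-style label for the given bytes."""
    for prefix, label in SIGNATURES:
        if data.startswith(prefix):
            return label

    # Heuristic HTML check on the first few hundred bytes.
    head = data[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "text/html"

    # Anything that decodes cleanly as UTF-8 is treated as plain text.
    try:
        data.decode("utf-8")
        return "text/plain"
    except UnicodeDecodeError:
        return "application/octet-stream"
```

libmagic consults a much larger signature database than this. With the python-magic bindings installed, `magic.from_buffer(data, mime=True)` would be the closer equivalent, though how BarkingOwl invokes libmagic is not described here.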
###Implementation Details###
The BarkingOwl Scraper is the core of the system and does most of the hard work. An extension to the Scraper, the ScraperWrapper, allows the Scraper to broadcast messages to an AMQP bus. The Scraper can be used as a stand-alone tool, or via the ScraperWrapper in a message bus topology.
The Dispatcher takes in a list of URLs and dispatches them to available Scrapers waiting on the AMQP bus. The Dispatcher can run in a number of modes, including 'broadcast all once' and 'broadcast each at an interval'.
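The two dispatch modes named above can be sketched roughly as follows. This is not the Dispatcher's actual implementation: the function names, the JSON message shape, and the `publish` callable (a stand-in for an AMQP publish call) are all assumptions made for illustration, as is the reading of 'broadcast each at an interval' as "send one URL, wait, send the next".

```python
import json
import time
from typing import Callable, Sequence

# Hypothetical stand-in for publishing a message body to the AMQP bus.
Publish = Callable[[str], None]


def make_message(url: str) -> str:
    """Build an illustrative message body; the real format is not shown here."""
    return json.dumps({"command": "scrape", "url": url})


def broadcast_all_once(urls: Sequence[str], publish: Publish) -> None:
    """Assumed 'broadcast all once' mode: send every URL a single time."""
    for url in urls:
        publish(make_message(url))


def broadcast_each_at_interval(
    urls: Sequence[str],
    publish: Publish,
    interval_s: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Assumed 'broadcast each at an interval' mode.

    Sends one URL, waits interval_s seconds, then sends the next, cycling
    through the list `rounds` times. `sleep` is injectable so the loop can
    be exercised without real delays.
    """
    for _ in range(rounds):
        for url in urls:
            publish(make_message(url))
            sleep(interval_s)
```

Passing `publish` in as a plain callable keeps the sketch independent of any particular AMQP client; in a real deployment it would wrap whatever publish call that client provides.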
The Bus Access portion of BarkingOwl allows the programmer to interface with any part of the system using the same AMQP bus that the Dispatcher and ScraperWrapper use.