
Webcrawler

Challenge

In a language of your choice, implement a simple web crawler that takes a news website as input (e.g. http://www.spiegel.de) and crawls the HTML content of up to 100 pages of that site with a breadth-first approach. The downloaded pages should be stored as HTML in a folder in the file system. The crawler needs to be able to work with up to 50 parallel processes. The number of processes can be passed as a parameter. If no input is given, the default value shall be 5 processes.
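
For orientation, here is a minimal single-process sketch of such a breadth-first traversal. The FIFO queue ensures pages closer to the start URL are fetched first; the variable names and the naive link-extraction regex are illustrative, not taken from this repository:

    <?php
    // Minimal BFS sketch (illustrative only, not this repo's code).
    $startUrl = 'http://www.spiegel.de';
    $maxPages = 100;
    $queue    = [$startUrl]; // FIFO frontier
    $visited  = [];          // URLs already fetched
    $outDir   = 'pages';

    if (!is_dir($outDir)) {
        mkdir($outDir, 0777, true);
    }

    while ($queue && count($visited) < $maxPages) {
        $url = array_shift($queue); // dequeue from the front: breadth-first order
        if (isset($visited[$url])) {
            continue;
        }
        $html = @file_get_contents($url);
        if ($html === false) {
            continue;
        }
        $visited[$url] = true;
        file_put_contents($outDir . '/' . md5($url) . '.html', $html);

        // Enqueue same-site links found on this page (naive extraction).
        if (preg_match_all('/href="(http[^"#]+)"/i', $html, $m)) {
            $host = parse_url($startUrl, PHP_URL_HOST);
            foreach ($m[1] as $link) {
                if (parse_url($link, PHP_URL_HOST) === $host && !isset($visited[$link])) {
                    $queue[] = $link;
                }
            }
        }
    }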

Language Used

  • PHP

Solution Installation

  • Clone this repository: git clone https://github.com/Oyelamin/Webcrawler.git
  • Install the dependencies: composer install

Now you can run the code.

Solution Usage

  1. You can either run the solution from the index.php file or use it inside your own PHP application; as long as the Composer dependencies are installed, you are good to go.

  2. To initialise the Crawl class, you first need to import it into your file or controller class, e.g.:

    use WebCrawler\Crawl;

  3. Declare your basic inputs, e.g.:

    $websiteUrl = 'http://www.spiegel.de'; // Any URL of your choice - Required
    $maxPages = 10; // Maximum number of pages to crawl - Optional
    $maxProcesses = 5; // Number of parallel processes (see the note after these steps) - Optional
    $folderName = "MyCustompages"; // Output folder name - Optional
    $fileExtension = "html"; // txt, htm, css, etc. - Optional

  4. Pass your declared inputs to the Crawl class on initialisation:

    $crawl = new Crawl($websiteUrl, $maxPages, $maxProcesses, $folderName, $fileExtension);

  5. Execute the program:

    return $crawl->execute(); // run { php index.php }
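
A note on the $maxProcesses parameter above: the challenge calls for up to 50 parallel processes. A common way to cap concurrent workers in PHP is pcntl_fork; the sketch below illustrates that general technique under the assumption of a prepared URL list, and is not a description of this repository's internals (it also requires the pcntl extension, which is only available in the CLI SAPI):

    <?php
    // Sketch: cap concurrent fetch workers at $maxProcesses using pcntl_fork.
    // Illustrative only; not necessarily how this repository does it.
    $maxProcesses = 5;
    $urls = ['http://www.spiegel.de/a', 'http://www.spiegel.de/b']; // hypothetical frontier
    $children = 0;

    foreach ($urls as $url) {
        if ($children >= $maxProcesses) {
            pcntl_wait($status); // block until one worker finishes
            $children--;
        }
        $pid = pcntl_fork();
        if ($pid === -1) {
            continue;            // fork failed; skip this URL
        }
        if ($pid === 0) {        // child process: fetch one page, then exit
            @file_get_contents($url);
            exit(0);
        }
        $children++;             // parent: count the new worker
    }

    while ($children-- > 0) {    // reap any remaining workers
        pcntl_wait($status);
    }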

Example

This is an example of how you can run it:

    <?php

    require 'vendor/autoload.php';

    use WebCrawler\Crawl;

    $websiteUrl = 'http://www.spiegel.de'; // Any URL of your choice - Required
    $maxPages = 10; // Maximum number of pages to crawl - Optional
    $maxProcesses = 5; // Number of parallel processes - Optional
    $folderName = "MyCustompages"; // Output folder name - Optional
    $fileExtension = "html"; // txt, htm, css, etc. - Optional
    $crawl = new Crawl($websiteUrl, $maxPages, $maxProcesses, $folderName, $fileExtension);

    return $crawl->execute(); // run { php index.php } to execute

    // THANK YOU and I hope you enjoyed the code ❤ 🤗!
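
Since everything except $websiteUrl is optional, a minimal invocation presumably also works; per the challenge statement, the process count should then default to 5. The behaviour of the remaining defaults is an assumption here, so check the Crawl class for the actual values:

    <?php

    require 'vendor/autoload.php';

    use WebCrawler\Crawl;

    // Minimal invocation: only the URL is required. The challenge specifies
    // a default of 5 processes; the other defaults are assumptions.
    $crawl = new Crawl('http://www.spiegel.de');
    return $crawl->execute();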
