This repository contains the Web Crawler project, developed as part of the Boot.dev course. The goal of this project is to create a CLI-based web crawler in Go, reinforcing key backend development concepts.
- Recursive Crawling: Traverse and fetch links recursively from web pages.
- Concurrency: Leverage goroutines for efficient parallel crawling (see the sketch after this list).
- Custom Depth Control: Set limits on how deeply the crawler traverses links.
- Error Handling: Gracefully manage timeouts and invalid URLs.
- Output Summary: Present crawled URLs in a clear, readable format.
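Below is a minimal sketch of how these pieces might fit together. It is illustrative only: the `crawler` type, its field names, and the in-memory `site` map that stands in for real HTTP fetching are assumptions, not the project's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// site stands in for the network: each URL maps to the links found on
// that page. The real crawler fetches and parses live HTML instead.
var site = map[string][]string{
	"https://example.com":   {"https://example.com/a", "https://example.com/b"},
	"https://example.com/a": {"https://example.com/b"},
	"https://example.com/b": {"https://example.com"},
}

// crawler holds shared crawl state; all names here are illustrative.
type crawler struct {
	pages    map[string]int // URL -> number of times seen
	maxDepth int            // how deep recursion may go
	sem      chan struct{}  // buffered channel capping concurrent page visits
	mu       sync.Mutex     // protects pages
	wg       sync.WaitGroup // waits for all goroutines to finish
}

// crawlPage records a URL and recursively crawls its links,
// spawning one goroutine per link until maxDepth is reached.
func (c *crawler) crawlPage(url string, depth int) {
	defer c.wg.Done()

	if depth > c.maxDepth {
		return // depth limit reached
	}

	c.sem <- struct{}{}        // acquire a concurrency slot
	defer func() { <-c.sem }() // release it when this page is done

	c.mu.Lock()
	c.pages[url]++
	firstVisit := c.pages[url] == 1
	c.mu.Unlock()
	if !firstVisit {
		return // already crawled this page
	}

	for _, link := range site[url] {
		c.wg.Add(1)
		go c.crawlPage(link, depth+1)
	}
}

func main() {
	c := &crawler{
		pages:    map[string]int{},
		maxDepth: 2,
		sem:      make(chan struct{}, 5), // at most 5 pages in flight
	}
	c.wg.Add(1)
	go c.crawlPage("https://example.com", 0)
	c.wg.Wait()

	// Output summary: each URL and how many times it was seen.
	for url, n := range c.pages {
		fmt.Printf("%d - %s\n", n, url)
	}
}
```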
- Go: Core language for development.
- Concurrency: Using goroutines and channels.
- CLI Design: Building and managing command-line interactions (see the sketch after this list).
- Testing: Robust unit tests with Go's `testing` package.
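As one example of the CLI layer, here is a sketch of argument parsing with the standard `flag` package. The flag names and usage string are illustrative assumptions, not the project's actual interface.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"time"
)

func main() {
	// Flag names below are illustrative, not the project's real CLI.
	maxDepth := flag.Int("depth", 2, "maximum recursion depth")
	timeout := flag.Duration("timeout", 10*time.Second, "per-request timeout")
	flag.Parse()

	// Exactly one positional argument: the URL to start crawling from.
	if flag.NArg() != 1 {
		fmt.Fprintln(os.Stderr, "usage: crawler [flags] <base-url>")
		os.Exit(1)
	}
	baseURL := flag.Arg(0)

	fmt.Printf("crawling %s (depth=%d, timeout=%s)\n", baseURL, *maxDepth, *timeout)
	// ...hand baseURL, *maxDepth, and *timeout to the crawler from here...
}
```

With this layout, a run would look like `go run . -depth=3 https://example.com`.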
- Implementing concurrency with goroutines and channels.
- Parsing and managing HTML content in Go.
- Error handling and timeouts in HTTP requests (see the sketch after this list).
- Designing effective CLI tools in Go.
- Writing clean, maintainable, and testable Go code.
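For instance, the fetch step can enforce a timeout and reject bad responses before any parsing happens. The `getHTML` helper below is a sketch under assumed names, not the project's exact code.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// getHTML fetches a page with a client-level timeout and rejects
// error statuses and non-HTML responses before returning the body.
func getHTML(rawURL string) (string, error) {
	client := &http.Client{Timeout: 10 * time.Second}

	resp, err := client.Get(rawURL)
	if err != nil {
		return "", fmt.Errorf("fetching %s: %w", rawURL, err)
	}
	defer resp.Body.Close()

	if resp.StatusCode >= 400 {
		return "", fmt.Errorf("fetching %s: status %d", rawURL, resp.StatusCode)
	}
	if ct := resp.Header.Get("Content-Type"); !strings.Contains(ct, "text/html") {
		return "", fmt.Errorf("fetching %s: unexpected content type %q", rawURL, ct)
	}

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", fmt.Errorf("reading %s: %w", rawURL, err)
	}
	return string(body), nil
}

func main() {
	html, err := getHTML("https://example.com")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(html), "bytes of HTML")
}
```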
Unit tests were written to verify the core functionality of the crawler, including:
- Proper traversal of links.
- Handling invalid or unreachable URLs.
- Adhering to depth limits during recursion.
Run tests with:
```
go test ./...
```
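As an illustration of the style these tests follow, here is a sketch of a table-driven test. The `normalizeURL` helper is defined inline only so the example is self-contained; it is an assumption, not necessarily the project's actual function.

```go
package crawler

import (
	"net/url"
	"strings"
	"testing"
)

// normalizeURL is a hypothetical helper, included here so the test compiles
// on its own; in the project it would live in the crawler package.
func normalizeURL(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	return u.Host + strings.TrimSuffix(u.Path, "/"), nil
}

// TestNormalizeURL uses Go's table-driven style to check that different
// spellings of the same page normalize to the same key.
func TestNormalizeURL(t *testing.T) {
	tests := []struct {
		name  string
		input string
		want  string
	}{
		{"strips scheme", "https://example.com/path", "example.com/path"},
		{"strips trailing slash", "https://example.com/path/", "example.com/path"},
		{"http and https agree", "http://example.com/path", "example.com/path"},
	}

	for _, tc := range tests {
		t.Run(tc.name, func(t *testing.T) {
			got, err := normalizeURL(tc.input)
			if err != nil {
				t.Fatalf("unexpected error: %v", err)
			}
			if got != tc.want {
				t.Errorf("got %q, want %q", got, tc.want)
			}
		})
	}
}
```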
The Web Crawler project was built to deepen my understanding of backend development, specifically:
- Gaining practical experience with Go's concurrency model.
- Exploring the challenges of web scraping and crawling.
- Building a scalable and efficient tool for recursive link traversal.
```
├── crawler/    # Core crawler logic
├── cmd/        # CLI implementation
├── tests/      # Unit tests
└── README.md   # Project documentation
```
Feel free to explore, test, and contribute to this project!