
🌐 A CLI-based web crawler written in Go, designed to explore concurrency and efficient link traversal.


heinrichb/bootdotdev_web-crawler-go


🌐 Web Crawler (Go)

This repository contains the Web Crawler project, developed as part of the Boot.dev course. The goal of this project is to create a CLI-based web crawler in Go, reinforcing key backend development concepts.

🚀 Features

  • Recursive Crawling: Traverse and fetch links recursively from web pages.
  • Concurrency: Use goroutines to crawl multiple pages in parallel.
  • Custom Depth Control: Set limits on how deeply the crawler traverses links.
  • Error Handling: Gracefully manage timeouts and invalid URLs.
  • Output Summary: Present crawled URLs in a clear, readable format.

πŸ› οΈ Technologies Used

  • Go: Core language for development.
  • Concurrency: Goroutines and channels.
  • CLI Design: Building and managing command-line interactions.
  • Testing: Unit tests written with Go's standard testing package.
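As an illustration of the CLI side, here is a hypothetical argument parser built on the standard `flag` package. The flag names `-max-depth` and `-max-concurrency`, their defaults, and the `config` type are assumptions for this sketch, not the project's actual interface.

```go
package main

import (
	"flag"
	"fmt"
	"net/url"
	"os"
)

// config holds the crawler's command-line options (names are hypothetical).
type config struct {
	startURL       *url.URL
	maxDepth       int
	maxConcurrency int
}

// parseArgs parses args (excluding the program name) into a config.
// A dedicated FlagSet, rather than the global flag state, keeps it testable.
func parseArgs(args []string) (config, error) {
	fs := flag.NewFlagSet("crawler", flag.ContinueOnError)
	depth := fs.Int("max-depth", 2, "maximum link depth to follow")
	conc := fs.Int("max-concurrency", 4, "maximum simultaneous HTTP requests")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	if fs.NArg() != 1 {
		return config{}, fmt.Errorf("expected exactly one URL, got %d", fs.NArg())
	}
	u, err := url.Parse(fs.Arg(0))
	if err != nil || (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return config{}, fmt.Errorf("invalid start URL %q", fs.Arg(0))
	}
	if *depth < 0 || *conc < 1 {
		return config{}, fmt.Errorf("max-depth must be >= 0 and max-concurrency >= 1")
	}
	return config{startURL: u, maxDepth: *depth, maxConcurrency: *conc}, nil
}

func main() {
	cfg, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		fmt.Fprintln(os.Stderr, "usage: crawler [-max-depth N] [-max-concurrency N] URL")
		os.Exit(1)
	}
	fmt.Printf("crawling %s (depth %d, concurrency %d)\n",
		cfg.startURL, cfg.maxDepth, cfg.maxConcurrency)
}
```

Note that `flag` stops parsing at the first non-flag argument, so options must come before the URL, e.g. `crawler -max-depth 3 https://example.com`.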

📚 What I Learned

  • Implementing concurrency with goroutines and channels.
  • Parsing and managing HTML content in Go.
  • Error handling and timeouts in HTTP requests.
  • Designing effective CLI tools in Go.
  • Writing clean, maintainable, and testable Go code.

🧪 Testing

Unit tests were written to verify the core functionality of the crawler, including:

  • Proper traversal of links.
  • Handling invalid or unreachable URLs.
  • Adhering to depth limits during recursion.
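The tests themselves aren't shown in this README. As a hypothetical example of the table-driven style common in Go, the sketch below checks a `normalizeURL` helper, the kind of function a crawler needs so that variants of one page (different scheme, trailing slash, or fragment) are recognized as already visited. Both the helper and its normalization policy are assumptions, not the project's code.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeURL maps equivalent URLs to a single key so the crawler can tell
// whether a page was already visited. Assumed policy: ignore the scheme and
// fragment, lowercase the host, drop a trailing slash, keep the query string.
func normalizeURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Host == "" {
		return "", fmt.Errorf("URL %q has no host", raw)
	}
	key := strings.ToLower(u.Host) + strings.TrimSuffix(u.Path, "/")
	if u.RawQuery != "" {
		key += "?" + u.RawQuery
	}
	return key, nil
}

func main() {
	// Table-driven cases, in the style used with Go's testing package.
	cases := []struct {
		in      string
		want    string
		wantErr bool
	}{
		{"https://blog.example.com/path/", "blog.example.com/path", false},
		{"http://BLOG.example.com/path", "blog.example.com/path", false},
		{"https://blog.example.com/path#intro", "blog.example.com/path", false},
		{"https://blog.example.com/search?q=go", "blog.example.com/search?q=go", false},
		{"/relative/path", "", true}, // no host
		{"http://[::1", "", true},    // malformed host
	}
	for _, tc := range cases {
		got, err := normalizeURL(tc.in)
		if (err != nil) != tc.wantErr || got != tc.want {
			fmt.Printf("FAIL %q: got %q, err %v\n", tc.in, got, err)
			continue
		}
		fmt.Printf("ok   %q\n", tc.in)
	}
}
```

In a real project the same table would live in a `_test.go` file and report failures with `t.Errorf` instead of printing, so that `go test ./...` picks it up.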

Run tests with:

go test ./...

🌟 Why This Project?

The Web Crawler project was built to deepen my understanding of backend development, specifically:

  • Gaining practical experience with Go's concurrency model.
  • Exploring the challenges of web scraping and crawling.
  • Building a scalable and efficient tool for recursive link traversal.

📂 Project Structure

├── crawler/         # Core crawler logic
├── cmd/             # CLI implementation
├── tests/           # Unit tests
└── README.md        # Project documentation

🔗 Related Resources


Feel free to explore, test, and contribute to this project! 🚀
