This repository contains the Web Crawler project, developed as part of the Boot.dev course. The goal of this project is to create a CLI-based web crawler in Go, reinforcing key backend development concepts.
- Recursive Crawling: Traverse and fetch links recursively from web pages.
- Concurrency: Leverage goroutines for efficient parallel crawling (see the sketch after this list).
- Custom Depth Control: Set limits on how deeply the crawler traverses links.
- Error Handling: Gracefully manage timeouts and invalid URLs.
- Output Summary: Present crawled URLs in a clear, readable format.
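Below is a minimal sketch of how these pieces might fit together. It is illustrative only: the `crawler` type, its field names, and the in-memory `site` map that stands in for real HTTP fetching are assumptions, not the project's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// site stands in for the network: each URL maps to the links found on
// that page. The real crawler fetches and parses live HTML instead.
var site = map[string][]string{
	"https://example.com":   {"https://example.com/a", "https://example.com/b"},
	"https://example.com/a": {"https://example.com/b"},
	"https://example.com/b": {"https://example.com"},
}

// crawler holds shared crawl state; all names here are illustrative.
type crawler struct {
	pages    map[string]int // URL -> number of times seen
	maxDepth int            // how deep recursion may go
	sem      chan struct{}  // buffered channel capping concurrent page visits
	mu       sync.Mutex     // protects pages
	wg       sync.WaitGroup // waits for all goroutines to finish
}

// crawlPage records a URL and recursively crawls its links,
// spawning one goroutine per link until maxDepth is reached.
func (c *crawler) crawlPage(url string, depth int) {
	defer c.wg.Done()

	if depth > c.maxDepth {
		return // depth limit reached
	}

	c.sem <- struct{}{}        // acquire a concurrency slot
	defer func() { <-c.sem }() // release it when this page is done

	c.mu.Lock()
	c.pages[url]++
	firstVisit := c.pages[url] == 1
	c.mu.Unlock()
	if !firstVisit {
		return // already crawled this page
	}

	for _, link := range site[url] {
		c.wg.Add(1)
		go c.crawlPage(link, depth+1)
	}
}

func main() {
	c := &crawler{
		pages:    map[string]int{},
		maxDepth: 2,
		sem:      make(chan struct{}, 5), // at most 5 pages in flight
	}
	c.wg.Add(1)
	go c.crawlPage("https://example.com", 0)
	c.wg.Wait()

	// Output summary: each URL and how many times it was seen.
	for url, n := range c.pages {
		fmt.Printf("%d - %s\n", n, url)
	}
}
```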
- Go: Core language for development.
- Concurrency: Using goroutines and channels.
- CLI Design: Building and managing command-line interactions (see the sketch after this list).
- Testing: Robust unit tests with Go's `testing` package.
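As one example of the CLI layer, here is a sketch of argument parsing with the standard `flag` package. The flag names and usage string are illustrative assumptions, not the project's actual interface.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"time"
)

func main() {
	// Flag names below are illustrative, not the project's real CLI.
	maxDepth := flag.Int("depth", 2, "maximum recursion depth")
	timeout := flag.Duration("timeout", 10*time.Second, "per-request timeout")
	flag.Parse()

	// Exactly one positional argument: the URL to start crawling from.
	if flag.NArg() != 1 {
		fmt.Fprintln(os.Stderr, "usage: crawler [flags] <base-url>")
		os.Exit(1)
	}
	baseURL := flag.Arg(0)

	fmt.Printf("crawling %s (depth=%d, timeout=%s)\n", baseURL, *maxDepth, *timeout)
	// ...hand baseURL, *maxDepth, and *timeout to the crawler from here...
}
```

With this layout, a run would look like `go run . -depth=3 https://example.com`.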
- Implementing concurrency with goroutines and channels.
- Parsing and managing HTML content in Go.
- Error handling and timeouts in HTTP requests (see the sketch after this list).
- Designing effective CLI tools in Go.
- Writing clean, maintainable, and testable Go code.
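For instance, the fetch step can enforce a timeout and reject bad responses before any parsing happens. The `getHTML` helper below is a sketch under assumed names, not the project's exact code.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// getHTML fetches a page with a client-level timeout and rejects
// error statuses and non-HTML responses before returning the body.
func getHTML(rawURL string) (string, error) {
	client := &http.Client{Timeout: 10 * time.Second}

	resp, err := client.Get(rawURL)
	if err != nil {
		return "", fmt.Errorf("fetching %s: %w", rawURL, err)
	}
	defer resp.Body.Close()

	if resp.StatusCode >= 400 {
		return "", fmt.Errorf("fetching %s: status %d", rawURL, resp.StatusCode)
	}
	if ct := resp.Header.Get("Content-Type"); !strings.Contains(ct, "text/html") {
		return "", fmt.Errorf("fetching %s: unexpected content type %q", rawURL, ct)
	}

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", fmt.Errorf("reading %s: %w", rawURL, err)
	}
	return string(body), nil
}

func main() {
	html, err := getHTML("https://example.com")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(html), "bytes of HTML")
}
```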
Unit tests were written to verify the core functionality of the crawler, including:
- Proper traversal of links.
- Handling invalid or unreachable URLs.
- Adhering to depth limits during recursion.
Run tests with:
```
go test ./...
```
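As an illustration of the style these tests follow, here is a sketch of a table-driven test. The `normalizeURL` helper is defined inline only so the example is self-contained; it is an assumption, not necessarily the project's actual function.

```go
package crawler

import (
	"net/url"
	"strings"
	"testing"
)

// normalizeURL is a hypothetical helper, included here so the test compiles
// on its own; in the project it would live in the crawler package.
func normalizeURL(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	return u.Host + strings.TrimSuffix(u.Path, "/"), nil
}

// TestNormalizeURL uses Go's table-driven style to check that different
// spellings of the same page normalize to the same key.
func TestNormalizeURL(t *testing.T) {
	tests := []struct {
		name  string
		input string
		want  string
	}{
		{"strips scheme", "https://example.com/path", "example.com/path"},
		{"strips trailing slash", "https://example.com/path/", "example.com/path"},
		{"http and https agree", "http://example.com/path", "example.com/path"},
	}

	for _, tc := range tests {
		t.Run(tc.name, func(t *testing.T) {
			got, err := normalizeURL(tc.input)
			if err != nil {
				t.Fatalf("unexpected error: %v", err)
			}
			if got != tc.want {
				t.Errorf("got %q, want %q", got, tc.want)
			}
		})
	}
}
```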
The Web Crawler project was built to deepen my understanding of backend development, specifically:
- Gaining practical experience with Go's concurrency model.
- Exploring the challenges of web scraping and crawling.
- Building a scalable and efficient tool for recursive link traversal.
```
├── crawler/    # Core crawler logic
├── cmd/        # CLI implementation
├── tests/      # Unit tests
└── README.md   # Project documentation
```
Feel free to explore, test, and contribute to this project!