Arachnida is a two-phase project focusing on web scraping and metadata analysis.
The project includes the following sections:
- Spider: A program that recursively downloads images from websites.
- Scorpion: A program that parses and displays metadata from image files.
This project aims to provide practical experience in web data extraction and working with metadata.
- Clone the repository:
git clone https://github.com/whymami/Arachnida
cd Arachnida
- Recursively downloads images from a specified URL.
- Supported image formats: .jpg, .jpeg, .png, .gif, .bmp.
- Program Options:
  - -r: Enables recursive downloading.
  - -r -l [N]: Sets recursive download depth (default: 5).
  - -p [PATH]: Sets the directory for downloaded files (default: ./data/).
  - -h: Provides detailed information about parameters.
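The options listed above map naturally onto Python's standard argparse module. The following is only a minimal sketch of how such a command line could be declared; it is not taken from Spider.py. The function name build_parser and the attribute names recursive, depth and path are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented Spider options."""
    parser = argparse.ArgumentParser(
        prog="Spider.py",
        description="Download images from a URL, optionally recursively.",
    )
    # -r is a plain on/off switch.
    parser.add_argument("-r", dest="recursive", action="store_true",
                        help="enable recursive downloading")
    # -l only matters together with -r; the documented default is 5.
    parser.add_argument("-l", dest="depth", type=int, default=5, metavar="N",
                        help="recursive download depth (default: 5)")
    # -p selects the output directory; the documented default is ./data/.
    parser.add_argument("-p", dest="path", default="./data/", metavar="PATH",
                        help="directory for downloaded files (default: ./data/)")
    # The start URL is the only required, positional argument.
    parser.add_argument("url", metavar="URL", help="page to download images from")
    # argparse adds -h/--help automatically, so it is not declared here.
    return parser
```

Calling build_parser().parse_args(["-r", "-l", "2", "https://example.com"]) would yield recursive=True, depth=2 and path="./data/". Whether the real program rejects -l without -r is not stated, so this sketch does not enforce it.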
python3 Spider.py [-r] [-l N] [-p PATH] URL
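To illustrate what recursive downloading with a depth limit involves, here is a rough standard-library sketch that collects image URLs from a page and follows links up to a given depth. It is not the project's implementation: the names LinkCollector, collect_links and crawl are hypothetical, the fetch argument is a caller-supplied function that returns a page's HTML, and treating depth 0 as "start page only" is an assumption. Actually saving the files is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".bmp")


class LinkCollector(HTMLParser):
    """Collects image URLs from <img src> and page URLs from <a href>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []
        self.pages = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and attributes.get("src"):
            # Resolve relative paths against the page URL.
            url = urljoin(self.base_url, attributes["src"])
            # Compare the path only, so query strings do not hide the extension.
            if urlparse(url).path.lower().endswith(IMAGE_EXTENSIONS):
                self.images.append(url)
        elif tag == "a" and attributes.get("href"):
            self.pages.append(urljoin(self.base_url, attributes["href"]))


def collect_links(html, base_url):
    """Return (image_urls, page_urls) found in one HTML document."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.images, collector.pages


def crawl(url, depth, fetch, seen=None):
    """Return image URLs found on url and on pages up to depth links away."""
    seen = set() if seen is None else seen
    if url in seen:
        return []  # avoid loops between pages that link to each other
    seen.add(url)
    images, pages = collect_links(fetch(url), url)
    if depth > 0:
        for page in pages:
            images += crawl(page, depth - 1, fetch, seen)
    return images
```

Passing fetch in as a parameter keeps the sketch independent of any particular HTTP library; a real run might wrap urllib.request.urlopen, and would likely also restrict the crawl to the starting domain.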
Analyzes and displays metadata from image files:
- Creation date.
- EXIF data.
- Compatible with the same formats as Spider.
python3 Scorpion.py FILE1 [FILE2 ...]
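As a small illustration of the file-level part of this (format check and a creation date), the sketch below uses only the standard library. It is an assumption-laden example rather than Scorpion's actual code: the function name basic_metadata and the returned keys are hypothetical, and it reads filesystem timestamps, not EXIF tags. EXIF data is stored inside the image itself and needs a dedicated parser, typically a third-party library; which one the project uses is not stated here.

```python
import os
from datetime import datetime

SUPPORTED = (".jpg", ".jpeg", ".png", ".gif", ".bmp")


def basic_metadata(path):
    """Return filesystem-level metadata for a supported image file."""
    if not path.lower().endswith(SUPPORTED):
        raise ValueError(f"unsupported format: {path}")
    info = os.stat(path)
    # st_birthtime is only available on some platforms; elsewhere the
    # last-modification time is used as an approximation (an assumption).
    created = getattr(info, "st_birthtime", info.st_mtime)
    return {
        "file": os.path.basename(path),
        "size_bytes": info.st_size,
        "created": datetime.fromtimestamp(created).isoformat(timespec="seconds"),
    }
```

A command-line wrapper could loop over the FILE arguments and print each returned dictionary, reporting unsupported files instead of stopping at the first one.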
- Edit or delete metadata in image files.
- Graphical interface for metadata management.
cd bonus
python3 main.py
.
├── Spider.py # Spider program
├── Scorpion.py # Scorpion program
├── data/ # Default directory for files downloaded by Spider
├── bonus/ # Bonus section
│ ├── Scorpion.py # Scorpion program for bonus
│ ├── main.py # Bonus main file
│ ├── gui.py # Graphical interface file
│ └── requirements.txt # Requirements file
└── README.md # Project documentation