HTTP, SOCKS4, and SOCKS5 proxy scraper and checker.
- Asynchronous.
- Uses regex to search for proxies (ip:port format) on a web page, so proxies can be extracted even from JSON without changing the code.
- The URL that is requested to check each proxy can be specified.
- Can sort proxies by speed.
- Supports determining the geolocation of the proxy exit node.
- Can determine if the proxy is anonymous.
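
To illustrate the regex-based extraction mentioned above, here is a minimal sketch of pulling ip:port pairs out of arbitrary text, including a JSON body. It is not the pattern this script actually uses: `PROXY_RE` and `extract_proxies` are hypothetical names, and the pattern assumes IPv4 addresses only.

```python
import re
from typing import List

# Hypothetical pattern (an assumption, not the script's own regex):
# an IPv4 address with each octet in 0-255, a colon, then a 1-5 digit port.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
PROXY_RE = re.compile(
    r"(?<![\d.])"                              # not in the middle of a longer number
    r"((?:" + _OCTET + r"\.){3}" + _OCTET + r")"  # group 1: the IP address
    r":(\d{1,5})(?!\d)"                        # group 2: the port
)


def extract_proxies(text: str) -> List[str]:
    """Return unique ip:port strings found in text, in order of appearance."""
    found = []
    for match in PROXY_RE.finditer(text):
        ip, port = match.group(1), int(match.group(2))
        if 1 <= port <= 65535:  # drop ports outside the valid range
            found.append(f"{ip}:{port}")
    # dict.fromkeys removes duplicates while keeping the original order
    return list(dict.fromkeys(found))
```

Because the search runs over raw text, the same function works on an HTML page, a plain list, or a JSON document, as long as each proxy appears as a single ip:port string.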
Proxies obtained using this script are available in monosans/proxy-list.
- Download and unpack the archive with the program.
- Edit config.ini according to your preference.
- Install Python (minimum supported version is 3.7). During installation, be sure to check the box "Add Python to PATH".
- Install dependencies and run the script. There are 2 ways to do this:
  - Automatic:
    - On Windows run start.cmd
    - On Unix-like OS run start.sh
  - Manual:
    - Windows:
      - cd into the unpacked folder
      - Install dependencies with the command:
        py -m pip install -U --no-cache-dir --disable-pip-version-check pip setuptools wheel; py -m pip install -U --no-cache-dir --disable-pip-version-check -r requirements.txt
      - Run with the command:
        py -m proxy_scraper_checker
    - Unix-like OS:
      - cd into the unpacked folder
      - Install dependencies with the command:
        python3 -m pip install -U --no-cache-dir --disable-pip-version-check pip setuptools wheel && python3 -m pip install -U --no-cache-dir --disable-pip-version-check -r requirements.txt
      - Run with the command:
        python3 -m proxy_scraper_checker

When the script finishes running, the following folders will be created (this behavior can be changed in the config):
- proxies - proxies with any anonymity level.
- proxies_anonymous - anonymous proxies.
- proxies_geolocation - same as proxies, but includes the exit node's geolocation.
- proxies_geolocation_anonymous - same as proxies_anonymous, but includes the exit node's geolocation.

Geolocation format is ip:port|Country|Region|City.
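
If you want to consume the geolocation output in your own code, a line in the format above can be split as in the following sketch. `GeoProxy` and `parse_geo_line` are hypothetical names that are not part of this script, and the sample values in the usage note are made up for illustration.

```python
from typing import NamedTuple


class GeoProxy(NamedTuple):
    """One parsed line in the ip:port|Country|Region|City format."""
    host: str
    port: int
    country: str
    region: str
    city: str


def parse_geo_line(line: str) -> GeoProxy:
    """Parse a single geolocation line; raises ValueError if it is malformed."""
    # Split into exactly four fields: address, country, region, city.
    address, country, region, city = line.strip().split("|", 3)
    # The port is whatever follows the last colon in the address.
    host, _, port = address.rpartition(":")
    if not host:
        raise ValueError(f"missing host or port in {address!r}")
    return GeoProxy(host, int(port), country, region, city)
```

For example, `parse_geo_line("1.2.3.4:8080|Germany|Hesse|Frankfurt am Main")` returns a `GeoProxy` whose `port` is the integer 8080 and whose `city` is "Frankfurt am Main".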