Use this software at your own risk. I cannot be held responsible for any consequences of its use, including damages.
This open-source program uses Python to scrape data from Njuskalo.hr. It navigates the site with Playwright, parses the HTML and extracts the relevant data with BeautifulSoup, then saves the results in JSON format in a directory of your choosing.
You can scrape any single category or a whole tab on Njuskalo (Nekretnine, Auto-Moto, etc.).
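The parse-and-save step described above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: a static HTML snippet stands in for the page source that Playwright would fetch, and the markup, the CSS classes (`ad`, `ad-title`, and so on), the helper names `parse_adverts` and `save_adverts`, and the file name `adverts.json` are all assumptions made up for the example.

```python
import json
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

# Hypothetical markup: the real Njuskalo pages use different tags and classes.
SAMPLE_HTML = """
<ul>
  <li class="ad">
    <h3 class="ad-title">Example car</h3>
    <div class="ad-location">Zagreb, 150000 km, 2015</div>
    <time class="ad-time">01.01.2024.</time>
    <strong class="ad-price">7500 EUR</strong>
  </li>
  <li class="ad">
    <h3 class="ad-title">Example flat</h3>
    <div class="ad-location">Split</div>
    <time class="ad-time">02.01.2024.</time>
  </li>
</ul>
"""

# Output key -> CSS selector (selectors are assumptions for this sketch).
FIELDS = {
    "name": ".ad-title",
    "location": ".ad-location",
    "time": ".ad-time",
    "price": ".ad-price",
}


def parse_adverts(html):
    """Extract one dict per advert; fields missing from the markup become None."""
    soup = BeautifulSoup(html, "html.parser")
    adverts = []
    for item in soup.select("li.ad"):
        record = {}
        for key, selector in FIELDS.items():
            node = item.select_one(selector)
            record[key] = node.get_text(strip=True) if node else None
        adverts.append(record)
    return adverts


def save_adverts(adverts, directory, filename="adverts.json"):
    """Write the adverts as a JSON list inside the chosen directory."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / filename
    out_path.write_text(
        json.dumps(adverts, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return out_path


adverts = parse_adverts(SAMPLE_HTML)

# Save into a temporary directory and read the file back as a round-trip check.
with tempfile.TemporaryDirectory() as tmp:
    path = save_adverts(adverts, tmp)
    saved = json.loads(path.read_text(encoding="utf-8"))
```

Storing a missing field as None rather than skipping the advert keeps every record the same shape, which makes the JSON easier to process later.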
1) Clone the repository
2) Navigate to the repository in your terminal
3) Run:
    pip install -r requirements.txt
4) Run the program with:
    python main.py
{
"name": "ADVERT NAME",
"location": "LOCATION DATA, KILOMETERS, YEAR OF CAR",
"time": "DATE POSTED",
"price": "PRICE"
},
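As a rough sketch of how records in this shape might be consumed, the snippet below loads a list of adverts and separates complete records from ones missing a field. It assumes the output file holds a JSON list of such objects. The sample data and the helper name `load_records` are made up for illustration.

```python
import json

# The four keys shown in the record format above.
EXPECTED_KEYS = {"name", "location", "time", "price"}

# Made-up sample standing in for the contents of a scraped output file.
SAMPLE_OUTPUT = """
[
  {"name": "Example car", "location": "Zagreb, 150000 km, 2015",
   "time": "01.01.2024.", "price": "7500 EUR"},
  {"name": "Example flat", "location": "Split", "time": "02.01.2024."}
]
"""


def load_records(text):
    """Split parsed records into those with all expected keys and the rest."""
    valid, invalid = [], []
    for record in json.loads(text):
        if EXPECTED_KEYS.issubset(record):
            valid.append(record)
        else:
            invalid.append(record)
    return valid, invalid


valid, invalid = load_records(SAMPLE_OUTPUT)
print(f"{len(valid)} complete record(s), {len(invalid)} incomplete")
```

To use it on a real file, read the file's text (for example with `pathlib.Path.read_text`) and pass it to `load_records`.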
- Python 3.x
- Playwright
- Streamlit
- BeautifulSoup
- Playwright for web crawling
- BeautifulSoup for HTML parsing
- JSON for data formatting