Dataset
Marco edited this page Jan 22, 2023 · 6 revisions
The dataset used in this project was scraped from the main Italian real estate website. All prices were scraped during the period 12/22 - 2/23.
The fields (labels) used for this project are:
- Locali = int, the number of rooms in the house
- Superficie = int, the total surface area of the house
- Bagni = int, the number of bathrooms
- Contratto = str, the type of contract (sale, rent, lease...)
- Tipologia = str, the type of house (flat, single-family, condo...)
- Piano = int, the floor the house is on
- PostoAuto = int, the number of parking spots
- target = int, the price of the house
- SpeseCondominiali = int, monthly condominium expenses
- Stato = str, the condition of the building
- EfficenzaEnergetica = int, energy efficiency value
- Riscaldamento = str, the type of heating in the house
- City = str, city name
- AnnoDiCostruzione = str, year of construction
- Climatizzazione = str, the type of air conditioning
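As an illustration, the field list above can be written down as a mapping from field name to Python type and used to coerce one scraped row. This is only a minimal sketch: `FIELD_TYPES` and `coerce_row` are hypothetical names that do not come from the project, and the raw row is assumed to be a dictionary of strings.

```python
from typing import Any, Dict, Optional

# Field name -> type, copied from the list above.
# FIELD_TYPES is a hypothetical name used only for this sketch.
FIELD_TYPES: Dict[str, type] = {
    "Locali": int,
    "Superficie": int,
    "Bagni": int,
    "Contratto": str,
    "Tipologia": str,
    "Piano": int,
    "PostoAuto": int,
    "target": int,
    "SpeseCondominiali": int,
    "Stato": str,
    "EfficenzaEnergetica": int,
    "Riscaldamento": str,
    "City": str,
    "AnnoDiCostruzione": str,
    "Climatizzazione": str,
}


def coerce_row(raw: Dict[str, str]) -> Dict[str, Optional[Any]]:
    """Convert one raw row (all strings) to the declared types.

    Missing, empty, or unparsable values become None.
    """
    row: Dict[str, Optional[Any]] = {}
    for name, typ in FIELD_TYPES.items():
        value = (raw.get(name) or "").strip()
        if not value:
            row[name] = None
            continue
        try:
            row[name] = typ(value)
        except ValueError:
            # e.g. int("due") fails: treat the value as missing
            row[name] = None
    return row
```

For example, `coerce_row({"Locali": "3", "City": "Milano"})` would give `Locali` as the integer `3`, `City` as `"Milano"`, and `None` for every other field.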
The dataset file is preprocessed by the script DatasetPreprocessing.py. The main functions of the script are:
- Removing all letters/symbols from the numeric-only fields
- Checking for rows where fundamental labels are empty, and deleting them
- Filling the empty labels of the remaining rows using different methods, such as the mean or median
- Adding to each record the main activities near it, by calling the Google Maps API and retrieving the first two results by distance
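The steps above could be sketched roughly as follows. This is not the code of DatasetPreprocessing.py: every function name here is hypothetical, the choice of which fields count as fundamental is an assumption, and the Google Maps call itself is left out. The last helper only assumes that a list of nearby places with coordinates has already been retrieved, and ranks them by great-circle distance.

```python
import math
import re
from statistics import mean, median
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[object]]
LatLon = Tuple[float, float]


def to_int(value: object) -> Optional[int]:
    """Keep only the digits of a value; return None if none remain.

    Note: this also drops signs and decimal separators.
    """
    digits = re.sub(r"\D", "", str(value)) if value is not None else ""
    return int(digits) if digits else None


def drop_incomplete(rows: List[Row], required: List[str]) -> List[Row]:
    """Keep only rows where every required field is present."""
    return [r for r in rows if all(r.get(k) is not None for k in required)]


def fill_missing(rows: List[Row], field: str, how: str = "median") -> None:
    """Fill empty values of `field` in place with the mean or median
    of the known values (the fill value may be a float)."""
    known = [r[field] for r in rows if r.get(field) is not None]
    if not known:
        return
    fill = median(known) if how == "median" else mean(known)
    for r in rows:
        if r.get(field) is None:
            r[field] = fill


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def two_nearest(origin: LatLon,
                places: List[Tuple[str, LatLon]]) -> List[Tuple[str, LatLon]]:
    """Return the two places closest to `origin`.

    `places` is assumed to be already fetched from a places API.
    """
    return sorted(places, key=lambda p: haversine_km(origin, p[1]))[:2]
```

A possible usage, assuming `target` is treated as a fundamental field: clean the numeric strings with `to_int`, call `drop_incomplete(rows, ["target"])`, then `fill_missing(rows, "Bagni", how="median")` for each field that may still be empty.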