
Dataset

Marco edited this page Jan 22, 2023 · 6 revisions

The dataset used in this project was scraped from the main Italian real estate website. All prices were collected between December 2022 and February 2023.


Description of the dataset

The fields (labels) used in this project are:

  • Locali = int, the number of rooms in the house
  • Superficie = int, the total floor area of the house
  • Bagni = int, the number of bathrooms
  • Contratto = str, the type of contract (sale, rent, lease...)
  • Tipologia = str, the type of house (flat, single-family house, condo...)
  • Piano = int, the floor the house is on
  • PostoAuto = int, the number of parking spots
  • target = int, the price of the house
  • SpeseCondominiali = int, the monthly condominium fees
  • Stato = str, the condition of the building
  • EfficenzaEnergetica = int, the energy efficiency value
  • Riscaldamento = str, the type of heating in the house
  • City = str, the city name
  • AnnoDiCostruzione = str, the year of construction
  • Climatizzazione = str, the type of air conditioning
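The field list can be mirrored as a Python type map, for example to check scraped rows before preprocessing. This is a minimal sketch, not code from the project: `FIELD_TYPES` and `type_errors` are hypothetical names, and the types are copied from the list above (including AnnoDiCostruzione as a string and the field names exactly as the dataset spells them).

```python
from typing import Any

# Hypothetical mapping of each dataset field to the type listed on this page.
FIELD_TYPES: dict[str, type] = {
    "Locali": int,
    "Superficie": int,
    "Bagni": int,
    "Contratto": str,
    "Tipologia": str,
    "Piano": int,
    "PostoAuto": int,
    "target": int,
    "SpeseCondominiali": int,
    "Stato": str,
    "EfficenzaEnergetica": int,
    "Riscaldamento": str,
    "City": str,
    "AnnoDiCostruzione": str,
    "Climatizzazione": str,
}


def type_errors(row: dict[str, Any]) -> list[str]:
    """Return the fields of `row` whose value is present but has the wrong type.

    Missing values (absent keys or None) are not reported here; they are
    left to the later preprocessing steps.
    """
    errors = []
    for field, expected in FIELD_TYPES.items():
        value = row.get(field)
        if value is not None and not isinstance(value, expected):
            errors.append(field)
    return errors
```

For instance, `type_errors({"Locali": 3, "Superficie": "120 m²", "City": "Milano"})` returns `["Superficie"]`, flagging a numeric field that still contains text.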

Preprocessing of the data

The dataset file is preprocessed by the script DatasetPreprocessing.py. Its main steps are:

  1. Removing all letters/symbols from the fields that should contain only numbers
  2. Checking for rows in which fundamental labels are empty, and deleting them
  3. Filling the empty labels in the remaining rows with methods such as the mean or the median
  4. Adding, for each entry, the main activities (businesses) near it, by calling the Google Maps API and keeping the first two results by distance
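Step 1 above can be illustrated with a small parser that keeps only the digits of a scraped value. This is a sketch under assumptions, not the code in DatasetPreprocessing.py: the raw formats ("€ 250.000", "120 m²") are hypothetical examples, `to_int`, `NUMERIC_FIELDS` and `clean_numeric_fields` are invented names, and dropping every non-digit assumes each cell holds a single whole number (so "." is treated as the Italian thousands separator).

```python
import re
from typing import Any, Optional, Union

# Everything that is not an ASCII digit. An explicit [0-9] class is used
# because characters such as "²" count as digits for str.isdigit().
_NON_DIGITS = re.compile(r"[^0-9]")

# Fields typed as int in the field list (hypothetical constant).
NUMERIC_FIELDS = ("Locali", "Superficie", "Bagni", "Piano", "PostoAuto",
                  "target", "SpeseCondominiali", "EfficenzaEnergetica")


def to_int(raw: Optional[Union[str, int]]) -> Optional[int]:
    """Strip letters/symbols from a scraped numeric value and return an int.

    Returns None when no digit is left (e.g. "n.d."), so the value is
    treated as missing by the following steps.
    """
    if raw is None or isinstance(raw, int):
        return raw
    digits = _NON_DIGITS.sub("", raw)
    return int(digits) if digits else None


def clean_numeric_fields(row: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `row` with every numeric field converted by to_int."""
    cleaned = dict(row)
    for field in NUMERIC_FIELDS:
        if field in cleaned:
            cleaned[field] = to_int(cleaned[field])
    return cleaned
```

With these assumptions, `to_int("€ 250.000")` gives `250000` and `to_int("120 m²")` gives `120`. Decimal values such as "1,5" would be misread as 15, so fields that can hold decimals would need a different rule.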
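Steps 2 and 3 above can be sketched with the standard library alone. The page does not say which labels count as fundamental or which filling method is applied to which field, so treating `target`, `Superficie` and `Locali` as required, and using the median by default, is a hypothetical choice; `FUNDAMENTAL_FIELDS`, `drop_incomplete` and `fill_missing` are invented names, and missing values are assumed to be stored as None.

```python
from statistics import median
from typing import Any, Callable, Sequence

# Hypothetical choice of "fundamental" labels: rows missing any of them are dropped.
FUNDAMENTAL_FIELDS = ("target", "Superficie", "Locali")


def drop_incomplete(rows: list[dict[str, Any]],
                    required: Sequence[str] = FUNDAMENTAL_FIELDS) -> list[dict[str, Any]]:
    """Step 2: keep only the rows in which every required label is present."""
    return [row for row in rows if all(row.get(f) is not None for f in required)]


def fill_missing(rows: list[dict[str, Any]], field: str,
                 method: Callable = median) -> list[dict[str, Any]]:
    """Step 3: replace missing values of `field` with `method` over the known values.

    `method` can be statistics.median, statistics.mean, etc. Note that the
    result may be a float (e.g. the median of [1, 2] is 1.5). Rows are copied,
    not modified in place.
    """
    known = [row[field] for row in rows if row.get(field) is not None]
    if not known:
        return [dict(row) for row in rows]  # nothing to compute a fill value from
    fill_value = method(known)
    return [
        {**row, field: fill_value if row.get(field) is None else row[field]}
        for row in rows
    ]
```

Running `drop_incomplete` first means the fill value is computed only from rows that survive step 2, which matches the order of the steps listed above.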
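Step 4 above depends on which Google Maps endpoint the script calls, which this page does not specify. The sketch below assumes the legacy Places API Nearby Search web service with `rankby=distance`, which orders results by distance, requires `keyword` or `type`, and disallows `radius`. It also assumes each listing has already been geocoded to a latitude/longitude pair, since the field list has no coordinates. `nearby_url`, `first_two_places`, `fetch_first_two` and the example `type` value are hypothetical; a real request needs network access and a valid API key.

```python
import json
from typing import Any
from urllib.parse import urlencode
from urllib.request import urlopen

NEARBY_SEARCH = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"


def nearby_url(lat: float, lng: float, place_type: str, api_key: str) -> str:
    """Build a Nearby Search request whose results are ranked by distance."""
    params = {
        "location": f"{lat},{lng}",
        "rankby": "distance",  # with rankby=distance, "radius" must not be sent
        "type": place_type,
        "key": api_key,
    }
    return f"{NEARBY_SEARCH}?{urlencode(params)}"


def first_two_places(response: dict[str, Any]) -> list[str]:
    """Return the names of the two closest results in a Nearby Search response."""
    if response.get("status") not in ("OK", "ZERO_RESULTS"):
        raise RuntimeError(f"Places API error: {response.get('status')}")
    return [place["name"] for place in response.get("results", [])[:2]]


def fetch_first_two(lat: float, lng: float, place_type: str, api_key: str) -> list[str]:
    """Perform the HTTP request and keep the two closest places."""
    with urlopen(nearby_url(lat, lng, place_type, api_key), timeout=10) as resp:
        return first_two_places(json.load(resp))
```

Splitting URL building and response parsing from the network call keeps the distance-ranking logic testable without an API key, e.g. by passing a saved JSON response to `first_two_places`.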