diff --git a/.gitpod.dockerfile b/.gitpod.dockerfile deleted file mode 100644 index e13c885..0000000 --- a/.gitpod.dockerfile +++ /dev/null @@ -1,3 +0,0 @@ -FROM gitpod/workspace-full - -RUN npm i learnpack -g diff --git a/.vscode/settings.json b/.vscode/settings.json index b2ac17b..cc85e3f 100644 --- a/.vscode/settings.json +++ b/.vscode/settings.json @@ -1,6 +1,9 @@ { - "editor.defaultFormatter": "esbenp.prettier-vscode", - "workbench.editorAssociations": { - "*.md": "vscode.markdown.preview.editor" - } -} \ No newline at end of file + "editor.defaultFormatter": "esbenp.prettier-vscode", + "workbench.editorAssociations": { + "*.md": "vscode.markdown.preview.editor" + }, + "files.autoSave": "afterDelay", + "files.autoSaveDelay": 700, + "editor.minimap.enabled": false +} diff --git a/assets/preview.jpeg b/assets/preview.jpeg deleted file mode 100644 index 4302f6e..0000000 Binary files a/assets/preview.jpeg and /dev/null differ diff --git a/learn.json b/learn.json index 5c68fdc..00f3161 100644 --- a/learn.json +++ b/learn.json @@ -1,10 +1,10 @@ { "language": "python3", - "slug": "realestate-datacleanup-exercise", - "title": "Real Estate Data Cleanup", - "repository": "https://github.com/4GeeksAcademy/realstate-datacleanup-exercise", - "preview": "https://github.com/4GeeksAcademy/realestate-datacleanup-exercise/blob/main/assets/preview.jpeg?raw=true", - "description": "Prepare a real dataset to later train a machine learning model", + "slug": "final-data-science-prework-project", + "title": "Final data science prework project", + "repository": "https://github.com/4GeeksAcademy/final-data-science-prework-project", + "preview": "https://github.com/4GeeksAcademy/final-data-science-prework-project/blob/main/assets/preview.jpeg?raw=true", + "description": "This project teaches how to handle real-world data through a basic data science workflow: cleaning, processing, and visualizing a dataset with key tools such as Pandas.", "duration": 3, "difficulty": 
"easy", "projectType": "project", diff --git a/project.es.ipynb b/project.es.ipynb index da1f12e..f9c7709 100644 --- a/project.es.ipynb +++ b/project.es.ipynb @@ -1,29 +1,134 @@ { "cells": [ { - "attachments": {}, "cell_type": "markdown", - "id": "innocent-university", + "id": "66fac57f", "metadata": {}, "source": [ - "# Real Estate Cleanup\n", + "# Basic Python practice\n", "\n", - "This is a real dataset that was downloaded using web scraping techniques. The data contains records from **Fotocasa**, which is one of the most popular real estate websites in Spain. Please do not do this (web scraping) unless it is for academic purposes.\n", + "#### Exercise 00. Variable declaration\n", "\n", - "The dataset was downloaded a few years ago by Henry Navarro, and no economic benefit was obtained from it in any way.\n", + "Define the following variables with values of your choice:\n", "\n", - "It contains thousands of records of real houses published on the website www.fotocasa.com. Your goal is to extract as much information as possible with the data science knowledge you have so far; for example, what is the most expensive house in the entire dataset?\n", + "- A variable that stores your name (text).\n", + "- A variable that stores your age (integer).\n", + "- A variable that indicates whether you like programming (true or false).\n", + "- A variable that stores your average grade (decimal number).\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4eca514d", + "metadata": {}, + "outputs": [], + "source": [ + "# Declare your variables here.\n", + "\n", + "# Example\n", + "saludo = \"Hola mundo\" " + ] + }, + { + "cell_type": "markdown", + "id": "21e43590", + "metadata": {}, + "source": [ + "- Create a list with your five favorite numbers and print it."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "69f9e072", + "metadata": {}, + "outputs": [], + "source": [ + "# List of favorite numbers" + ] + }, + { + "cell_type": "markdown", + "id": "9946452f", + "metadata": {}, + "source": [ + "- Create a dictionary that stores a student's information and print it:\n", + "\n", + " - Name\n", + " - Age\n", + " - Final grade" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0d30fb44", + "metadata": {}, + "outputs": [], + "source": [ + "# Student dictionary" + ] + }, + { + "cell_type": "markdown", + "id": "91eb860d", + "metadata": {}, + "source": [ + "#### Exercise 01. Basic data analysis with Python's built-in data structures\n", + "Create a list with the grades of 5 students: [8.5, 9.2, 7.8, 8.9, 10].\n", "\n", - "Let's start with precisely that question... Good luck!" + "- Calculate the average grade." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "41b747dc", + "metadata": {}, + "outputs": [], + "source": [ + "# Code" + ] + }, + { + "cell_type": "markdown", + "id": "699ac3a6", + "metadata": {}, + "source": [ + "- Find the highest and the lowest grade." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0224afb4", + "metadata": {}, + "outputs": [], + "source": [ + "# Code" ] }, { "attachments": {}, "cell_type": "markdown", - "id": "multiple-glass", + "id": "innocent-university", "metadata": {}, "source": [ - "#### Exercise 00. Read the dataset assets/real_estate.csv and try to visualize the table (★☆☆)" + "# Real estate cleanup with Pandas for efficient analysis\n", + "\n", + "This is a real dataset that was downloaded using web scraping techniques. The data contains records from **Fotocasa**, which is one of the most popular real estate websites in Spain. 
Please do not do this (web scraping) unless it is for academic purposes.\n", + "\n", + "The dataset was downloaded a few years ago by Henry Navarro, and no economic benefit was obtained from it in any way.\n", + "\n", + "It contains thousands of records of real houses published on the website www.fotocasa.com. Your goal is to extract as much information as possible with the data science knowledge you have so far.\n", + "\n", + "Let's get started!\n", + "\n", + "\n", + "\n", + "\n", + "- First, let's read and explore the dataset" ] }, { @@ -423,9 +528,35 @@ "source": [ "import pandas as pd\n", "\n", - "# This CSV file uses semicolons instead of commas as separators\n", - "ds = pd.read_csv('assets/real_estate.csv', sep=';')\n", - "ds" + "# Read the CSV file\n", + "ds = pd.read_csv('assets/real_estate.csv', sep=';') # This CSV file uses semicolons instead of commas as separators\n", + "ds # display the whole DataFrame" + ] + }, + { + "cell_type": "markdown", + "id": "19bc6aa8", + "metadata": {}, + "source": [ + "- Display the first rows of the CSV file" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "93434fb6", + "metadata": {}, + "outputs": [], + "source": [ + "# Display the first rows" + ] + }, + { + "cell_type": "markdown", + "id": "a1095c6b", + "metadata": {}, + "source": [ + "Great, that was a short warm-up. Now let's start with the real exercises!" + ] + }, + { @@ -459,7 +590,7 @@ "source": [ "#### Exercise 02. What is the cheapest house in the dataset? (★☆☆)\n", "\n", - "Print the address and the price of the selected house. For example:\n", + "This exercise is similar to the previous one, except that now we look for the house with the lowest price. Remember to print the address and the price of the selected house. 
For example:\n", "\n", "`The house with address Calle Alcalá, Nº58 is the cheapest and its price is 12000 USD`" ] @@ -486,7 +617,9 @@ "\n", "`The biggest house is located at Calle Gran Vía, Nº38 and its surface is 5000 square meters`\n", "\n", - "`The smallest house is located at Calle Mayor, Nº12 and its surface is 200 square meters`" + "`The smallest house is located at Calle Mayor, Nº12 and its surface is 200 square meters`\n", + "\n", + "This exercise is similar to the previous one, but now we look for the biggest and the smallest houses based on their surface." ] }, { @@ -505,9 +638,9 @@ "id": "danish-spirit", "metadata": {}, "source": [ - "#### Exercise 04. How many populations (level5 column) does the dataset contain? (★☆☆)\n", + "#### Exercise 04. How many populations does the dataset contain? (★☆☆)\n", "\n", - "Print the names of the populations separated by commas. For example:\n", + "Count the number of unique populations in the 'level5' column and print their names separated by commas. For example:\n", "\n", "`> print(populations)`\n", "\n", @@ -530,9 +663,9 @@ "id": "crazy-blame", "metadata": {}, "source": [ - "#### Exercise 05. Does the dataset contain invalid values (NAs)? (★☆☆)\n", + "#### Exercise 05. Does the dataset contain null values (NAs)? (★☆☆)\n", "\n", - "Print a boolean (`True` or `False`) followed by the row/column that contains the NAs." + "Print a boolean (`True` or `False`) indicating whether there are null values, followed by the columns that contain NAs." ] }, { @@ -551,9 +684,9 @@ "id": "italic-hydrogen", "metadata": {}, "source": [ - "#### Exercise 06. 
Delete the null values (NAs) from the dataset, if applicable (★★☆)\n", "\n", - "Print a comparison between the dimensions of the original DataFrame and those of the DataFrame after the deletions.\n" + "After deleting the null values, compare the dimensions of the DataFrame before and after the deletion.\n" ] }, { @@ -572,9 +705,9 @@ "id": "middle-china", "metadata": {}, "source": [ - "#### Exercise 07. What is the mean price in the population (level5 column) of \"Arroyomolinos (Madrid)\"? (★★☆)\n", + "#### Exercise 07. What is the mean price in the population of \"Arroyomolinos (Madrid)\"? (★★☆)\n", "\n", - "Print the obtained value." + "Select the population through the level5 column and print the obtained value." ] }, { @@ -593,7 +726,7 @@ "id": "concerned-radical", "metadata": {}, "source": [ - "#### Exercise 08. Plot the histogram of prices for the population (level5 column) of \"Arroyomolinos (Madrid)\" and explain what you observe (★★☆)\n", + "#### Exercise 08. Plot the histogram of prices for the population of \"Arroyomolinos (Madrid)\" and explain what you observe (★★☆)\n", "\n", "Print the histogram of the prices and write a brief analysis of the plot in the Markdown cell.\n" ] @@ -605,338 +738,19 @@ "metadata": {}, "outputs": [], "source": [ - "# TODO: Code" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "impressed-combination", - "metadata": {}, - "source": [ - "**TODO: Markdown**. To write here, double-click on this cell, remove this content and place the text you want to write. Then, execute the cell." - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "actual-edinburgh", - "metadata": {}, - "source": [ - "#### Exercise 09. Are the average prices of \"Valdemorillo\" and \"Galapagar\" the same? (★★☆)\n", + "import matplotlib.pyplot as plt\n", "\n", - "Print both averages and write a conclusion about them."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "numeric-commerce", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" + "# Plot the price histogram" ] }, { "attachments": {}, "cell_type": "markdown", - "id": "lonely-article", - "metadata": {}, - "source": [ - "#### Exercise 10. Are the average prices per square meter (price/m2) of \"Valdemorillo\" and \"Galapagar\" the same? (★★☆)\n", - "\n", - "Print both average prices per square meter and write a conclusion about them.\n", - "\n", - "Hint: Create a new column called `pps` (*price per square* meter) and then analyze the values." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "hourly-globe", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "pleasant-invite", - "metadata": {}, - "source": [ - "#### Exercise 11. Analyze the relationship between the surface and the price of the houses. (★★☆)\n", - "\n", - "Hint: You can make a `scatter plot` and then write a conclusion about it." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "common-drilling", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO: Code" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "ahead-liquid", - "metadata": {}, - "source": [ - "**TODO: Markdown**. To write here, double-click on this cell, remove this content and place the text you want to write. Then, execute the cell." - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "coordinate-sunrise", - "metadata": {}, - "source": [ - "#### Exercise 12. How many real estate agencies does the dataset contain? (★★☆)\n", - "\n", - "Print the obtained value."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "valid-honolulu", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "binding-ebony", - "metadata": {}, - "source": [ - "#### Exercise 13. Which population (level5 column) contains the most houses? (★★☆)\n", - "\n", - "Print the population and the number of houses." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "static-perry", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "entire-classification", - "metadata": {}, - "source": [ - "#### Exercise 14. Now let's work with the \"south belt\" of Madrid. Make a subset of the original DataFrame that contains the following populations (level5 column): \"Fuenlabrada\", \"Leganés\", \"Getafe\", \"Alcorcón\" (★★☆)\n", - "\n", - "Hint: Filter the original DataFrame using the `level5` column and the `isin` method." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "binary-input", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "severe-fisher", - "metadata": {}, - "source": [ - "#### Exercise 15. Make a bar plot of the median prices and explain what you observe (you must use the subset obtained in Exercise 14) (★★★)\n", - "\n", - "Print a bar plot of the median prices and write a brief analysis of the plot in the Markdown cell." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "lyric-bunch", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO: Code" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "sublime-newspaper", - "metadata": {}, - "source": [ - "**TODO: Markdown**. 
To write here, double-click on this cell, remove this content and place the text you want to write. Then, execute the cell." - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "speaking-diamond", - "metadata": {}, - "source": [ - "#### Exercise 16. Calculate the mean and the sample variance of the following variables: price, rooms, surface and bathrooms (you must use the subset obtained in Exercise 14) (★★★)\n", - "\n", - "Print both values for each variable." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "random-feeling", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "revolutionary-matrix", - "metadata": {}, - "source": [ - "#### Exercise 17. What is the most expensive house in each population? You must use the subset obtained in Exercise 14 (★★☆)\n", - "\n", - "Print both the address and the price of the selected house for each population. You can print a DataFrame or a single line for each population." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "fifteen-browse", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "activated-knight", - "metadata": {}, - "source": [ - "#### Exercise 18. Normalize the price variable for each population and plot the 4 histograms in the same plot (you must use the subset obtained in Exercise 14) (★★★)\n", - "\n", - "You can use whichever normalization method you consider appropriate; there is no single correct answer to this question. Print the plot and write a brief analysis of it in the Markdown cell.\n", - "\n", - "Hint: Matplotlib's multihist demo may help."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "civic-meditation", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "precise-heavy", + "id": "impressed-combination", "metadata": {}, "source": [ "**TODO: Markdown**. To write here, double-click on this cell, remove this content and place the text you want to write. Then, execute the cell." ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "patent-jonathan", - "metadata": {}, - "source": [ - "#### Exercise 19. What can you say about the price per square meter (price/m2) in the municipalities of 'Getafe' and 'Alcorcón'? You must use the subset obtained in Exercise 14 (★★☆)\n", - "\n", - "Hint: Create a new column called `pps` (price per square meter) and then analyze the values" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "initial-liverpool", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "enhanced-moscow", - "metadata": {}, - "source": [ - "#### Exercise 20. Make the same plot for 4 different populations (level5 column) and place them in the same figure. You must use the subset obtained in Exercise 14 (★★☆)\n", - "Hint: Make a scatter plot of each population using subplots." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "accepting-airfare", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "blocked-effects", - "metadata": {}, - "source": [ - "#### Exercise 21. Plot the coordinates (latitude and longitude columns) of the south belt of Madrid, colored by population (you must use the subset obtained in Exercise 14) (★★★★)\n", - "\n", - "Run the following cell and then start coding in the one after it. 
You must implement simple code that transforms the coordinate columns into a Python dictionary (add more information if necessary) and adds it to the map." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "headed-privacy", - "metadata": {}, - "outputs": [], - "source": [ - "from ipyleaflet import Map, basemaps\n", - "\n", - "# Map centered at (60 degrees latitude, -2.2 degrees longitude)\n", - "# Latitude, longitude\n", - "map = Map(center = (60, -2.2), zoom = 2, min_zoom = 1, max_zoom = 20, \n", - " basemap=basemaps.Stamen.Terrain)\n", - "map" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "present-mistress", - "metadata": {}, - "outputs": [], - "source": [ - "## Here: plot the coordinates of the states\n", - "\n", - "## PUT YOUR CODE HERE:\n" - ] - } ], "metadata": { diff --git a/project.ipynb b/project.ipynb deleted file mode 100644 index 5dfd1e5..0000000 --- a/project.ipynb +++ /dev/null @@ -1,964 +0,0 @@ -{ - "cells": [ - { - "attachments": {}, - "cell_type": "markdown", - "id": "innocent-university", - "metadata": {}, - "source": [ - "# Real Estate Cleanup\n", - "\n", - "This is a real dataset that was downloaded using web scraping techniques. The data contains records from **Fotocasa**, which is one of the most popular real estate websites in Spain. Please do not do this (web scraping) unless it is for academic purposes.\n", - "\n", - "The dataset was downloaded a few years ago by Henry Navarro, and no economic benefit was obtained from it in any way.\n", - "\n", - "It contains thousands of records of real houses published on the website www.fotocasa.com. Your goal is to extract as much information as possible with the data science knowledge you have so far; for example, what is the most expensive house in the entire dataset?\n", - "\n", - "Let's start with precisely that question... Good luck!"
- ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "multiple-glass", - "metadata": {}, - "source": [ - "#### Exercise 00. Read the dataset assets/real_estate.csv and try to visualize the table (★☆☆)" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "id": "frank-heath", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " Unnamed: 0 id_realEstates isNew realEstate_name \\\n", - "0 1 153771986 False ferrari 57 inmobiliaria \n", - "1 2 153867863 False tecnocasa fuenlabrada ferrocarril \n", - "2 3 153430440 False look find boadilla \n", - "3 4 152776331 False tecnocasa fuenlabrada ferrocarril \n", - "4 5 153180188 False ferrari 57 inmobiliaria \n", - "... ... ... ... ... \n", - "15330 15331 153901377 False infocasa consulting \n", - "15331 15332 150394373 False inmobiliaria pulpon \n", - "15332 15333 153901397 False tecnocasa torrelodones \n", - "15333 15334 152607440 False inmobiliaria pulpon \n", - "15334 15335 153901356 False infocasa consulting \n", - "\n", - " phone_realEstate url_inmueble \\\n", - "0 912177526.0 https://www.fotocasa.es/es/comprar/vivienda/ma... \n", - "1 916358736.0 https://www.fotocasa.es/es/comprar/vivienda/ma... \n", - "2 916350408.0 https://www.fotocasa.es/es/comprar/vivienda/ma... \n", - "3 916358736.0 https://www.fotocasa.es/es/comprar/vivienda/ma... \n", - "4 912177526.0 https://www.fotocasa.es/es/comprar/vivienda/ma... \n", - "... ... ... \n", - "15330 911360461.0 https://www.fotocasa.es/es/comprar/vivienda/ma... \n", - "15331 912788039.0 https://www.fotocasa.es/es/comprar/vivienda/ma... \n", - "15332 912780348.0 https://www.fotocasa.es/es/comprar/vivienda/ma... \n", - "15333 912788039.0 https://www.fotocasa.es/es/comprar/vivienda/ma... \n", - "15334 911360461.0 https://www.fotocasa.es/es/comprar/vivienda/ma... \n", - "\n", - " rooms bathrooms surface price ... level4Id level5Id level6Id \\\n", - "0 3.0 2.0 103.0 195000 ... 0 0 0 \n", - "1 3.0 1.0 NaN 89000 ... 0 0 0 \n", - "2 2.0 2.0 99.0 390000 ... 0 0 0 \n", - "3 3.0 1.0 86.0 89000 ... 0 0 0 \n", - "4 2.0 2.0 106.0 172000 ... 0 0 0 \n", - "... ... ... ... ... ... ... ... ... \n", - "15330 2.0 1.0 96.0 259470 ... 0 0 0 \n", - "15331 3.0 1.0 150.0 165000 ... 0 0 0 \n", - "15332 4.0 2.0 175.0 495000 ... 0 0 0 \n", - "15333 3.0 2.0 101.0 195000 ... 
0 0 0 \n", - "15334 3.0 2.0 152.0 765000 ... 0 0 0 \n", - "\n", - " level7Id level8Id accuracy latitude longitude zipCode \\\n", - "0 0 0 0 40,2948276786438 -3,44402412135624 NaN \n", - "1 0 0 1 40,28674 -3,79351 NaN \n", - "2 0 0 0 40,4115646786438 -3,90662252135624 NaN \n", - "3 0 0 0 40,2853785786438 -3,79508142135624 NaN \n", - "4 0 0 0 40,2998774864376 -3,45226301356237 NaN \n", - "... ... ... ... ... ... ... \n", - "15330 0 0 0 40,45416 -3,70286 NaN \n", - "15331 0 0 0 40,36652 -3,48951 NaN \n", - "15332 0 0 0 40,57444 -3,92124 NaN \n", - "15333 0 0 0 40,36967 -3,48105 NaN \n", - "15334 0 0 0 40,45773 -3,69068 NaN \n", - "\n", - " customZone \n", - "0 NaN \n", - "1 NaN \n", - "2 NaN \n", - "3 NaN \n", - "4 NaN \n", - "... ... \n", - "15330 NaN \n", - "15331 NaN \n", - "15332 NaN \n", - "15333 NaN \n", - "15334 NaN \n", - "\n", - "[15335 rows x 37 columns]" - ] - }, - "execution_count": 1, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "import pandas as pd\n", - "\n", - "# This CSV file uses semicolons instead of commas as separators\n", - "ds = pd.read_csv('assets/real_estate.csv', sep=';')\n", - "ds" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "latin-guest", - "metadata": {}, - "source": [ - "#### Exercise 01. What is the most expensive house in the dataset? (★☆☆)\n", - "\n", - "Print the address and the price of the selected house. For example:\n", - "\n", - "`The house with address General Street Nº5 is the most expensive and its price is 5000000 USD`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "developing-optimum", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "lesser-cosmetic", - "metadata": {}, - "source": [ - "#### Exercise 02. What is the cheapest house in the dataset? (★☆☆)\n", - "\n", - "Print the address and the price of the selected house. 
For example:\n", - "\n", - "`The house with address Concrete Street Nº1 is the cheapest and its price is 12000 USD`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "lovely-oasis", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "compliant-fellowship", - "metadata": {}, - "source": [ - "#### Exercise 03. Which are the biggest and the smallest houses in the dataset? (★☆☆)\n", - "\n", - "Print both the address and the surface of the selected houses. For example:\n", - "\n", - "`The biggest house is located on Yukka Street Nº10 and its surface is 5000 square meters`\n", - "\n", - "`The smallest house is located on County Road 1 N and its surface is 200 square meters`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "every-tiffany", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "danish-spirit", - "metadata": {}, - "source": [ - "#### Exercise 04. How many populations (level5 column) does the dataset contain? (★☆☆)\n", - "\n", - "Print the names of the populations with a comma as a separator. For example:\n", - "\n", - "`> print(populations)`\n", - "\n", - "`population1, population2, population3, ...`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "exciting-accreditation", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "crazy-blame", - "metadata": {}, - "source": [ - "#### Exercise 05. Does the dataset contain NAs? (★☆☆)\n", - "\n", - "Print a boolean value (`True` or `False`) followed by the rows/columns that contain NAs."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "transparent-poetry", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "italic-hydrogen", - "metadata": {}, - "source": [ - "#### Exercise 06. Delete the NAs from the dataset, if applicable (★★☆)\n", - "\n", - "Print a comparison between the dimensions of the original DataFrame and those of the DataFrame after the deletions." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "administrative-roads", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "middle-china", - "metadata": {}, - "source": [ - "#### Exercise 07. What is the mean price in the population (level5 column) of \"Arroyomolinos (Madrid)\"? (★★☆)\n", - "\n", - "Print the obtained value." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "nuclear-belief", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "concerned-radical", - "metadata": {}, - "source": [ - "#### Exercise 08. Plot the histogram of prices for the population (level5 column) of \"Arroyomolinos (Madrid)\" and explain what you observe (★★☆)\n", - "\n", - "Print the histogram of the prices and write a brief analysis of the plot in the Markdown cell." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "sudden-message", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO: Code" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "impressed-combination", - "metadata": {}, - "source": [ - "**TODO: Markdown**. To write here, double-click on this cell, remove this content and place the text you want to write. Then, execute the cell."
- ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "actual-edinburgh", - "metadata": {}, - "source": [ - "#### Exercise 09. Are the average prices of \"Valdemorillo\" and \"Galapagar\" the same? (★★☆)\n", - "\n", - "Print both average prices and then write a conclusion about them." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "numeric-commerce", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "lonely-article", - "metadata": {}, - "source": [ - "#### Exercise 10. Are the average prices per square meter (price/m2) of \"Valdemorillo\" and \"Galapagar\" the same? (★★☆)\n", - "\n", - "Print both average prices and then write a conclusion about it.\n", - "\n", - "Hint: Create a new column called `pps` (price per square meter) and then analyze the values." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "hourly-globe", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "pleasant-invite", - "metadata": {}, - "source": [ - "#### Exercise 11. Analyze the relation between the surface and the price of the houses (★★☆)\n", - "\n", - "Hint: You can make a `scatter plot`, then write a conclusion about it." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "common-drilling", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO: Code" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "ahead-liquid", - "metadata": {}, - "source": [ - "**TODO: Markdown**. To write here, double-click on this cell, remove this content and place the text you want to write. Then, execute the cell." - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "coordinate-sunrise", - "metadata": {}, - "source": [ - "#### Exercise 12. How many real estate agencies does the dataset contain? 
(★★☆)\n", - "\n", - "Print the obtained value." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "valid-honolulu", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "binding-ebony", - "metadata": {}, - "source": [ - "#### Exercise 13. Which is the population (level5 column) that contains the most houses? (★★☆)\n", - "\n", - "Print both the population and the number of houses." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "static-perry", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "entire-classification", - "metadata": {}, - "source": [ - "#### Exercise 14. Now let's work with the \"south belt\" of Madrid. Make a subset of the original DataFrame that contains the following populations (level5 column): \"Fuenlabrada\", \"Leganés\", \"Getafe\", \"Alcorcón\" (★★☆)\n", - "\n", - "Hint: Filter the original DataFrame using the column `level5` and the function `isin`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "binary-input", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "severe-fisher", - "metadata": {}, - "source": [ - "#### Exercise 15. Make a bar plot of the median of the prices and explain what you observe (you must use the subset obtained in Exercise 14) (★★★)\n", - "\n", - "Print the bar of the median of the prices and write in the Markdown cell a brief analysis about the plot." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "lyric-bunch", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO: Code" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "sublime-newspaper", - "metadata": {}, - "source": [ - "**TODO: Markdown**. 
To write here, double-click on this cell, remove this content and place the text you want to write. Then, execute the cell." - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "speaking-diamond", - "metadata": {}, - "source": [ - "#### Exercise 16. Calculate the sample mean and variance of the variables: price, rooms, surface area and bathrooms (you must use the subset obtained in Exercise 14) (★★★)\n", - "\n", - "Print both values for each variable." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "random-feeling", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "revolutionary-matrix", - "metadata": {}, - "source": [ - "#### Exercise 17. What is the most expensive house in each population? You must use the subset obtained in Exercise 14 (★★☆)\n", - "\n", - "Print both the address and the price of the selected house of each population. You can print a DataFrame or a single line for each population." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "fifteen-browse", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "activated-knight", - "metadata": {}, - "source": [ - "#### Exercise 18. Normalize the variable of prices for each population and plot the 4 histograms in the same plot (you must use the subset obtained in Exercise 14) (★★★)\n", - "\n", - "For the normalization method, you can use the one you consider; there is not a single correct answer to this question. Print the plot and write in the Markdown cell a brief analysis about the plot.\n", - "\n", - "Hint: You can help yourself by reviewing the *multihist* demo of Matplotlib." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "civic-meditation", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "precise-heavy", - "metadata": {}, - "source": [ - "**TODO: Markdown**. To write here, double-click on this cell, remove this content and place the text you want to write. Then, execute the cell." - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "patent-jonathan", - "metadata": {}, - "source": [ - "#### Exercise 19. What can you say about the price per square meter (price/m2) between the towns of \"Getafe\" and \"Alcorcón\"? You must use the subset obtained in Exercise 14 (★★☆)\n", - "\n", - "Hint: Create a new column called `pps` (price per square meter) and then analyze the values." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "initial-liverpool", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "enhanced-moscow", - "metadata": {}, - "source": [ - "#### Exercise 20. Make the same plot for 4 different populations (level5 column) and rearrange them on the same graph. You must use the subset obtained in Exercise 14 (★★☆)\n", - " \n", - "Hint: Make a scatter plot of each population using subplots." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "accepting-airfare", - "metadata": {}, - "outputs": [], - "source": [ - "# TODO" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "blocked-effects", - "metadata": {}, - "source": [ - "#### Exercise 21. Make a plot of the coordinates (latitude and longitude columns) of the south belt of Madrid by color of each population (you must use the subset obtained in Exercise 14) (★★★★)\n", - "\n", - "Execute the following cell, and then start coding in the next one. 
You must implement a simple code that transforms the coordinates columns in a Python dictionary (add more information if needed) and then add it to the map" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "headed-privacy", - "metadata": {}, - "outputs": [], - "source": [ - "from ipyleaflet import Map, basemaps\n", - "\n", - "# Map centered on (60 degrees latitude and -2.2 degrees longitude)\n", - "# Latitude, longitude\n", - "map = Map(center = (60, -2.2), zoom = 2, min_zoom = 1, max_zoom = 20, \n", - " basemap=basemaps.Stamen.Terrain)\n", - "map" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "present-mistress", - "metadata": {}, - "outputs": [], - "source": [ - "## HERE: plot the coordinates of the estates\n", - "\n", - "## PUT HERE YOUR CODE:\n" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.3" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -}