urban-toolkit/awesome-urban-datasets


Awesome Urban Datasets

This is a curated list of publicly available urban datasets, gathered over the years. The datasets are grouped by broad topic (natural phenomena, human-driven phenomena, built & natural environment, others), following the approach used in our survey of 3D urban analytics. Within each topic, you will find datasets of different types (e.g., time series, images, audio) and sizes.

In natural phenomena, we list datasets that primarily describe natural phenomena (e.g., sunlight access / shadow, wind & ventilation, climate). In human-driven phenomena, we list datasets that primarily describe phenomena driven by human factors (e.g., traffic, energy modeling or energy potential assessment, noise & sound propagation, property cadastre). In built & natural environment, we list datasets that primarily describe the built and natural environment (e.g., infrastructure, buildings, sidewalks, street networks, trees).

The main goal of this list is to offer a straightforward way to discover urban datasets that can be of interest to users in multiple domains: computer science (e.g., data analytics, data mining, visualization, machine learning, computer vision), social science (e.g., urban planning, economics), climate science, public health, etc.

This list is constantly being updated. Contributions are greatly appreciated.


Table of Contents

  1. Natural phenomena
    1. Sunlight access & shadow
    2. Flooding
  2. Human-driven phenomena
    1. Mobility
    2. Noise & sound propagation
    3. Pollution
    4. Crime
    5. Sanitation
    6. Service requests
  3. Built & natural environment
    1. Roads & sidewalks
    2. Natural environment
    3. Buildings & lots
  4. Others
  5. Links

Natural phenomena

Sunlight access & shadow

  • Shadows Accrual Maps: Detailed shadow information in multiple cities.
  • Deep Umbra: Comprehensive dataset with sunlight access information for more than 100 cities across six continents.

Flooding

  • European Flood 2013 Dataset: This repository contains metadata and annotations of a flood dataset used in the context of interactive content-based image retrieval.

Human-driven phenomena

Mobility

  • UTD19: Largest multi-city traffic dataset publicly available.
  • NYC Taxi: Trip records from all yellow and green taxis in New York City. It includes detailed information such as pickup and drop-off locations, dates, times, fares, and driver-reported passenger counts.
  • Chicago Taxi: Taxi trips in Chicago from 2013 to the present, including data on trip start and end times, locations, distance traveled, and fare details.
  • NYC Shared Bike: Detailed records of bike-sharing activities in New York City, including information on station locations, trip durations, and user demographics.
  • Washington Shared Bike: Bike-sharing usage in Washington, D.C., featuring data on trip origins and destinations, durations, and bike availability at docking stations.
  • Boston Shared Bike: Detailed records of bike-sharing trips in Boston, including information on start and end times, trip durations, and the geographic distribution of bike docks.
  • Guangzhou, China Urban Traffic Speed: Traffic speed data across the urban areas of Guangzhou, China.
  • US Census Commuting Flow: Commuting flow data, showing the movement of workers between home and workplace locations across various geographic levels.
  • Sao Paulo Commuting Flow: Commuting patterns in Sao Paulo, Brazil.
  • COVID19USFlows: Multiscale Dynamic Human Mobility Flow Dataset in the U.S. during the COVID-19 Epidemic.
  • StreetAware: A high-resolution audio, video, and LiDAR dataset of three urban intersections in Brooklyn, New York, totaling approximately 8 unique hours.
  • CCTV Action Recognition Dataset: This action recognition dataset contains short video clips sourced from CCTV footage from existing CCTV datasets as well as YouTube and Google.

Noise & sound propagation

  • UrbanSound: This dataset contains 1302 labeled sound recordings.
  • UrbanSound8K: This dataset contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes.
  • SONYC UST: SONYC Urban Sound Tagging (SONYC-UST), a multilabel dataset from an urban acoustic sensor network.

Pollution

  • Global High Air Pollutants: Global high-resolution (1km, daily) data on air pollution, derived from satellite data, ground data and models.

Crime

  • Crime data in Brazil: This dataset contains structured data about all crime occurrences that have been acted upon by the PM (Polícia Militar), the main police force in Sao Paulo.

Sanitation

  • NY Monthly Tonnage Data: Monthly tonnages that the Department of Sanitation collects from NYC residences and institutions.

Service requests

Built & natural environment

Roads & sidewalks

  • Road networks: List of road networks from several cities.
  • Global Streetscapes: Global Streetscapes is an open dataset made up of 10 million Street View Images (SVIs) spanning 688 cities from 212 countries and regions, crowdsourced from Mapillary and KartaView.
  • Project Sidewalk: Point-level information on what accessibility attributes exist and where, and a value that indicates how (in)accessible a given street/area is.
  • A century of sprawl in the United States: High-resolution time series of urban sprawl, as measured through street network connectivity, in the United States from 1920 to 2012.

Natural environment

Buildings & lots

  • Global Building Morphology Indicators: GBMI is an open project on systematising, computing, and storing individual and aggregated building form metrics, which may be useful for researchers and practitioners across multiple domains.
  • NYC Pluto: Land use and geographic data at the tax lot level.
  • OpenStreetMap: Crowdsourced data for buildings, roads, etc.
  • CMP Facade: Dataset focused on facade segmentation.
  • Toxic Release Inventory Facilities: The Toxics Release Inventory (TRI) Program tracks the industrial management of toxic chemicals that may cause harm to human health and the environment.

Others

  • Cityscapes: Dataset for semantic urban scene understanding.
  • DroNet: Dataset for drone navigation in urban environments.

Links

This list includes datasets from other compilations, but it specifically focuses on urban data. Other lists include Awesome Public Datasets, Awesome Spatial Data, Awesome Multimodal Urban Computing, Awesome Network Analysis, Free GIS Data.
