Google-Microsoft Open Buildings - combined by VIDA

This dataset merges Google's V3 Open Buildings and Microsoft's latest Building Footprints. It contains 2,579,035,323 footprints and is divided into 185 partitions. Each footprint is labelled with its respective source, either Google or Microsoft. It can be accessed in cloud-native geospatial formats such as GeoParquet, FlatGeobuf and PMTiles.
Product Details
Visibility: Public
Owner: VIDA
Created: 15 Sep 2023
Last Updated: 5 Aug 2025

See it in action

You can use Observable to get a quick overview of the dataset, or go to VIDA to see it in action.

Original datasets

Google

The original Google V3 Open Buildings dataset can be downloaded from this link as gzipped CSV files. Here are some key details about the original dataset:

The dataset contains 1.8 billion building detections across an inference area of 58M km² within Africa, South Asia, South-East Asia, Latin America and the Caribbean.

Each building in the dataset has a polygon defining its footprint on the ground, a confidence score indicating how certain the model is that this is a building, and a Plus Code corresponding to the centre of the building. There is no information about the type of building, its street address, or any details other than its geometry.

For more comprehensive information, please visit the description page. You can also check out the FAQ section for additional information.

Microsoft

The latest version of Microsoft's building footprints can be downloaded from Microsoft Planetary Computer as gzipped partitioned files.

The Microsoft Global Open Buildings dataset was produced by Bing Maps and contains 1.24 billion detected buildings. The buildings were extracted from Bing Maps imagery collected between 2014 and 2023, including imagery from Maxar, Airbus and IGN France.

For more detailed information, please visit the GitHub page.

Data Formats

The data is available in the following formats:

  • GeoParquet 1.1.0
    • By country - single file
    • By country - S2 partitioned
  • FlatGeobuf
    • By country - single file
    • By country - S2 partitioned
  • PMTiles
    • Global - single layer
    • Global - layer per country based on the 3-letter ISO code
    • By country
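If you download files in several of the formats above, one quick way to tell them apart is by their leading "magic" bytes. The sketch below is a hypothetical helper (sniff_format is not part of any VIDA tooling) and assumes the published file signatures: Parquet files start with PAR1, FlatGeobuf files with fgb, a version byte and fgb again, and PMTiles v3 archives with PMTiles. The check can only show that a file is Parquet; GeoParquet is ordinary Parquet with additional geospatial metadata.

```python
from pathlib import Path

PARQUET_MAGIC = b"PAR1"     # Parquet files begin (and end) with these four bytes
FLATGEOBUF_TAG = b"fgb"     # FlatGeobuf: b"fgb" + version byte + b"fgb" + version byte
PMTILES_MAGIC = b"PMTiles"  # PMTiles v3: b"PMTiles" followed by a spec-version byte


def sniff_format(header: bytes) -> str:
    """Classify a file from its first 8 bytes (hypothetical helper)."""
    if header[:4] == PARQUET_MAGIC:
        return "parquet"  # GeoParquet is Parquet plus geospatial metadata
    if header[:3] == FLATGEOBUF_TAG and header[4:7] == FLATGEOBUF_TAG:
        return "flatgeobuf"  # bytes 3 and 7 hold version numbers and are not checked
    if header[:7] == PMTILES_MAGIC:
        return "pmtiles"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of a local file and classify it."""
    with Path(path).open("rb") as f:
        return sniff_format(f.read(8))
```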

Partitioning

The dataset is organized into 185 root partitions. Each partition typically corresponds to a country's administrative boundary as defined by the Comprehensive Global Administrative Zones (CGAZ) dataset at the ADM0 level, which can be accessed here. An additional sub-partitioning based on the S2 grid is also available.

By country

Both the FlatGeobuf and GeoParquet files are partitioned by country, following the ADM0 level of the CGAZ geoboundary definition, so within each format the building footprints are separated by country. Partitions are named after the country's 3-letter ISO code:

/geoparquet/by_country/country_iso={ISO}/{ISO}.parquet

Note: There is a partition labeled country_iso=None. It represents a MULTIPOLYGON made up of geoboundaries (POLYGONs) that CGAZ captures at the ADM0 level but does not explicitly define or name, so their label is null. Building footprints located within these geoboundaries are placed in the country_iso=None partition. For instance, the Abyei area between Sudan and South Sudan remains disputed due to recurring conflicts and therefore has no assigned name.
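The by-country layout can be expressed as a small path builder. This is a sketch: the function names are hypothetical, the /geoparquet/by_country prefix is taken from the template above (the bucket or host in front of it is not shown), and it assumes the file in the country_iso=None partition follows the same {ISO}.parquet naming as every other partition.

```python
from typing import Optional

BY_COUNTRY_ROOT = "/geoparquet/by_country"  # prefix from the template above; bucket/host not shown


def by_country_path(iso: Optional[str]) -> str:
    """Path of a country's single-file GeoParquet partition.

    None maps to the country_iso=None partition (footprints in unnamed CGAZ boundaries).
    Assumes that partition's file is also named {ISO}.parquet, i.e. None.parquet.
    """
    key = "None" if iso is None else iso.upper()  # ISO codes are upper-case 3-letter codes
    return f"{BY_COUNTRY_ROOT}/country_iso={key}/{key}.parquet"


def iso_from_partition_dir(dirname: str) -> Optional[str]:
    """Parse a hive-style 'country_iso=XXX' directory name back into an ISO code (or None)."""
    key, sep, value = dirname.partition("=")
    if key != "country_iso" or not sep:
        raise ValueError(f"not a country_iso partition: {dirname!r}")
    return None if value == "None" else value
```

Mapping the directory value "None" back to a real null keeps it consistent with the null country label described in the note.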

By country + S2 grid

To enhance performance, particularly with GeoParquet files, we've introduced an S2 sub-partitioning strategy. Each ISO partition is further divided using an S2 grid ID, ensuring a cap of 20 million building footprints per grid ID. This S2 grid partitioning is exclusive to GeoParquet files.

/geoparquet/by_country_s2/country_iso={ISO}/{S2_GRID_ID}.parquet
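The README does not describe how the S2 cells were chosen so that each holds at most 20 million footprints. One plausible approach, sketched below purely as an illustration and not as VIDA's documented procedure, is to start from a coarse cell and repeatedly split any over-full cell into its four children. The sketch relies only on standard S2 cell-ID arithmetic (the lowest set bit of an ID encodes the cell's level) and assumes you already have the leaf-level S2 cell ID of each footprint's centroid as an unsigned integer, for example from an S2 library. All function names are hypothetical.

```python
from typing import Dict, List


def face_cell(face: int) -> int:
    """S2 cell ID of one of the six level-0 face cells."""
    return (face << 61) | (1 << 60)


def lowest_set_bit(cell_id: int) -> int:
    """The lowest set bit of an S2 cell ID encodes its level (1 << 60 for faces, 1 for leaves)."""
    return cell_id & -cell_id


def children(cell_id: int) -> List[int]:
    """IDs of the four child cells, in S2 curve order."""
    lsb = lowest_set_bit(cell_id)
    child_lsb = lsb >> 2
    first = cell_id - lsb + child_lsb
    return [first + 2 * k * child_lsb for k in range(4)]


def cell_contains(cell_id: int, leaf_id: int) -> bool:
    """True if the leaf cell lies inside cell_id's contiguous ID range."""
    lsb = lowest_set_bit(cell_id)
    return cell_id - (lsb - 1) <= leaf_id <= cell_id + (lsb - 1)


def split_to_cap(cell_id: int, leaf_ids: List[int], cap: int) -> Dict[int, int]:
    """Split cell_id recursively until every non-empty cell holds at most `cap` footprints.

    Returns {cell_id: footprint_count}; cells may end up at different levels.
    """
    inside = [p for p in leaf_ids if cell_contains(cell_id, p)]
    if not inside:
        return {}
    if len(inside) <= cap or lowest_set_bit(cell_id) == 1:  # leaf cells cannot be split
        return {cell_id: len(inside)}
    result: Dict[int, int] = {}
    for child in children(cell_id):
        result.update(split_to_cap(child, inside, cap))
    return result
```

For example, split_to_cap(face_cell(0), leaf_ids, 20_000_000) would return candidate grid cells with their footprint counts. Dense areas are split more deeply than sparse ones, so the resulting cells can sit at different S2 levels.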

Schema

Each row in the dataset describes one building footprint, with the following columns:

  • boundary_id (INTEGER): A unique integer ID mapped to each CGAZ ADM0 boundary ISO code, created to partition the dataset within BigQuery.
  • confidence (FLOAT): The model's confidence in the accuracy of the building footprint. This column is null for Microsoft-sourced footprints, since the original Microsoft dataset does not include this attribute.
  • bf_source (STRING): The footprint's origin, either Google or Microsoft.
  • area_in_meters (FLOAT): The polygon's area in square meters.
  • s2_id (INT): The S2 grid ID; present only in the S2-partitioned files.
  • country_iso (STRING): 3-letter ISO code of the country the footprint belongs to.
  • geohash (STRING): Geohash of the geometry at precision 8.
  • bbox (STRUCT): Struct containing xmin, ymin, xmax, ymax values for the bounding box of the geometry.
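The bbox struct lets a reader discard footprints without decoding their geometry. Below is a minimal sketch of that test on plain Python objects. The Footprint class mirrors a subset of the columns listed above (geometry omitted) and is illustrative, not an official schema binding. The intersection test assumes the bbox values and the query box use the same coordinate system, and it does not handle boxes that cross the antimeridian.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class BBox:
    xmin: float
    ymin: float
    xmax: float
    ymax: float


@dataclass(frozen=True)
class Footprint:
    """Illustrative subset of the columns above; geometry is omitted."""
    bf_source: str               # "Google" or "Microsoft"
    confidence: Optional[float]  # None for Microsoft footprints
    area_in_meters: float
    country_iso: Optional[str]   # None for the country_iso=None partition
    bbox: BBox


def bboxes_intersect(a: BBox, b: BBox) -> bool:
    """Axis-aligned overlap test (no antimeridian handling)."""
    return a.xmin <= b.xmax and b.xmin <= a.xmax and a.ymin <= b.ymax and b.ymin <= a.ymax


def filter_by_bbox(rows: Iterable[Footprint], query: BBox) -> Iterator[Footprint]:
    """Yield only the footprints whose bbox overlaps the query box."""
    return (row for row in rows if bboxes_intersect(row.bbox, query))
```

In a Parquet reader, the same four comparisons on bbox.xmin, bbox.ymin, bbox.xmax and bbox.ymax can typically be expressed as column filters, so row groups whose statistics rule out an overlap can be skipped.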

Data Processing

We invite you to read our blog post for a more detailed account of our dataset merging approach, including the optimization techniques we investigated and the query performance on BigQuery. This section gives a high-level summary of the merging process and its key steps.

We imported both datasets into BigQuery for processing. From the Google dataset we dropped columns such as full_plus_code, latitude and longitude; from the Microsoft dataset we dropped none. We then assigned each building footprint a boundary ID by intersecting its centroid with the country geoboundaries in the CGAZ ADM0 dataset. Footprints whose centroids did not fall within any country geoboundary were assigned to the geoboundary nearest to their centroid.
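The centroid-based assignment described above can be sketched in a few lines. This is a planar illustration under simplifying assumptions, not VIDA's BigQuery query: each boundary is a single polygon ring (real CGAZ boundaries are multipolygons), coordinates are treated as flat x/y (BigQuery's geography functions work on the sphere), and all function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Ring = List[Point]  # polygon exterior ring; repeating the first vertex at the end is optional


def centroid(ring: Ring) -> Point:
    """Area-weighted centroid of a simple polygon (planar shoelace formula)."""
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)


def point_in_ring(ring: Ring, p: Point) -> bool:
    """Ray-casting point-in-polygon test."""
    x, y = p
    inside = False
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        if (y0 > y) != (y1 > y):  # edge straddles the horizontal line through p
            if x < x0 + (y - y0) * (x1 - x0) / (y1 - y0):
                inside = not inside
    return inside


def distance_to_ring(ring: Ring, p: Point) -> float:
    """Shortest planar distance from p to any edge of the ring."""
    px, py = p
    best = math.inf
    for (ax, ay), (bx, by) in zip(ring, ring[1:] + ring[:1]):
        dx, dy = bx - ax, by - ay
        seg_len2 = dx * dx + dy * dy
        t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
        best = min(best, math.hypot(px - (ax + t * dx), py - (ay + t * dy)))
    return best


def assign_boundary(footprint: Ring, boundaries: Dict[int, Ring]) -> int:
    """Return the boundary_id containing the footprint's centroid, else the nearest boundary."""
    c = centroid(footprint)
    for boundary_id, ring in boundaries.items():
        if point_in_ring(ring, c):
            return boundary_id
    return min(boundaries, key=lambda bid: distance_to_ring(boundaries[bid], c))
```

Footprints are tested by their centroid only, so a building straddling a border lands in exactly one partition.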

Contact details

If you'd like more information about the dataset or the processing steps, feel free to write an email to maarten@vida.place.

Changelog

Current version: 2.0

Version 2.0 - 2024-09-04

  • Add 32,784,238 building footprints for various regions by updating to the latest Microsoft GlobalMLBuildingFootprints as of 2024-05-28.
  • Update to GeoParquet schema version 1.1.0.
    • Includes bbox struct for easy filtering.
  • Introduce spatial ordering by geohash for FlatGeobuf and GeoParquet files.
  • Add PMTiles files per country.
  • Add PMTiles file with a layer per country ISO code.
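Nearby locations usually share a geohash prefix, so sorting on the geohash string tends to place neighbouring footprints next to each other in a file, which helps readers that skip data by row group or byte range. The sketch below implements the standard geohash encoding and sorts a few points by it. It assumes the spatial ordering is a plain sort on the 8-character geohash column and that the hash is computed from a representative point such as the centroid; the README specifies neither. The helper names are hypothetical.

```python
from typing import List, Tuple

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 8) -> str:
    """Standard geohash: interleave longitude/latitude bisection bits, 5 bits per character."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    value, nbits, use_lon = 0, 0, True  # geohash starts with a longitude bit
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:
            chars.append(BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)


def spatially_order(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Sort (lat, lon) points by their 8-character geohash."""
    return sorted(points, key=lambda p: geohash_encode(p[0], p[1], 8))
```

Geohash follows a Z-order curve, so the ordering is only approximately spatial: two points that are close together but straddle a cell boundary can still end up far apart in the sorted file.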

Version 1.1 - 2023-10-02

  • Added 11,631,283 building footprints for Morocco from the Google Earthquake dataset.
  • Added 24,532 building footprints for Libya from the Google Derna Flooding dataset.
  • These building footprints were added to the GeoParquet, FlatGeobuf and PMTiles archives.
  • Fixed a bug that caused the GeoParquet version to be missing.
  • Changed the S2 grid naming from unsigned 64-bit integers to signed 64-bit integers.
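The S2 naming change matters when matching file names against IDs from an S2 library. Common S2 implementations (for example the C++ S2CellId class or Go's s2.CellID) represent cell IDs as unsigned 64-bit integers, while the files use signed values, so cells whose top bit is set (those on faces 4 and 5) appear as negative numbers. Below is a minimal sketch of the two's-complement conversion plus a path builder for the /geoparquet/by_country_s2/ template; the function names are hypothetical, and it assumes the S2_GRID_ID in a file name is exactly the signed reinterpretation of the cell ID.

```python
def to_signed64(cell_id: int) -> int:
    """Reinterpret an unsigned 64-bit S2 cell ID as a signed (two's-complement) int64."""
    if not 0 <= cell_id < 1 << 64:
        raise ValueError("expected an unsigned 64-bit value")
    return cell_id - (1 << 64) if cell_id >= 1 << 63 else cell_id


def to_unsigned64(signed_id: int) -> int:
    """Inverse of to_signed64."""
    if not -(1 << 63) <= signed_id < 1 << 63:
        raise ValueError("expected a signed 64-bit value")
    return signed_id + (1 << 64) if signed_id < 0 else signed_id


def by_country_s2_path(iso: str, cell_id: int) -> str:
    """Hypothetical builder for /geoparquet/by_country_s2/country_iso={ISO}/{S2_GRID_ID}.parquet."""
    return f"/geoparquet/by_country_s2/country_iso={iso}/{to_signed64(cell_id)}.parquet"
```

Converting at the boundary between your S2 library and the file names keeps the rest of your code working with a single representation.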

Version 1 - 2023-08-29

Dataset Licenses

The data is shared under both the Creative Commons Attribution 4.0 International (CC BY 4.0) license and the Open Data Commons Open Database License (ODbL) v1.0. As the user, you may choose either license and use the data under its terms.

Source Cooperative is a Radiant Earth project