Sentinel-2 data embedded with Clay v1.5

Product Details

  • Visibility: Public
  • Owner: Clay
  • Created: 14 Nov 2024
  • Last Updated: 3 Apr 2025

README

This repository contains embeddings of Sentinel-2 on AWS data, generated with Clay v1.5.

Source: This data was created with compute support from AWS.

Contact: For feedback and questions, please file a ticket on the Clay model repo or email us at contact@madewithclay.org.

Data Source: Sentinel-2 on AWS:

  • The areas of interest (AoIs) are updated based on user feedback. As of Dec '24, the data includes embeddings for Suriname, Brazil, Andhra Pradesh, and the USA. Coverage for each area starts in 2024 and extends backwards in time, in some cases to 2018. We plan to provide comprehensive global coverage in 2025.

Model source: Embeddings generated from inference with Clay v1.5:

  • Embeddings have 1024 dimensions and correspond to the "class" embedding that is used alongside the patch embeddings at the end of the encoder.
  • Each Sentinel-2 tile is split into smaller tiles of 256x256 px, and the attention patch size inside Clay v1.5 is 8x8 px.
  • The inference run started in December '24 on AWS using g4 and g6 EC2 instances, at roughly 20 embeddings/second, or about 100k embeddings/$ (highly variable).
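
The figures in the list above can be combined with a little arithmetic. The following is a minimal sketch that uses only the numbers quoted in this README; the variable names are illustrative, and the implied hourly cost is a derived estimate, not a figure published by Clay or AWS.

```python
# Figures quoted in the README (throughput and cost are approximate).
TILE_SIZE_PX = 256               # edge length of each tile, in pixels
PATCH_SIZE_PX = 8                # edge length of each attention patch, in pixels
EMBEDDINGS_PER_SECOND = 20       # approximate inference throughput
EMBEDDINGS_PER_DOLLAR = 100_000  # approximate, described as highly variable

# Number of attention patches covering one 256x256 tile.
patches_per_side = TILE_SIZE_PX // PATCH_SIZE_PX   # 32
patches_per_tile = patches_per_side ** 2           # 1024

# Compute time that one dollar buys at the quoted throughput,
# and the hourly cost this implies.
seconds_per_dollar = EMBEDDINGS_PER_DOLLAR / EMBEDDINGS_PER_SECOND
implied_cost_per_hour = 3600 / seconds_per_dollar

print(f"patches per tile: {patches_per_tile}")
print(f"implied cost: ~${implied_cost_per_hour:.2f} per hour")
```

Since the README describes the throughput and cost figures as highly variable, the derived hourly cost should be read as an order-of-magnitude estimate only.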

Embeddings License: Clay CC-By

Format:

  • The folder structure follows the Sentinel-2 folder structure.
  • The file format is Parquet, with two columns: geometry and embeddings.
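
One possible use of the two columns is a similarity search: rank the geometries by how close their embeddings are to a query embedding. The sketch below is an illustration, not a method prescribed by Clay. It assumes cosine similarity as the metric, uses toy 4-dimensional vectors in place of the real 1024-dimensional embeddings, and uses placeholder strings for the geometry values, since this README does not specify how geometries are encoded. The function names cosine_similarity and most_similar are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(rows, query, top_k=3):
    """Return the top_k (score, geometry) pairs, best match first.

    rows is an iterable of (geometry, embedding) pairs, mirroring the
    two columns of the Parquet files.
    """
    scored = [(cosine_similarity(emb, query), geom) for geom, emb in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy stand-in data: placeholder geometries and 4-dimensional vectors.
rows = [
    ("geometry A", [1.0, 0.0, 0.0, 0.0]),
    ("geometry B", [0.9, 0.1, 0.0, 0.0]),
    ("geometry C", [0.0, 1.0, 0.0, 0.0]),
]
query = [1.0, 0.0, 0.0, 0.0]

results = most_similar(rows, query, top_k=2)
for score, geom in results:
    print(f"{geom}: {score:.3f}")
```

With a DataFrame loaded from one of the files, the rows could be built as zip(df["geometry"], df["embeddings"]). For large files a vectorized library would be considerably faster than this pure-Python loop.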

Usage example

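import duckdb

# Replace <PATH> with the path of an embeddings file in this repository.
path = "https://data.source.coop/clay/<PATH>.parquet"
# Read the remote Parquet file into a DuckDB relation.
d = duckdb.read_parquet(path)
# Convert the relation to a pandas DataFrame and preview the first rows.
df = d.to_df()
df.head()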
Source Cooperative is a Radiant Earth project