Agri-environmental Semantic Segmentation of LUCAS landscape photos
This dataset is a collection of street-level images extracted from the Land Use/Cover Area Frame Survey (LUCAS) dataset provided by Eurostat. It is designed for semantic segmentation tasks that distinguish between land cover categories in agricultural and natural landscapes. The dataset covers the survey year 2018 and includes north-looking images with both full masks and partial masks, in which certain areas are not delineated. The annotations use 35 distinct classes, including labels such as "field margin," "crop," "cropfield," and "ditch." LUCAS images are collected using a consistent sampling framework, offering a representative view of different regions and environments of Europe.
Comprising a total of 1784 north-looking images from 2018, this dataset contributes to land cover analysis by providing fine-grained annotations for a variety of landscape elements, and it serves as a resource for training and evaluating semantic segmentation models.
The dataset's potential applications span a range of domains, from land use mapping and environmental monitoring to urban planning and agricultural management. By fostering the advancement of machine learning models in accurately segmenting landscapes, this dataset contributes to sustainable land management practices and supports informed decision-making processes.
Dataset Structure
We provide two data products derived from the same raw data, organised into three folders in this repository:
- raw_data
- ml_data
- STAC
Raw data
The raw data is the original data. It is not easily usable in a machine learning context but is kept as a reference. The original dataset is organised into batches, one per segmentation campaign, with each batch containing three main folders:
- images: Contains the LUCAS north-looking images captured for each theoretical point.
- full_masks: Contains pixel-level annotated masks corresponding to each image, where each pixel is labelled with a class.
- partial_masks (only for the first batch): Contains partial masks where some areas of the images are not delineated.
The root of the raw_data folder contains a CSV file, classes_dataset.csv, with the code and label correspondence.
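As a minimal sketch of how the code and label correspondence could be loaded, the snippet below reads a CSV into a dictionary. The column names `code` and `label` and the numeric codes in the sample are assumptions for illustration; check the header of classes_dataset.csv before relying on them.

```python
import csv
import io


def load_class_map(csv_file, code_col="code", label_col="label"):
    """Return {int code: label} from an open CSV file object.

    The column names are assumed, not taken from the dataset documentation.
    """
    reader = csv.DictReader(csv_file)
    return {int(row[code_col]): row[label_col] for row in reader}


# Hypothetical sample standing in for raw_data/classes_dataset.csv.
# The labels appear in the dataset description; the codes are invented.
sample = io.StringIO("code,label\n1,field margin\n2,crop\n3,ditch\n")
class_map = load_class_map(sample)
print(class_map)
```

For the real file, replace the `io.StringIO` sample with `open("raw_data/classes_dataset.csv", newline="")`.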
ML data
The ML data consolidates the batch data and enhances the original labelled data with geolocation information and ancillary data derived from the Harmonised LUCAS in-situ land cover and land use database. This metadata can provide the necessary context for machine learning exercises or exploratory analysis.
The data is structured in two folders:
- images
- masks
The root of the folder contains a metadata file, lucas_ml_data.csv, with the ancillary data. It also contains the classes_dataset.csv file with the code and label correspondence.
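The two-folder layout lends itself to pairing each image with its mask. The sketch below assumes that an image and its mask share the same file stem and that images use the `.jpg` extension; neither is confirmed by this description, so adjust the patterns to the actual file names.

```python
from pathlib import Path
import tempfile


def pair_images_and_masks(root):
    """Return (image_path, mask_path) tuples for files sharing a stem.

    Assumes root/images/*.jpg and root/masks/*.png with matching stems.
    Images without a mask are skipped.
    """
    root = Path(root)
    masks = {p.stem: p for p in (root / "masks").glob("*.png")}
    pairs = []
    for image in sorted((root / "images").glob("*.jpg")):
        mask = masks.get(image.stem)
        if mask is not None:
            pairs.append((image, mask))
    return pairs


# Demonstration on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "images").mkdir()
    (demo_root / "masks").mkdir()
    for name in ("images/p1.jpg", "images/p2.jpg", "masks/p1.png"):
        (demo_root / name).touch()
    pairs = pair_images_and_masks(demo_root)
    pair_names = [(image.name, mask.name) for image, mask in pairs]

print(pair_names)
```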
STAC
A SpatioTemporal Asset Catalog (STAC) implementation allows the data to be used dynamically without downloading all of it, which becomes more useful should the dataset grow. The STAC format allows for easy spatio-temporal subsetting. The data can be visually browsed using the STAC browser.
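To illustrate what spatio-temporal subsetting means in practice, the sketch below filters STAC items by bounding box and year. It relies only on the `bbox` and `properties.datetime` fields that the STAC Item specification defines; the item ids, coordinates, and dates are hypothetical, and a real workflow would more likely use a STAC client library than hand-written filtering.

```python
from datetime import datetime


def bbox_intersects(a, b):
    """True if two [west, south, east, north] boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def subset_items(items, bbox, year):
    """Return ids of STAC items inside bbox and acquired in the given year."""
    selected = []
    for item in items:
        stamp = item["properties"]["datetime"]
        # Normalise the trailing "Z" so fromisoformat accepts it.
        when = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        if when.year == year and bbox_intersects(item["bbox"], bbox):
            selected.append(item["id"])
    return selected


# Hypothetical items; a point is represented as a degenerate bbox.
items = [
    {"id": "pt-a", "bbox": [4.35, 50.85, 4.35, 50.85],
     "properties": {"datetime": "2018-06-01T00:00:00Z"}},
    {"id": "pt-b", "bbox": [23.72, 37.98, 23.72, 37.98],
     "properties": {"datetime": "2018-07-15T00:00:00Z"}},
]
selected = subset_items(items, bbox=[2.5, 49.5, 6.4, 51.5], year=2018)
print(selected)
```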
Data Format
- Image files are provided in JPEG format.
- Masks are provided in PNG format, where each pixel corresponds to a specific class.
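Because each mask pixel holds a class code, per-class coverage follows from a simple count. The sketch below works on a mask already decoded into rows of integer codes; decoding the PNG itself requires an imaging library and is not shown. The codes in the sample are hypothetical.

```python
from collections import Counter


def class_fractions(mask):
    """Return {class code: fraction of pixels} for a 2D mask of codes."""
    counts = Counter(code for row in mask for code in row)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


# Hypothetical 2 x 4 mask; the codes 1 and 2 are invented for illustration.
mask = [
    [2, 2, 2, 1],
    [2, 2, 1, 1],
]
fractions = class_fractions(mask)
print(fractions)
```

Such fractions can help spot class imbalance before training a segmentation model.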
Data size
- Each data folder is approximately 1 GB in size, with the STAC folder being larger.
- Exact total: 3.3 GB
Usage
This dataset can be used for various semantic segmentation tasks, including land cover analysis, environmental monitoring, and urban planning. The unique identifiers in the image names enable geospatial analysis through correspondence with the LUCAS harmonised database. The ML data already includes this correspondence.
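A minimal sketch of the identifier lookup, assuming the image file stem is the point identifier and that the metadata file has a `point_id` column. Both are assumptions, as are the `lat` and `lon` columns and all sample values; inspect lucas_ml_data.csv and the image names for the actual conventions.

```python
import csv
import io
from pathlib import Path


def index_metadata(csv_file, id_col="point_id"):
    """Index metadata rows by identifier (the column name is an assumption)."""
    return {row[id_col]: row for row in csv.DictReader(csv_file)}


def metadata_for_image(image_path, index):
    """Look up the row whose identifier equals the image file stem."""
    return index.get(Path(image_path).stem)


# Hypothetical sample standing in for ml_data/lucas_ml_data.csv.
sample = io.StringIO("point_id,lat,lon\n12345678,50.85,4.35\n")
index = index_metadata(sample)
record = metadata_for_image("images/12345678.jpg", index)
print(record)
```

If the file names carry extra parts beyond the identifier, extract the identifier from the stem before the lookup.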
Citation
If you use this dataset in your work, please consider citing the following paper:
Andrimont, Raphaël d’, Momchil Yordanov, Laura Martinez-Sanchez, Beatrice Eiselt, Alessandra Palmieri, Paolo Dominici, Javier Gallego, et al. “Harmonised LUCAS In-Situ Land Cover and Use Database for Field Surveys from 2006 to 2018 in the European Union.” Scientific Data 7, no. 1 (December 2020): 352. [https://doi.org/10.1038/s41597-020-00675-z](https://doi.org/10.1038/s41597-020-00675-z)
License
The LUCAS Semantic Segmentation Dataset is provided under the CDLA-Permissive-1.0 license.
Authors
- Marijn Van der Velde
- Laura Martinez-Sanchez
- Raphaël d’Andrimont
- Elizabeth Kearsley
- Koen Hufkens
Release
- Version: v1.0
- Latest release: 13 September 2023
- Previous release: 13 September 2023
- Temporal coverage: 2018
- Update frequency: 3-yearly for the underlying data
- Spatial coverage (geographic area): European Union
- Spatial coverage (bounding box): [xmin = -9.56, ymin = 34.7, xmax = 33.4, ymax = 65.8]