This tool extracts the available data on potential target (sub)species of interest from GBIF, IUCN and ancillary user-defined sources, and enriches these data with spatial raster datasets.
Registration is required to access the DOPA REST services and retrieve complete data.
The tool takes the following inputs:

- List of scientific names of potential target (sub)species (the alternative option of passing the species list through the command line is yet to be implemented)
  - Mandatory: yes
  - Format: CSV/XLSX or command-line string
- Ancillary lists for the potential target (sub)species (for example, national or regional Red Lists)
  - Mandatory: no
  - Format: CSV/XLSX
- Spatial raster dataset describing (semi-)natural features of the area of interest (for example, land use/land cover, water index or temperature regime)
  - Mandatory: no
  - Format: GeoTIFF

It produces the following outputs:

- Tabular data with all the data available from GBIF, IUCN and ancillary sources
  - Mandatory: yes
  - Format: CSV
- Occurrence datacube from GBIF, fetched for the filtered (or all) species and converted into a raster dataset regridded to the input raster file (which is specified by the user and may represent, for example, bioclimatic variables or land use/land cover of the study area for further spatial analysis)
  - Mandatory: no
  - Format: GeoTIFF with at least two bands (the input raster dataset and gridded snapshots of species occurrence counts)
The workflow is being implemented in the following steps:

- GBIF-enrichment (MANDATORY)
  - GBIF Species API (GET /species/match) to match the custom list of scientific names against the GBIF Backbone Taxonomy and fix them.
  - GBIF Species API (GET /species/search) to fetch GBIF unique keys (IDs).
- IUCN-enrichment (MANDATORY) through the DOPA (Digital Observatory on Protected Areas) REST API services, as the IUCN APIs are currently unavailable for sign-up.
  - Fetching multiple attributes of species (habitats, threats, stresses, countries, protection categories, etc.).
  - Concatenation of unique values by IUCN ID.
- Mapping between the GBIF-enriched and IUCN-enriched datasets through an additional mapping between GBIF and IUCN keys (MANDATORY). Currently, the mapping is completed by scientific names from GBIF and IUCN. The mapping can also be accessed through the GUI of the ChecklistBank portal, but automatic access to this tool is neither straightforward nor reliable. A complete mapping between unique IDs is available as a static TSV file, but this is not a robust solution either. A more flexible solution based on mapping by IDs should be developed to avoid keeping the mapping database in memory.
- Species enriched with GBIF and IUCN data can also be enriched with ancillary data from other sources (OPTIONAL). In our case, two ancillary Red Lists were used to detect target species for calculating habitat connectivity in Catalonia, Spain:
  - Enrichment with the Red List of Spain. This Red List has unique species IDs, but they do not match any known IDs in the GBIF Backbone Taxonomy vocabularies. The tool fetches any mention of the species in the lists of rare, endangered and protected species (the Listado de Especies Silvestres en Régimen de Protección Especial, LESRPE, or the Categorías en el Catálogo Español de Especies Amenazadas, CEEA).
  - Enrichment with the Red List of Catalonia, accessed through the Socrata API, which must be called with a valid user-authenticated app token. This Red List has no unique IDs and consists of five columns, including the scientific name.
- Enrichment with GBIF datacubes (OPTIONAL). Based on all the data fetched in the previous steps and on their own knowledge and experience, users should be able to filter out species that are unsuitable for their analysis (for example, a user may want to compute habitat connectivity for patches of deciduous forest, while some species do not inhabit them).
  - The filtered list of species can then be used to access GBIF occurrence datacubes through a user-authorised download request.
  - The downloaded CSV file is reprojected, regridded to the input raster dataset and written to the output occurrence raster file (the count of occurrence records is written to the new GeoTIFF).
This optional output can be used for comparative analyses between the occurrence of the target species and bioclimatic variables, land-cover types or habitat types, or to verify species distribution models.
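The name-fixing call of the GBIF-enrichment step can be sketched with the Python standard library. This is a minimal sketch, not the tool's own code: it assumes the public GBIF API v1 base URL, keeps only a few fields of the /species/match response, and the helper names (build_match_url, parse_match, match_name) are hypothetical. Parsing is kept separate from the HTTP call so that it can be checked offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the public GBIF API (v1).
GBIF_API = "https://api.gbif.org/v1"


def build_match_url(name):
    """Build the GET /species/match URL for one scientific name (hypothetical helper)."""
    return f"{GBIF_API}/species/match?{urlencode({'name': name})}"


def parse_match(response):
    """Keep the fields needed to fix a name from a /species/match JSON response.

    Returns None when GBIF reports no match (matchType "NONE" or missing).
    """
    if response.get("matchType", "NONE") == "NONE":
        return None
    return {
        "usageKey": response.get("usageKey"),  # GBIF Backbone key of the match
        "canonicalName": response.get("canonicalName"),  # name without authorship
        "matchType": response.get("matchType"),  # e.g. EXACT or FUZZY
        "confidence": response.get("confidence"),
    }


def match_name(name, timeout=30):
    """Query the live GBIF API for one name (network access required)."""
    with urlopen(build_match_url(name), timeout=timeout) as resp:
        return parse_match(json.load(resp))
```

Calling match_name("Lynx lynx") needs network access; matches whose matchType is FUZZY are worth reviewing by hand before the corrected name is used downstream.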
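The concatenation of unique values by IUCN ID in the IUCN-enrichment step can be illustrated in plain Python. The input layout (an iterable of (iucn_id, value) pairs, for example one pair per habitat or threat), the "; " separator and the function name concat_unique are assumptions for illustration, not the structure of the DOPA responses.

```python
from collections import defaultdict


def concat_unique(records, sep="; "):
    """Collapse (iucn_id, value) pairs into one string of sorted unique values per ID.

    Hypothetical helper: empty or None values are skipped, so an ID whose
    values are all empty does not appear in the result.
    """
    grouped = defaultdict(set)
    for iucn_id, value in records:
        text = "" if value is None else str(value).strip()
        if text:
            grouped[iucn_id].add(text)
    # Sorting makes the concatenated string deterministic.
    return {iucn_id: sep.join(sorted(values)) for iucn_id, values in grouped.items()}
```

Using a set per ID removes duplicates that arise when the same attribute is reported several times for one species.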
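Mapping by scientific names amounts to joining the GBIF-enriched and IUCN-enriched tables on a normalised name. In this sketch the tables are lists of dicts with hypothetical field names (canonicalName on the GBIF side, scientific_name and iucn_id on the IUCN side). Normalisation only collapses whitespace and case, so authorship strings and synonyms are not handled, which is one reason why mapping by unique IDs remains preferable.

```python
import re


def normalise_name(name):
    """Lower-case a scientific name and collapse repeated whitespace."""
    return re.sub(r"\s+", " ", name).strip().lower()


def map_by_name(gbif_rows, iucn_rows):
    """Left-join GBIF rows to IUCN rows on the normalised scientific name.

    Field names are assumptions. GBIF rows without an IUCN match get
    iucn_id=None; if several IUCN rows share a name, the last one wins.
    """
    iucn_index = {normalise_name(r["scientific_name"]): r for r in iucn_rows}
    merged = []
    for row in gbif_rows:
        match = iucn_index.get(normalise_name(row["canonicalName"]))
        merged.append({**row, "iucn_id": match["iucn_id"] if match else None})
    return merged
```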
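The regridding in the datacube step can be reduced to counting occurrence points per cell of the input raster grid. This sketch assumes that the points are already reprojected to the raster's CRS and that the grid is north-up and described by a GDAL-style geotransform (x_origin, pixel_width, 0, y_origin, 0, pixel_height) with a negative pixel_height. The function name is hypothetical, and writing the counts to a GeoTIFF band (for example with rasterio) is left out.

```python
import math


def count_points_on_grid(points, geotransform, width, height):
    """Count (x, y) points per cell of a north-up raster grid.

    geotransform follows the GDAL convention
    (x_origin, pixel_width, 0, y_origin, 0, pixel_height), pixel_height < 0.
    Points are assumed to be in the raster's CRS already (no reprojection here).
    Points falling outside the grid are ignored.
    """
    x0, px_w, _, y0, _, px_h = geotransform
    counts = [[0] * width for _ in range(height)]
    for x, y in points:
        col = math.floor((x - x0) / px_w)
        row = math.floor((y - y0) / px_h)  # px_h < 0, so rows grow southwards
        if 0 <= row < height and 0 <= col < width:
            counts[row][col] += 1
    return counts
```

Using the raster's own geotransform guarantees that the occurrence counts align cell by cell with the input raster band.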
This tool is partly complete, and several improvements are planned:
- Currently, the third step (mapping GBIF tabular data and IUCN tabular data by unique IDs) is missing. It is yet to be explored through ChecklistBank tools or by scientific or canonical names. Update: it was decided to drop automatic access to ChecklistBank tools.
- Designing an interface that lets users filter species in the tabular output, based on their knowledge and experience, so that GBIF datacubes are later accessed for the filtered species only. Update: it was decided to use a comprehensive Jupyter Notebook.
- Fixing scientific names from ancillary sources with the same GBIF tool.
- Fetching data on habitat suitability and importance from IUCN.
- Cleaning up the code and aligning variables with the configuration file (partly completed).
- Testing the retrieval of IUCN assessment scopes other than the global one, to obtain regional protection categories, which are recorded under a different ID (for example, the European and Mediterranean assessments for Lynx lynx can be accessed through URLs 1 and 2, which share the same species ID but have different scope IDs).
- In progress: testing another option for accessing the regional dataset through the API of the Open Data initiative of the Government of Spain (the portal was down on 28/08/2024). It might be worth switching to this API instead of the Socrata API if it does not require user authentication.
- Testing IUCN API v4 once it is published and available for sign-up.
- ChecklistBank tools do not seem stable enough to support automatic on-the-fly scraping of matches between GBIF and IUCN keys. Therefore, the static database derived from this tool, with mapped IUCN and GBIF keys (unique IDs) for threatened species, is stored separately for this workflow.
- DOPA REST services do not support species whose distribution data are not mapped by IUCN (for example, Emys orbicularis).
- IUCN services do not support fetching data for individual subspecies; therefore, data can only be fetched at the species level.
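Because IUCN data can only be fetched at the species level, subspecies names have to be reduced to their binomial before querying. A minimal sketch, assuming plain names without authorship strings (for example a trinomial such as Ursus arctos arctos); the function name to_species_level is hypothetical.

```python
def to_species_level(name):
    """Reduce a (sub)species name to its binomial (genus + specific epithet).

    Hypothetical helper: keeps the first two tokens, so rank markers such as
    'subsp.' or 'var.' after the epithet are dropped as well. Names with
    authorship strings or hybrid formulas are not handled.
    """
    tokens = name.split()
    if len(tokens) < 2:
        raise ValueError(f"not a species-level name: {name!r}")
    return " ".join(tokens[:2])
```

Alternatively, the species-level fields returned by GBIF's /species/match (for example speciesKey) could be used to roll subspecies up to their parent species.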