iNaturalist API GBIF Taxon ID

Hello everyone,

I manage a database in PostgreSQL, and I retrieve my observations via the iNaturalist API with a Python script inspired by the remarkable work of @pisum here: https://jumear.github.io/stirpy/lab?path=iNat_APIv1_get_observations.ipynb
(thanks again :face_blowing_a_kiss:)

In order to link to local taxonomic repositories (INPN TAXREF in my case), I need to map each iNaturalist taxon_id to its corresponding gbif_taxon_id.

I can do this with the following script, which queries Wikidata via SPARQL:

import requests
import psycopg2
from psycopg2 import sql

DB_CONFIG = {
    "dbname": "inaturalist",
    "user": "user",
    "host": "host",
    "port": "5432"
}

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

def get_gbif_id_from_inat_id(inat_id):
    query = f"""
    SELECT ?gbif_id WHERE {{
        ?item wdt:P3151 "{inat_id}".
        OPTIONAL {{ ?item wdt:P846 ?gbif_id. }}
    }}
    """
    headers = {
        "User-Agent": "iNaturalist-GBIF-Sync/1.0 (mail@mail.fr)"
    }
    try:
        response = requests.get(
            SPARQL_ENDPOINT,
            params={"query": query, "format": "json"},
            headers=headers,
            timeout=30,  # don't hang forever if the endpoint is slow
        )
        response.raise_for_status()
        data = response.json()
        results = data.get("results", {}).get("bindings", [])
        if results:
            return results[0].get("gbif_id", {}).get("value")
    except requests.exceptions.RequestException as e:
        print(f"Error for {inat_id}: {e}")
    return None

def update_gbif_ids_in_database():
    """Update the database table with GBIF IDs."""
    conn = psycopg2.connect(**DB_CONFIG)
    cursor = conn.cursor()
    cursor.execute("SELECT taxon_id, taxon_name FROM taxo.inat_taxo WHERE gbif_taxon_id IS NULL or gbif_taxon_id = 0")
    taxons = cursor.fetchall()
    for taxon_id, taxon_name in taxons:
        gbif_id = get_gbif_id_from_inat_id(str(taxon_id))
        if gbif_id:
            cursor.execute(
                sql.SQL("""
                    UPDATE taxo.inat_taxo
                    SET gbif_taxon_id = %s
                    WHERE taxon_id = %s
                """),
                (gbif_id, taxon_id)
            )
            print(f"Updated: {taxon_name} (iNaturalist: {taxon_id}) → GBIF: {gbif_id}")
        else:
            print(f"No GBIF ID found for: {taxon_name} (iNaturalist: {taxon_id})")
    conn.commit()
    cursor.close()
    conn.close()

if __name__ == "__main__":
    update_gbif_ids_in_database()

However, I suspect this is resource-intensive for Wikidata, and I wonder whether a global correspondence table might be available somewhere.

Do you have any advice?


There is no official iNat/GBIF ID correspondence table, at least partly because it’s not a very stable way to connect the two databases (ID numbers change fairly frequently). GBIF just applies a name matching function to iNat data as it gets ingested.
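An alternative to the ID mapping, closer to what GBIF itself does at ingestion, is to query GBIF's public species-match endpoint with the scientific name. A minimal sketch, assuming your table stores the scientific name (the endpoint and its matchType/usageKey fields are GBIF's public API; the helper names are mine):

```python
import requests

GBIF_MATCH_URL = "https://api.gbif.org/v1/species/match"

def build_match_params(name, strict=True):
    """Query parameters for GBIF's species match endpoint."""
    return {"name": name, "strict": str(strict).lower()}

def match_gbif_key(name, timeout=30):
    """Return GBIF's usageKey for a scientific name, or None when unmatched."""
    resp = requests.get(GBIF_MATCH_URL, params=build_match_params(name), timeout=timeout)
    resp.raise_for_status()
    data = resp.json()
    # GBIF answers matchType == "NONE" when it cannot resolve the name
    if data.get("matchType") == "NONE":
        return None
    return data.get("usageKey")

# Usage (performs a network call):
#   match_gbif_key("Puma concolor")
```

The trade-off is exactly the instability mentioned above: a name match can silently land on a different concept than the iNat taxon, so spot-checking a sample is worthwhile.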


Oh, fun.

This should not be a particularly heavy query on Wikidata, by the way. Other options are:

1 — You can get the full mapping table at once from faster RDF endpoints, e.g. Wikidata on QLever.

https://qlever.cs.uni-freiburg.de/wikidata/DwDqoH

821k results; then you can load it as a dict or a mapping table somewhere
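Loading such an export as a dict could look like the sketch below, assuming a TSV download whose column names match the query's variables (the column names and ID values here are placeholders, not real pairings):

```python
import csv
import io

def load_mapping(tsv_text):
    """Parse a TSV export with inat_id and gbif_id columns into a dict."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # skip rows where the GBIF ID is empty
    return {row["inat_id"]: row["gbif_id"] for row in reader if row.get("gbif_id")}

# tiny inline sample standing in for the ~821k-row export (fake IDs)
sample = "inat_id\tgbif_id\n12345\t67890\n54321\t98765\n"
mapping = load_mapping(sample)
print(mapping["12345"])  # prints 67890
```

With the full file this is a one-time download, after which every lookup is local.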

2 — Wikidata has been exploring a GraphQL service (https://www.wikidata.org/wiki/Wikidata:Wikibase_GraphQL_prototype) that could eventually replace the function you have there.

If you need other ids, it is always possible to add them to Wikidata. Let me know if you need help with that!


if you’re going to get a selection of IDs via Wikidata, it would probably be more efficient – at least fewer requests – if you got the data in batches rather than one at a time. i don’t know exactly how many records Wikidata will return per request, but you can make your batches as big as that limit.

for example, this query gets 3 IDs in a single request:

 SELECT ?inat_taxon_id ?item ?gbif_taxon_id WHERE {
    VALUES ?inat_taxon_id {"62741" "200073" "41641"} .
    OPTIONAL { ?item wdt:P3151 ?inat_taxon_id } .
    OPTIONAL { ?item wdt:P846 ?gbif_taxon_id }.
 }
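The batching above can be sketched as a small query builder that chunks the IDs into VALUES blocks (the chunk size of 200 is an arbitrary guess, not a documented Wikidata limit):

```python
def chunked(ids, size=200):
    """Yield successive slices of the ID list (200 per batch is a guess)."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

def build_batch_query(inat_ids):
    """Build one SPARQL query resolving several iNat taxon IDs at once."""
    values = " ".join(f'"{i}"' for i in inat_ids)
    return (
        "SELECT ?inat_taxon_id ?item ?gbif_taxon_id WHERE {\n"
        f"  VALUES ?inat_taxon_id {{ {values} }} .\n"
        "  OPTIONAL { ?item wdt:P3151 ?inat_taxon_id } .\n"
        "  OPTIONAL { ?item wdt:P846 ?gbif_taxon_id } .\n"
        "}"
    )

# one request per batch instead of one per taxon
queries = [build_batch_query(batch) for batch in chunked(["62741", "200073", "41641"])]
```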


Thank you all for your responses! (And sorry, I wasn’t available this week to thank you sooner.)

I went with the option suggested by @tiagolubiana, and created a GBIF/iNat correspondence table with this query:

PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX schema: <http://schema.org/>
PREFIX wd: <http://www.wikidata.org/entity/>

SELECT DISTINCT ?item ?gbif_id ?inat_id ?en_wiki ?fr_wiki WHERE {
    ?item wdt:P3151 ?inat_id .
    ?item wdt:P846 ?gbif_id .

    OPTIONAL {
        ?en_article schema:about ?item ;
                   schema:inLanguage "en" ;
                   schema:isPartOf <https://en.wikipedia.org/> .
        BIND(URI(CONCAT("https://en.wikipedia.org/wiki/", ENCODE_FOR_URI(STRAFTER(STR(?en_article), "https://en.wikipedia.org/wiki/")))) AS ?en_wiki)
    }

    OPTIONAL {
        ?fr_article schema:about ?item ;
                   schema:inLanguage "fr" ;
                   schema:isPartOf <https://fr.wikipedia.org/> .
        BIND(URI(CONCAT("https://fr.wikipedia.org/wiki/", ENCODE_FOR_URI(STRAFTER(STR(?fr_article), "https://fr.wikipedia.org/wiki/")))) AS ?fr_wiki)
    }
}

direct link: https://qlever.cs.uni-freiburg.de/wikidata/KioK1I

(I added the Wikipedia URLs, which will also be useful for me)

821237 lines found :grinning_face:
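For anyone wanting to do the same, one way to get the exported CSV into PostgreSQL is to reshape each row first (extracting the bare QID from the entity URI) and then stream the tuples in. A sketch, assuming the CSV header matches the query's variable names; the target table name in the comment is hypothetical:

```python
import csv
import io

def qid_from_uri(item_uri):
    """Extract the bare QID from a Wikidata entity URI."""
    return item_uri.rsplit("/", 1)[-1]

def rows_for_copy(csv_text):
    """Yield (qid, gbif_id, inat_id) tuples from the CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        yield qid_from_uri(row["item"]), row["gbif_id"], row["inat_id"]

# With psycopg2, the tuples could then be loaded into a (hypothetical)
# taxo.wikidata_map table via cursor.executemany(...) or a COPY statement.
```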

And I’ve already corrected an incorrect match!
This will also help me contribute to adding the missing matches!

