Use Wikidata for place names and Wikipedia descriptions

using the above Westpark example, which is entity Q100256456 (https://www.wikidata.org/wiki/Q100256456), i think you would just hit https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q100256456&props=sitelinks&format=json, which returns:

{
    "entities": {
        "Q100256456": {
            "type": "item",
            "id": "Q100256456",
            "sitelinks": {
                "commonswiki": {
                    "site": "commonswiki",
                    "title": "Category:Westpark (Groningen)",
                    "badges": []
                },
                "nlwiki": {
                    "site": "nlwiki",
                    "title": "Westpark (Groningen)",
                    "badges": []
                }
            }
        }
    },
    "success": 1
}

from that, you could get the “title” from “nlwiki” to build a link to https://nl.wikipedia.org/wiki/Westpark_(Groningen). (you’d have to pick the right Wikipedia instance to match the user’s preferred locale, if there are articles in multiple Wikipedia instances.)
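to make that concrete, here’s a minimal Ruby sketch of that step, using the sitelinks response above. (the helper name and the site-preference list are illustrative, not any official API.)

```ruby
require 'json'

# build a Wikipedia article URL from a wbgetentities sitelinks response.
# `preferred` lists the site codes to try, in order of preference.
def wikipedia_url(entity_json, preferred: ['nlwiki', 'enwiki'])
  sitelinks = entity_json.dig('entities')&.values&.first&.dig('sitelinks') || {}
  site = preferred.find { |s| sitelinks.key?(s) }
  return nil unless site
  lang  = site.sub(/wiki\z/, '')            # "nlwiki" -> "nl"
  title = sitelinks[site]['title'].tr(' ', '_')
  "https://#{lang}.wikipedia.org/wiki/#{title}"
end

# using the response shown above:
sample = JSON.parse(<<~JSON)
  {"entities":{"Q100256456":{"sitelinks":{
    "commonswiki":{"site":"commonswiki","title":"Category:Westpark (Groningen)"},
    "nlwiki":{"site":"nlwiki","title":"Westpark (Groningen)"}}}}}
JSON

puts wikipedia_url(sample)  # => https://nl.wikipedia.org/wiki/Westpark_(Groningen)
```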

if you need to look up the Wikidata entity ID based on the iNaturalist place ID (160964), then the query would be something like this:

SELECT ?item ?itemLabel ?iNatPlace {
  ?item wdt:P7471 ?iNatPlace .
  FILTER (?iNatPlace in ("160964"))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}

see https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%3FiNatPlace%20{%0A%20%20%3Fitem%20wdt%3AP7471%20%3FiNatPlace%20.%0A%20%20FILTER%20(%3FiNatPlace%20in%20("160964"))%0A%20%20SERVICE%20wikibase%3Alabel%20{%20bd%3AserviceParam%20wikibase%3Alanguage%20"[AUTO_LANGUAGE]%2Cen"%20}%0A} for more options on how to connect this to Ruby or other languages. just for ease of reference, this is what that page recommends for Ruby code to display the results of the above sparql query:

#gem install sparql
#http://www.rubydoc.info/github/ruby-rdf/sparql/frames

require 'sparql/client'

endpoint = "https://query.wikidata.org/sparql"
sparql = <<'SPARQL'.chop
SELECT ?item ?itemLabel ?iNatPlace {
  ?item wdt:P7471 ?iNatPlace .
  FILTER (?iNatPlace in ("160964"))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}

SPARQL

client = SPARQL::Client.new(endpoint,
                            :method => :get,
                            # TODO adjust user agent; see https://w.wiki/CX6
                            headers: {'User-Agent' => 'WDQS-example Ruby'})
rows = client.query(sparql)

puts "Number of rows: #{rows.size}"
for row in rows
  for key,val in row do
    # print "#{key.to_s.ljust(10)}: #{val}\t"
    print "#{key}: #{val}\t"
  end
  print "\n"
end

If all you need is to be able to determine the URL for the Wikidata page, this gets you what you need:

SELECT ?item ?taxonname ?inaturalistID WHERE {
  ?item wdt:P31 wd:Q16521 ;
    wdt:P3151 ?inaturalistID ;
    wdt:P225 ?taxonname .

  FILTER(xsd:integer(?inaturalistID) = 41641)

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

The iNat taxon number is the value in the filter statement.

The Q identifier is the value under ?item, and the URL can then be built up as https://www.wikidata.org/wiki/ThatQValue
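As a sketch of that last step (the entity-URI shape is standard WDQS output; the variable names here are just for illustration):

```ruby
# ?item comes back as an entity URI like http://www.wikidata.org/entity/Q36341;
# the page URL just needs the trailing Q value swapped into /wiki/.
item_uri = 'http://www.wikidata.org/entity/Q36341'
q_value  = item_uri.split('/').last
page_url = "https://www.wikidata.org/wiki/#{q_value}"
puts page_url  # => https://www.wikidata.org/wiki/Q36341
```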

I’m sure sparql gurus could find a more efficient way, but this gets it for you.


Getting closer. Can you put that in a curl command so I can see the endpoint and method in addition to the request body (which I presume is the sparql query)?


Nm, figured it out:

curl -G 'https://query.wikidata.org/sparql' \
  -H "Accept: application/sparql-results+json" \
  --data-urlencode query='
    SELECT ?item ?taxonname ?inaturalistID WHERE {
      ?item wdt:P31 wd:Q16521 ;
        wdt:P3151 ?inaturalistID ;
        wdt:P225 ?taxonname .
      FILTER(xsd:integer(?inaturalistID) = 41641)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
  '

That takes 12s, so there’s no way we’re going to make that request dynamically, for taxa or places, especially if we then have to make additional queries to map a Wikidata identifier like Q36341 to a Wikipedia page URL like https://en.wikipedia.org/wiki/Brown_bear. I guess we could have some process that grinds through the taxa on a regular basis and stores Wikipedia URLs. Can anyone figure a way to get the corresponding Wikipedia URLs for all locales in a single request given a bunch of iNat IDs (for taxa or places)?


A couple of things. I’m very much a basic sparql author, so it is entirely possible there is a much more efficient way to make it run faster; other sparql-fluent folks may be able to help. You can probably delete the wikibase:label service call, as you neither need nor care what language the item label is returned in.

In my code library at home I have a query that returns all taxa in Wikidata that have an associated entry for their iNat identifier property. I assume that could be run and the results stored periodically?

One thing you may have to account for: it is possible, in fact I am virtually certain, that multiple Wikidata entities can have the same iNat ID listed. When that happens, a warning flag is added to the entry, but I believe the data is still physically saved.

It should also be possible to get the Wikipedia URL directly in the same query; I just did not realize you wanted that. In theory, in fact, you can get the URL for any and all languages that have a Wikipedia page.


over in another thread (https://forum.inaturalist.org/t/creating-inaturalist-places-and-linking-to-wikidata-using-geojson/12220/15), @salgo60 talks about a tool that seems to pretty quickly return a Wikipedia URL based on an iNat place ID, if the place is set up with an iNat identifier in Wikidata. he wrote:

Correct would be to ask Wikipedia for the english article corresponding to the iNaturalist place id 152114 and as the Wikidata property is Property:P7471 then this hub tool can help

so then modifying this slightly for taxon, brown bear would be https://hub.toolforge.org/P3151:41641?lang=en

you could check out the tool to see exactly what it’s doing, or maybe it could just be used directly rather than trying to replicate its functionality.
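if you did want to call it directly, the URL is simple to build. a tiny Ruby helper (the helper name is mine; P7471 = iNat place ID and P3151 = iNat taxon ID, per the posts above):

```ruby
# build a hub.toolforge.org URL that redirects to the Wikipedia article
# for a given (Wikidata property, iNat identifier) pair.
def hub_url(property, inat_id, lang: 'en')
  "https://hub.toolforge.org/#{property}:#{inat_id}?lang=#{lang}"
end

puts hub_url('P3151', 41641)               # => https://hub.toolforge.org/P3151:41641?lang=en
puts hub_url('P7471', 152114, lang: 'nl')  # => https://hub.toolforge.org/P7471:152114?lang=nl
```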


One thing likely slowing it down is you might query for any level in the taxonomic tree. If you know for example you only want species you can filter for that which dramatically cuts the query time as it applies a filter. I have saved queries that I know I only want species level for and they run in milliseconds. Not sure if there is a way to replicate that.


Wow, a lot happened on this thread overnight! Firstly, you’ve overcomplicated the query by calling for all iNat places and then filtering for the one you want. Instead you can just ask for the one you want. https://w.wiki/gzX

But I actually agree with your conclusion that it is not good to do this on the fly. I suggest you run a query once every day or so to get the whole list, so that as soon as a user goes to any taxon page, the Wikipedia article (in whichever language) is already connected.

It turns out the reason I joined the forum is because I’ve developed a browser extension (called Entity Explosion) which actually does an end-run around websites’ “more info” links and allows users to directly navigate to identical items on other websites, as long as they are linked on Wikidata (which is why I wanted to start linking your place data).

It is pretty new, so only has about 500 users so far, but I’m confident that it plays a role that no other tool yet plays, and works well on iNaturalist (but also about 5000 other sites). I recommend you all try it out: https://www.wikidata.org/wiki/Wikidata:Entity_Explosion

Here are some demonstrations. The first two are homonyms, so will always fail if you are just matching strings. The third is an example of coming to iNat from elsewhere (in this case the Dutch Wikipedia page):




maybe something like this…

places:

SELECT ?iNatPlace ?item ?lang ?sitelink ?name WHERE {
  VALUES ?iNatPlace {"152114" "160964" } .
  ?item wdt:P7471 ?iNatPlace .
  ?sitelink schema:about ?item ;
    schema:inLanguage ?lang ;
    schema:name ?name ;
    schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
#  FILTER(?lang in ('en')) . 
}
ORDER BY ?iNatPlace ?item ?lang

taxa:

SELECT ?iNatTaxon ?item ?lang ?sitelink ?name WHERE {
  VALUES ?iNatTaxon {"41641" "3" } .
  ?item wdt:P3151 ?iNatTaxon .
  ?sitelink schema:about ?item ;
    schema:inLanguage ?lang ;
    schema:name ?name ;
    schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
#  FILTER(?lang in ('en')) . 
}
ORDER BY ?iNatTaxon ?item ?lang
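if you were batching this from code, the VALUES list could be interpolated from a list of IDs. a rough Ruby sketch (helper and variable names are mine; pass P7471 for places or P3151 for taxa):

```ruby
# build a batch sitelink query for a list of iNat IDs against one
# Wikidata property (P7471 = iNat place ID, P3151 = iNat taxon ID).
def batch_sitelink_query(property, ids)
  values = ids.map { |id| %("#{id}") }.join(' ')
  <<~SPARQL
    SELECT ?iNatId ?item ?lang ?sitelink ?name WHERE {
      VALUES ?iNatId { #{values} } .
      ?item wdt:#{property} ?iNatId .
      ?sitelink schema:about ?item ;
        schema:inLanguage ?lang ;
        schema:name ?name ;
        schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
    }
    ORDER BY ?iNatId ?item ?lang
  SPARQL
end

puts batch_sitelink_query('P7471', %w[152114 160964])
```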

that said, the hub tool mentioned earlier seemed to do a good / fast job of getting single pages dynamically.

(UPDATE) these could work, too, depending on what you wanted in the results:

places:

SELECT ?iNatPlace ?item ?lang ?sitelink ?name WHERE {
  VALUES ?iNatPlace {"152114" "160964" "12"} .
  OPTIONAL {
    ?item wdt:P7471 ?iNatPlace . 
    OPTIONAL {
      ?sitelink schema:about ?item ;
        schema:inLanguage ?lang ;
        schema:name ?name ;
        schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
#      FILTER(?lang in ('en')) .
    }
  }
}
ORDER BY ?iNatPlace ?item ?lang

taxa:

SELECT ?iNatTaxon ?item ?lang ?sitelink ?name WHERE {
  VALUES ?iNatTaxon {"41641" "999999"} .
  OPTIONAL {
    ?item wdt:P3151 ?iNatTaxon . 
    OPTIONAL {
      ?sitelink schema:about ?item ;
        schema:inLanguage ?lang ;
        schema:name ?name ;
        schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
#      FILTER(?lang in ('en')) .
    }
  }
}
ORDER BY ?iNatTaxon ?item ?lang

Still think a significant performance hit comes from having to dynamically search for the iNat ID to match it to a Wikidata Q item.

I changed it slightly to also show how to get the Wikipedia article, but that slows it even more. The question is whether you only want the English article or multiple languages. I made it return the English and Danish articles, but this approach is neither scalable nor a viable format.

SELECT DISTINCT ?item ?taxonname ?inaturalistID ?articleEN ?articleDA WHERE {
  ?item wdt:P31 wd:Q16521;
    wdt:P3151 ?inaturalistID;
    wdt:P225 ?taxonname;
    wdt:P105 wd:Q7432.
  ?articleEN schema:about ?item;
    schema:isPartOf <https://en.wikipedia.org/>.
  ?articleDA schema:about ?item;
    schema:isPartOf <https://da.wikipedia.org/>.
  FILTER((xsd:integer(?inaturalistID)) = 41641)
}

A better option: you can list all the Wikipedia articles for a given item in any language.

If you know the Q item, this example lists all Wikipedia articles for all taxa set as children in the hierarchy of genus Ursus (Q243359 is the Wikidata item for genus Ursus):

SELECT ?item ?taxonname ?inaturalistID ?lang ?name ?article WHERE {
  ?item wdt:P31 wd:Q16521;
    wdt:P3151 ?inaturalistID;
    wdt:P225 ?taxonname;
    wdt:P171* wd:Q243359;
    wdt:P105 wd:Q7432.
  ?article schema:about ?item;
    schema:inLanguage ?lang;
    schema:name ?name;
    schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
}

Trying to do the same query above where you don’t know the Q item, and instead dynamically search for one that corresponds to a given iNat ID, times out.

Better sparql authors than I may be able to suggest how to optimize the search, but clearly dynamically searching for a given iNat ID has a big impact on query time.


i think this might get you closer to what you’re describing here:

SELECT ?iNatTaxon ?wdTaxon ?wdTaxonName ?member ?memberName ?memberLabel ?memberRankLabel ?wpLang ?wpArticleLink ?wpArticleName WHERE {
  #1. define a list of iNat taxa IDs that you want to look up in Wikidata
  VALUES ?iNatTaxon {"43328" "41636"} . #horse and bear families
  #2. get the corresponding Wikidata IDs
  ?wdTaxon wdt:P3151 ?iNatTaxon ; 
    wdt:P225 ?wdTaxonName .
  #3. get the members of the WD Taxa. (for example, if you start with a family-level taxon, this will get the family and all the member genera, species, subspecies, etc. that are defined in Wikidata.)
  ?member wdt:P171* ?wdTaxon ; 
    wdt:P105 ?memberRank ;
    wdt:P225 ?memberName .
  #3b. you can also just filter by, say, species
#  FILTER(?memberRank in (wd:Q7432)) .
  #4. if one exists, get a Wikipedia article for each of the members
  OPTIONAL {
    ?wpArticleLink schema:about ?member ; 
      schema:inLanguage ?wpLang ;
      schema:name ?wpArticleName ;
      schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
    FILTER(?wpLang in ('en')) . #define which version of the Wikipedia article you want
  }
  #5. define your language preferences for Labels
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]","en" } 
}
ORDER BY ?wdTaxonName ?memberName ?wpLang
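a rough Ruby sketch of how the JSON results of a query like this (requested with Accept: application/sparql-results+json) could be folded into an ID → language → article URL lookup for periodic storage. the bindings below are hand-made sample data, not live output:

```ruby
require 'json'

# fold WDQS JSON result bindings into { iNat ID => { lang => article URL } }.
def article_lookup(results_json)
  results_json.dig('results', 'bindings').each_with_object({}) do |row, acc|
    id   = row.dig('iNatTaxon', 'value')
    lang = row.dig('wpLang', 'value')
    url  = row.dig('wpArticleLink', 'value')
    next unless id && lang && url           # OPTIONAL clauses can leave gaps
    (acc[id] ||= {})[lang] = url
  end
end

sample = JSON.parse(<<~JSON)
  {"results":{"bindings":[
    {"iNatTaxon":{"value":"41641"},
     "wpLang":{"value":"en"},
     "wpArticleLink":{"value":"https://en.wikipedia.org/wiki/Brown_bear"}}]}}
JSON

p article_lookup(sample)
# => {"41641"=>{"en"=>"https://en.wikipedia.org/wiki/Brown_bear"}}
```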
2 Likes

Interesting that doing the filter as a VALUES statement seems faster. There does seem to be a cross-join or something in the returned data: all child species are returning the iNat ID of the parent. So, for instance, all bears in the list return 41636, which is the iNat ID of bears, not the iNat ID of the species.


yes. that “iNatTaxon” is supposed to represent the input taxon ID (of the parent), not the iNat taxon ID for the members. if you want to also see the latter thing, you could do something like this:

SELECT ?iNatTaxon ?wdTaxon ?wdTaxonName ?member ?memberName ?memberLabel ?memberRankLabel ?memberiNatTaxon ?wpLang ?wpArticleLink ?wpArticleName WHERE {
  #1. define a list of iNat taxa IDs that you want to look up in Wikidata
  VALUES ?iNatTaxon {"43328" "41636"} . #horse and bear families
  #2. get the corresponding Wikidata IDs
  ?wdTaxon wdt:P3151 ?iNatTaxon ; 
    wdt:P225 ?wdTaxonName .
  #3. get the members of the WD Taxa. (for example, if you start with a family-level taxon, this will get the family and all the member genera, species, subspecies, etc. that are defined in Wikidata.)
  ?member wdt:P171* ?wdTaxon ; 
    wdt:P105 ?memberRank ;
    wdt:P225 ?memberName .
  #3b. you can also just filter by, say, species
#  FILTER(?memberRank in (wd:Q7432)) .
  #3c. if one exists, get the member's iNat ID from Wikidata
  OPTIONAL { ?member wdt:P3151 ?memberiNatTaxon } .
  #4. if one exists, get a Wikipedia article for each of the members
  OPTIONAL {
    ?wpArticleLink schema:about ?member ; 
      schema:inLanguage ?wpLang ;
      schema:name ?wpArticleName ;
      schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
    #4b. define which version of the Wikipedia article you want  
    FILTER(?wpLang in ('en')) .
  }
  #5. define your language preferences for Labels
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]","en" } 
}
ORDER BY ?wdTaxonName ?memberName ?wpLang

yes. i just started using sparql <24 hrs ago, but it seems like filters may be applied at a later time in execution. so you retrieve a bunch of data first, and then you filter it down (inefficient), as opposed to just defining the specific entities you want up front (more efficient).

apparently the third way to limit your set (besides filter and values) is to union a bunch of statements like so:

SELECT ?item WHERE {
  { ?item wdt:P3151 "43328" }
  UNION { ?item wdt:P3151 "41636" } .
}

the union method seems to be fast like the values method, but the union method obviously is more cumbersome to write out.
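since the UNION form is mechanical, a small helper could generate it from a list of IDs. a sketch (hypothetical helper name):

```ruby
# generate the UNION form of the query for a list of iNat taxon IDs.
def union_query(ids)
  branches = ids.map { |id| %({ ?item wdt:P3151 "#{id}" }) }.join("\n  UNION ")
  "SELECT ?item WHERE {\n  #{branches} .\n}"
end

puts union_query(%w[43328 41636])
```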


Yeah, I don’t pretend to understand anything about the internals of the SPARQL engine, but I’m surprised it does not seem to optimize the query into the most efficient code possible regardless of where the qualifiers are (filters, values, where statements, etc.). All the options generate the same result set, so you’d think it could optimize them.


Where should we go to ask for optimisation? Is there an email address, forum, or help desk somewhere?

Is Wikimedia Category “Eelderbaan” connected to the Eelderbaan item “Q100257347”?

Is Wikimedia Category “Roege Bos” related to “Q100257307”?

Is Wikimedia Category “Westpark” related to “Q100256456”?

on the Wikidata Query Service page (https://query.wikidata.org/), there is a Help menu at the top that gives you an option to provide some feedback. that’s probably where i would start.


Just to be clear, I have no idea if the query engine optimizes anything or not. If it doesn’t, there must be a very good reason an optimizer has not been added in the years the site has been active.

A second option, if you have specific needs, is the “Request a query” page: https://www.wikidata.org/wiki/Wikidata:Request_a_query

If someone does help, they are likely an experienced user of the tool.
