Use Wikidata for place names and Wikipedia descriptions

Hi folks, just going to reply to @tobyyy here. We don’t have a good way to get all the iNat place IDs using the API, but we could publish a regularly updated archive of this data. We could just dump relevant data out of the database into a CSV file sans boundaries (~118,700 lines) if that would work. Or, if there’s some kind of standard format for publishing place data like this akin to Darwin Core, we could publish something like that. What would you prefer?

5 Likes

@kueda the CSV would be awesome. There’s no need for boundaries (for my purposes) since with the ID I can point to your place page to enable matching.

@lawnranger There are way more than 50,000 places on Wikidata. I expect that over half of the iNat ones have a direct match. Having matched a number of sets with tens of thousands of entries, I’m confident that it’s best to do the matching all in one place.

3 Likes

For comparison, the taxon matching has nearly reconciled 600k items. That task was easier because of systematic nomenclature, but gives a good sense of what can be done: https://mix-n-match.toolforge.org/#/catalog/238

3 Likes

Three days ago I created redirect pages on Wikipedia. Is this the wrong way?

https://www.inaturalist.org/places/nieuwegein-park-oudegein#abouttab
https://www.inaturalist.org/places/nieuwegein-ijsselbos#abouttab

1 Like

I want to connect:

https://nl.wikipedia.org/wiki/Westpark_(Groningen)
with the About tab of https://www.inaturalist.org/places/groningen-deheld-westpark
https://commons.wikimedia.org/wiki/Category:Westpark_(Groningen) (Q100256456)

https://nl.wikipedia.org/wiki/Eelderbaan
https://www.inaturalist.org/places/groningen-vinkhuizen-eelderbaan
https://commons.wikimedia.org/wiki/Category:Eelderbaan (Q100257347)

https://nl.wikipedia.org/wiki/Roege_Bos (Q100257307)
with the About tab of https://www.inaturalist.org/places/groningen-deheld-roege-bos-eelderbaan
https://commons.wikimedia.org/wiki/Category:Roege_Bos (Q100257307)

Let me know if this works for you: http://www.inaturalist.org/places/inaturalist-places.csv.zip

Some of the columns need some explaining, so maybe this needs a metadata file in the zip, but for now it should provide you with a list of iNat place IDs. You can use the slug column to link to the place page on iNat, e.g. https://www.inaturalist.org/places/lonsdale-camp-bush
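
To illustrate one way to consume such an archive, here is a minimal Ruby sketch. The rows below are stand-ins: only the slug column is confirmed above, and the other column names and the id/slug pairings are assumptions for illustration.

```ruby
require "csv"

# Hypothetical sample rows standing in for the real archive; only the
# slug column is confirmed in the post above, the rest is assumed.
data = <<~CSV
  id,slug
  160964,groningen-deheld-westpark
  12345,lonsdale-camp-bush
CSV

# Build iNat place-page URLs from the slug column.
places = CSV.parse(data, headers: true).map do |row|
  { id: row["id"].to_i, url: "https://www.inaturalist.org/places/#{row['slug']}" }
end

places.each { |p| puts "#{p[:id]} #{p[:url]}" }
```

With the real file you would read it with `CSV.foreach` instead of parsing a string.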

3 Likes

@ahospers here’s how to do it on Wikidata: https://www.wikidata.org/w/index.php?title=Q100256456&type=revision&diff=1291798322&oldid=1289730848 . That’s what I’m trying to do in bulk. Whether iNat can draw down that info yet for the “Wikipedia description” panel is the next step for others to say.

2 Likes

@ahospers actually it’s even better to use the id: https://www.wikidata.org/w/index.php?title=Q100256456&type=revision&diff=1291800427&oldid=1291798322

1 Like

@kueda Brilliant. The property was set up to use the IDs rather than the slugs, but I see that both work fine. I’ve now set up the Mix’n’Match set: https://mix-n-match.toolforge.org/#/catalog/3900 . In case anyone wants to help out, you’ll need a wiki account first.

3 Likes

Thank you… I will read up on it and hope it happens automatically. I have been there and did not understand what to do.
What is a “slug”?
And “Mix’n’Match”?
I do not know much about this, but I thought I should:
Give “Eelderbaan” the number “Q100257347” (works for all languages, I hope; also used in Wikimedia?)
Give “Roege Bos” the number “Q100257307” (works for all languages, I hope)
“Westpark” has the number “Q100256456” (works for all languages, I hope; also used in Wikimedia?)

It seems you know what to do. I know I need a number like Q100256456.

https://www.wikidata.org/w/index.php?title=Q100256456&type=revision&diff=1291798322&oldid=1289730848 I need the three red dots instead of one image.

Slug: https://forum.inaturalist.org/t/creating-inaturalist-places-and-linking-to-wikidata-using-geojson/12220/17?u=ahospers It might be an idea to use Wikimedia photos of places instead of Flickr files, as Flickr is less popular nowadays, I think.

Now I am lost. I thought I had to add “Q100256456” to the photos to make them visible in any language. But now it seems there is a category
https://commons.wikimedia.org/wiki/Category:Westpark_(Groningen) (Q100256456)
without “Q100256456” on it.
How does this work in multiple languages, especially Dutch and English (and German)?

Can somebody tell me how this works?
I did not know (until one week ago) that the parks themselves were a separate category.

https://commons.wikimedia.org/wiki/Category:Eelderbaan (Q100257347)
https://commons.wikimedia.org/wiki/Category:Roege_Bos (Q100257307)


@ahospers Wikidata currently doesn’t affect the Wikipedia descriptions for places in iNaturalist, as far as I know. There’s been some talk here and in other discussions about using Wikidata as a way to link iNaturalist places, but I’m not aware of any serious steps toward thinking through the pros and cons and what that would actually look like if implemented.

This is currently the right way to do it in iNaturalist, along with the option of changing the place name in iNaturalist to match an existing entry in Wikipedia.

It should be noted, however, that there can be a lag of up to 30 days before changes in Wikipedia are reflected in iNaturalist (see the discussion starting from https://forum.inaturalist.org/t/creating-inaturalist-places-and-linking-to-wikidata-using-geojson/12220/16), although this may or may not improve after this: https://forum.inaturalist.org/t/adding-wikipedia-links-when-an-article-is-created-about-a-taxon/9084/12.

1 Like

FWIW, one impediment is that I personally find the Wikidata SPARQL API pretty mystifying. If someone could provide a working example of how one uses it to, say, retrieve a Wikipedia page URL for a taxon given an iNat taxon ID (or s/taxon/place/ to keep it on-topic), that would help me out a lot. @tobyyy, maybe you could provide an example?

1 Like

Using the above Westpark example, which is entity Q100256456 (https://www.wikidata.org/wiki/Q100256456), I think you would just hit https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q100256456&props=sitelinks&format=json, which returns:

{
    "entities": {
        "Q100256456": {
            "type": "item",
            "id": "Q100256456",
            "sitelinks": {
                "commonswiki": {
                    "site": "commonswiki",
                    "title": "Category:Westpark (Groningen)",
                    "badges": []
                },
                "nlwiki": {
                    "site": "nlwiki",
                    "title": "Westpark (Groningen)",
                    "badges": []
                }
            }
        }
    },
    "success": 1
}

From that, you could take the “title” from “nlwiki” to build a link to https://nl.wikipedia.org/wiki/Westpark_(Groningen). (You’d have to pick the right Wikipedia instance to match the preferred locale, if there are articles in multiple Wikipedia instances.)
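
A small Ruby sketch of that step, with the wbgetentities response from above embedded as a string so it runs offline (in practice you would fetch it over HTTP first; the space-to-underscore conversion is the usual wiki title-to-URL convention):

```ruby
require "json"

# The wbgetentities response shown above, embedded for an offline example.
response = JSON.parse(<<~JSON)
  {
    "entities": {
      "Q100256456": {
        "type": "item",
        "id": "Q100256456",
        "sitelinks": {
          "commonswiki": { "site": "commonswiki", "title": "Category:Westpark (Groningen)", "badges": [] },
          "nlwiki": { "site": "nlwiki", "title": "Westpark (Groningen)", "badges": [] }
        }
      }
    },
    "success": 1
  }
JSON

# Pick the sitelink for the desired Wikipedia instance and build the URL;
# wiki page URLs replace spaces in the title with underscores.
sitelinks = response.dig("entities", "Q100256456", "sitelinks")
title = sitelinks.dig("nlwiki", "title")
url = "https://nl.wikipedia.org/wiki/#{title.tr(' ', '_')}"
puts url  # https://nl.wikipedia.org/wiki/Westpark_(Groningen)
```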

If you need to look up the Wikidata entity ID based on the iNaturalist place ID (160964), then the query would be something like this:

SELECT ?item ?itemLabel ?iNatPlace {
  ?item wdt:P7471 ?iNatPlace .
  FILTER (?iNatPlace in ("160964"))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}

See https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%3FiNatPlace%20{%0A%20%20%3Fitem%20wdt%3AP7471%20%3FiNatPlace%20.%0A%20%20FILTER%20(%3FiNatPlace%20in%20("160964"))%0A%20%20SERVICE%20wikibase%3Alabel%20{%20bd%3AserviceParam%20wikibase%3Alanguage%20"[AUTO_LANGUAGE]%2Cen"%20}%0A} for more options on how to connect this to Ruby or other languages. Just for ease of reference, this is what that page recommends as Ruby code to display the results of the above SPARQL query:

#gem install sparql
#http://www.rubydoc.info/github/ruby-rdf/sparql/frames

require 'sparql/client'

endpoint = "https://query.wikidata.org/sparql"
sparql = <<'SPARQL'.chop
SELECT ?item ?itemLabel ?iNatPlace {
  ?item wdt:P7471 ?iNatPlace .
  FILTER (?iNatPlace in ("160964"))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}

SPARQL

client = SPARQL::Client.new(endpoint,
                            :method => :get,
                            # TODO adjust user agent; see https://w.wiki/CX6
                            headers: {'User-Agent' => 'WDQS-example Ruby'})
rows = client.query(sparql)

puts "Number of rows: #{rows.size}"
for row in rows
  for key,val in row do
    # print "#{key.to_s.ljust(10)}: #{val}\t"
    print "#{key}: #{val}\t"
  end
  print "\n"
end
2 Likes

If all you need is to be able to determine the URL for the Wikidata page, this gets you what you need:

SELECT ?item ?taxonname ?inaturalistID WHERE {
  ?item wdt:P31 wd:Q16521 ;
        wdt:P3151 ?inaturalistID ;
        wdt:P225 ?taxonname .

  FILTER(xsd:integer(?inaturalistID) = 41641)

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

The iNat taxon number is the value in the FILTER statement.

The Q value under ?item identifies the entity, and the URL can then be built as https://www.wikidata.org/wiki/ThatQValue

I’m sure SPARQL gurus could find a more efficient way, but this gets it for you.

1 Like

Getting closer. Can you put that in a curl command so I can see the endpoint and method in addition to the request body (which I presume is the sparql query)?

1 Like

Nm, figured it out:

curl -G 'https://query.wikidata.org/sparql' \
  -H "Accept: application/sparql-results+json" \
  --data-urlencode query='
    SELECT ?item ?taxonname ?inaturalistID WHERE {
    ?item wdt:P31 wd:Q16521 ;
    wdt:P3151 ?inaturalistID ;
    wdt:P225 ?taxonname;
    FILTER(xsd:integer(?inaturalistID)=41641)
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
  '

That takes 12s, so there’s no way we’re going to make that request dynamically, for taxa or places, especially if we then have to make additional queries to map a Wikidata identifier like Q36341 to a Wikipedia page URL like https://en.wikipedia.org/wiki/Brown_bear. I guess we could have some process that grinds through the taxa on a regular basis and stores Wikipedia URLs. Can anyone figure a way to get the corresponding Wikipedia URLs for all locales in a single request given a bunch of iNat IDs (for taxa or places)?
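
One possible shape for a batched lookup, sketched in Ruby as a query string (I have not run this against the live endpoint; the schema:about / schema:isPartOf / wikibase:wikiGroup sitelink pattern is the usual WDQS idiom, but verify it at query.wikidata.org):

```ruby
# Sketch: feed many iNat taxon IDs into one query via VALUES, and pull
# every Wikipedia sitelink (?article, per language) for each matched item.
inat_ids = %w[41641 43577]

sparql = <<~SPARQL
  SELECT ?inatId ?item ?article ?lang WHERE {
    VALUES ?inatId { #{inat_ids.map { |id| %("#{id}") }.join(' ')} }
    ?item wdt:P3151 ?inatId .
    ?article schema:about ?item ;
             schema:inLanguage ?lang ;
             schema:isPartOf ?site .
    ?site wikibase:wikiGroup "wikipedia" .
  }
SPARQL

puts sparql
```

For places, swap wdt:P3151 for wdt:P7471. The resulting string can be sent exactly like the curl example above.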

1 Like

A couple of things: I’m very much a basic SPARQL author, so it is entirely possible there is a much more efficient way to run it faster; other SPARQL-fluent folks may be able to help. You can probably delete the wikibase:label service call, since you neither need nor care what language the item label is returned in.

In my code library at home I have a query that returns all taxa in Wikidata that have an entry for their iNat identifier property. I assume that could be run and the results stored periodically?

One thing you may have to account for: it is possible (in fact I am virtually certain) that multiple Wikidata entities can have the same iNat ID listed. When that happens, a warning flag is added to the entry, but I believe the data is still saved.

It should also be possible to get the Wikipedia URL directly in the same query; I just did not realize you wanted that. In theory you can get the URL for any and all languages that have a Wikipedia page.

1 Like

Over in another thread (https://forum.inaturalist.org/t/creating-inaturalist-places-and-linking-to-wikidata-using-geojson/12220/15), @salgo60 mentions a tool that seems to return a Wikipedia URL based on an iNat place ID fairly quickly, if the place is set up with an iNat identifier in Wikidata. He wrote:

Correct would be to ask Wikipedia for the English article corresponding to the iNaturalist place id 152114, and as the Wikidata property is Property:P7471, this hub tool can help

So then, modifying this slightly for a taxon, brown bear would be https://hub.toolforge.org/P3151:41641?lang=en

You could check out the tool to see exactly what it’s doing, or maybe it could just be used directly rather than trying to replicate its functionality.
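
If it helps, those hub URLs follow a simple property:value pattern, so building them is just string interpolation (a sketch; the lang parameter picks the preferred Wikipedia, per the examples above):

```ruby
# hub.toolforge.org resolves a Wikidata property:value pair to the linked
# Wikipedia article via redirect. The path shape is P<property>:<value>.
def hub_url(property, value, lang: "en")
  "https://hub.toolforge.org/#{property}:#{value}?lang=#{lang}"
end

puts hub_url("P3151", 41641)   # taxon example from above (brown bear)
puts hub_url("P7471", 152114)  # place example from the quoted post
```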

2 Likes

One thing likely slowing it down is that the query may match any level of the taxonomic tree. If you know, for example, that you only want species, you can filter for that, which dramatically cuts the query time. I have saved queries that I know only need species-level results, and they run in milliseconds. Not sure if there is a way to replicate that here.
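
A sketch of what that restriction might look like, added to the earlier taxon query. P105 (taxon rank) and Q7432 (species) are my assumptions for the identifiers and worth double-checking on Wikidata:

```ruby
# Sketch: restrict the earlier taxon query to species-rank items only.
# The P105/Q7432 pair (taxon rank = species) is assumed; verify on Wikidata.
sparql = <<~SPARQL
  SELECT ?item ?taxonname ?inaturalistID WHERE {
    ?item wdt:P31 wd:Q16521 ;
          wdt:P105 wd:Q7432 ;
          wdt:P3151 ?inaturalistID ;
          wdt:P225 ?taxonname .
    FILTER(xsd:integer(?inaturalistID) = 41641)
  }
SPARQL

puts sparql
```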

1 Like