Show Wikipedia template if no article exists for a given taxon ID and language in Wikidata

thomaseverest · May 25, 2021, 11:44pm

The About section on taxon pages is pulled from Wikipedia through Wikidata, as per this request. This allows the same taxon to display different articles for different languages. However, if no article exists for a given language, things can get wonky.

Another article with the same name is shown:
https://www.inaturalist.org/flags/532379
https://www.inaturalist.org/flags/531862
https://www.inaturalist.org/flags/525120
https://www.inaturalist.org/flags/523204
https://www.inaturalist.org/flags/487942

It displays EoL instead (often with issues):
https://www.inaturalist.org/flags/532216
https://www.inaturalist.org/flags/527466
https://www.inaturalist.org/flags/496482
https://www.inaturalist.org/flags/510252

There are currently several ways to address this:

Leave the description correct in some languages but not others.
Turn off the auto-description for all languages.
Create Wikipedia pages whenever there is an issue.

I am proposing that if no Wikipedia article exists for a given language, the usual Wikipedia template should be displayed only for that language. Currently, it is only displayed if there is no Wikipedia page for any language, or if the auto-description is turned off. I have no idea how hard this would be to implement, but Wikidata already shows which iNaturalist taxon identifiers are associated with which language-specific pages. This can also be edited by anyone if there are any issues.

This may be in opposition to this implemented request, but that could be the reason why the wrong pages are showing up.

Another option would be to allow curators to turn off the auto-description for select languages.

cmcheatle · May 25, 2021, 11:57pm

A few questions. In situations like the first case, if an unrelated article already exists, how will the template even work? You would need to give it a different name.

Not clear what you mean by this. Each taxon on Wikidata can have an iNat ID, and the item also lists the Wikipedia articles in different languages. But there should never be different iNat IDs associated with different language articles.

thomaseverest · May 26, 2021, 1:57am

Take Borkhausenia as an example. If the auto-description is turned on, it links to the moth genus because no Wikipedia page exists (for any language). Ideally, whatever is pulling the Wikipedia article would first search for the iNaturalist taxon ID in Wikidata (965806). Then it would check which languages have corresponding Wikipedia pages (if any):

Because there are none, it would display the template page instead of searching for a different article to display.
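
The lookup order described above can be sketched roughly as follows. This is only a minimal sketch, not how iNaturalist actually does it: it assumes the Wikidata items have already been fetched as JSON in the shape returned by the `wbgetentities` API, and the function names, the sample Q-numbers, and the second sample taxon are all made up for illustration. P3151 is the Wikidata property for the iNaturalist taxon ID.

```python
from typing import Optional

INAT_TAXON_ID = "P3151"  # Wikidata property "iNaturalist taxon ID"


def find_entity(entities: dict, inat_id: str) -> Optional[dict]:
    """Return the entity whose P3151 claim equals inat_id, or None."""
    for entity in entities.values():
        for claim in entity.get("claims", {}).get(INAT_TAXON_ID, []):
            value = claim.get("mainsnak", {}).get("datavalue", {}).get("value")
            if value == inat_id:
                return entity
    return None


def article_title(entities: dict, inat_id: str, lang: str) -> Optional[str]:
    """Title of the `lang` Wikipedia article mapped to inat_id.

    None means "show the Wikipedia template"; no name-based search is tried.
    """
    entity = find_entity(entities, inat_id)
    if entity is None:
        return None
    site = lang.replace("-", "_") + "wiki"  # e.g. "en" -> "enwiki"
    link = entity.get("sitelinks", {}).get(site)
    return link["title"] if link else None


# Hypothetical sample data: the keys and the second taxon are made up.
SAMPLE = {
    "Q_EXAMPLE_1": {
        "claims": {"P3151": [{"mainsnak": {"datavalue": {"value": "965806"}}}]},
        "sitelinks": {},  # no Wikipedia page in any language
    },
    "Q_EXAMPLE_2": {
        "claims": {"P3151": [{"mainsnak": {"datavalue": {"value": "12345"}}}]},
        "sitelinks": {"enwiki": {"site": "enwiki", "title": "Example taxon"}},
    },
}

print(article_title(SAMPLE, "965806", "en"))  # None: show the template
print(article_title(SAMPLE, "12345", "en"))   # Example taxon
print(article_title(SAMPLE, "12345", "fr"))   # None: template for French only
```

The point of returning `None` per language is that a missing French page would not affect the English one, and nothing would ever be matched by name alone.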

Note: as I am explaining this, @bouteloua seems to be fixing the above flags. So perhaps there is already a way around this and this request isn’t needed. But I would love to know how to fix them. :)

Edit: looks like it was just by creating the necessary Wikipedia pages.

cmcheatle · May 26, 2021, 12:56pm

I understand what they are trying to do, which is to account for the fact that there are tens, or more accurately hundreds, of thousands of taxa in iNat which are not mapped to their Wikidata equivalent.

The fallback if you can’t find the mapping is a text-based search. A small percentage of the time, that text search will find an article which shares the same string but is unrelated to the taxon. However, in terms of the net number of taxa impacted, that will be far smaller than the number of taxa that would otherwise not show an existing, unmapped page that a text search does find.

I don’t see an easy answer other than massive work to align iNat to Wikidata items, and to keep the multilingual list of articles on a Wikidata item updated (much of the grunt data management, especially intraWiki linkages, is done by bots, so this may already be taking place).

thomaseverest · May 26, 2021, 2:22pm

I was under the impression that mapping could be done automatically. I also think the string search is more frequently incorrect than we realize. Overlap with mythology or other Latin terms seems to be decently common and if any language has a duplicate there’s a problem. Perhaps there would be more pages without properly matched articles this way, but adding iNaturalist taxon IDs seems much easier than adding new Wikipedia articles.

cmcheatle · May 26, 2021, 2:26pm

There is an automated tool of some sort called something like Mix’n’Match which apparently does it, but I’m not sure how many people are fluent in its use. I’m certainly not. As far as I know, only one person has ever used it to do the mappings. I have no idea how much work is involved in doing it, or how receptive they would be to running it again. Even then, that only gets you back to today’s baseline, and unmapped taxa start building up again tomorrow.

cmcheatle · May 26, 2021, 2:41pm

You can pick your favourite taxon and run the following Wikidata query (note that it will open in a much more user-friendly UI when you click on it). For the selected taxon, it shows all the items under it in the hierarchy (i.e. if you pick a genus, it will show all species, subspecies, etc.) that are entered in Wikidata, and whether the iNat ID is entered. If not, you can start adding them. Note that it has a POWO column because I originally did this for plants, but other identifiers can also be added.
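
A query of this kind could be built along the following lines. Treat it as a sketch rather than the exact query linked in the post: the function names are made up, and it assumes the properties P171 (parent taxon), P225 (taxon name), P3151 (iNaturalist taxon ID) and P5037 (Plants of the World Online ID). The root item ID is something you supply yourself.

```python
import re
from urllib.parse import quote


def build_query(root_qid: str) -> str:
    """SPARQL listing every taxon under root_qid with its iNat and POWO IDs."""
    if not re.fullmatch(r"Q[1-9]\d*", root_qid):
        raise ValueError(f"not a Wikidata item ID: {root_qid!r}")
    # P171* follows "parent taxon" zero or more times, so the root is included.
    return f"""SELECT ?item ?taxonName ?inatId ?powoId WHERE {{
  ?item wdt:P171* wd:{root_qid} ;
        wdt:P225 ?taxonName .
  OPTIONAL {{ ?item wdt:P3151 ?inatId . }}
  OPTIONAL {{ ?item wdt:P5037 ?powoId . }}
}}
ORDER BY ?taxonName"""


def query_url(root_qid: str) -> str:
    """Link that opens the query in the Wikidata Query Service UI."""
    return "https://query.wikidata.org/#" + quote(build_query(root_qid))


# "Q42" is only a placeholder here; substitute the item ID of your taxon.
print(query_url("Q42"))
```

Rows with an empty `?inatId` are the ones still missing a mapping. For very large groups the query may time out, so starting from a genus or family is probably safer than starting from a kingdom.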

andrawaag · May 26, 2021, 9:10pm

An alternative could be to add them with OpenRefine. This has been done (and in fact is still ongoing) with iNaturalist place IDs.

Or with a Python bot.

I am happy to give it a try. Is there a table that lists the names with their respective iNaturalist taxon IDs? (It would be awesome if there were mappings to, for example, GBIF IDs, which would really help with disambiguation.)

cmcheatle · May 26, 2021, 11:47pm

It’s just as easy to do with QuickStatements too. I only mentioned Mix’n’Match since that is what I know was used the last time this was done.

The GBIF mapping in iNat is stored in a really obscure place called schemas, and I don’t think any API endpoint hits it. There is some kind of taxonomy export which may have it. I will find it in the morning.

cmcheatle · May 27, 2021, 2:10pm

So the export I mentioned, which is here:
https://www.inaturalist.org/taxa/inaturalist-taxonomy.dwca.zip

does not have the GBIF number in it. But there is enough stuff there to do string matching with an extract of Wikidata.
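
As a rough illustration of that string matching, here is a minimal sketch. It assumes both sides have already been loaded as lists of dicts; the field names (`id`, `scientificName`, `qid`, `taxonName`, `inatId`) are assumptions for this example, not checked against the actual export, and the names and IDs in the sample data are made up.

```python
from collections import defaultdict


def propose_mappings(inat_rows, wikidata_rows):
    """Pair unmapped Wikidata items with iNat taxa by exact scientific name.

    Names that occur more than once on either side are skipped, since a
    shared name (homonym) cannot be resolved by string matching alone.
    """
    inat_by_name = defaultdict(list)
    for row in inat_rows:
        inat_by_name[row["scientificName"].strip()].append(row["id"])

    wd_by_name = defaultdict(list)
    for row in wikidata_rows:
        wd_by_name[row["taxonName"].strip()].append(row)

    proposals = []
    for name, items in wd_by_name.items():
        ids = inat_by_name.get(name, [])
        if len(items) != 1 or len(ids) != 1:
            continue  # no match, or ambiguous on one side
        if items[0].get("inatId"):
            continue  # already has an iNat ID in Wikidata
        proposals.append((items[0]["qid"], ids[0]))
    return proposals


# Made-up sample rows using placeholder names.
INAT_ROWS = [
    {"id": "101", "scientificName": "Aus bus"},
    {"id": "102", "scientificName": "Aus cus"},
    {"id": "103", "scientificName": "Xus yus"},
    {"id": "104", "scientificName": "Xus yus"},  # homonym in another group
]
WIKIDATA_ROWS = [
    {"qid": "Q1000001", "taxonName": "Aus bus", "inatId": None},
    {"qid": "Q1000002", "taxonName": "Aus cus", "inatId": "102"},
    {"qid": "Q1000003", "taxonName": "Xus yus", "inatId": None},
]

print(propose_mappings(INAT_ROWS, WIKIDATA_ROWS))  # [('Q1000001', '101')]
```

Skipping every ambiguous name leaves some taxa unmapped, but it avoids attaching an iNat ID to the wrong item, which is the same-name problem raised earlier in the thread.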

It’s easy to build and configure a QuickStatements file to upload the missing ones. I have a test file of 6000+ orchids not mapped in WD that exist on iNat, and it took about 2 minutes to build. I just can’t get my home internet service to stay up for more than 3 minutes to actually try running it.

Without looking it up, I think QuickStatements is throttled to 20000 records per load and 1 operation per second. I don’t know if any of the other options can load faster.
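
For anyone who wants to try the same thing, building such a file could look roughly like this. It is a sketch under a few assumptions: it uses the tab-separated QuickStatements V1 command format (item, property, quoted string value), the function names are made up, the Q-numbers are placeholders that should not be uploaded, and the default batch size is just the unverified 20000 figure mentioned above.

```python
def quickstatements_lines(pairs):
    """One V1 command per (item ID, iNat taxon ID) pair, tab-separated."""
    return [f'{qid}\tP3151\t"{inat_id}"' for qid, inat_id in pairs]


def batches(lines, size=20000):
    """Yield the commands as newline-joined chunks of at most `size` lines."""
    for start in range(0, len(lines), size):
        yield "\n".join(lines[start:start + size])


# Placeholder item IDs and iNat IDs, for illustration only.
pairs = [("Q1000001", "101"), ("Q1000002", "102"), ("Q1000003", "103")]

for number, text in enumerate(batches(quickstatements_lines(pairs), size=2), 1):
    print(f"--- batch {number} ---")
    print(text)
```

Each chunk can then be pasted into QuickStatements as a separate batch. Smaller batches take more clicks but are easier to stop and check part way through.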

cmcheatle · May 27, 2021, 5:12pm

So it is easy to do in QuickStatements, it takes about 2-3 minutes to prepare a file and then run.

I did 6800 orchids earlier this morning, currently have 10000 spiders running thru.

The only rate limiting factor is how fast QuickStatements allows records to process.

I will gradually keep updating groups as I can. I’ve added a journal post to my profile where users can see the progress.

If people want to request that specific taxonomic areas be done, just leave a comment there. I will likely run most of these overnight, Toronto time, as running it seems to block your Wikidata account from doing any other editing (note I may try with smaller batches to see if the lock remains).

https://www.inaturalist.org/journal/cmcheatle/52687-wikidata-inat-id-integration-updates

Update - almost 90000 taxa updated in Wikidata with the corresponding iNat ID to support lookups. Continuing daily. Now I guess others have to step up and start creating those missing Wikipedia articles.
