Can I bulk-import translations of common names of taxa?

I noticed recently that a lot of taxa are missing names in Hungarian and Serbian.

Even well-known, high-level taxa, like the Legumes family, do not have these translations, while the corresponding Wikidata entry Q44448 has these (and more) languages in the translations section at the top. Additionally, the Wikispecies entry Fabaceae has a vernacular names section, which contains some additional translations and could likely be used as a source of truth here.

So my question is: would it make sense to bulk-import translations from Wikidata? If so, that would seem to greatly increase the translation coverage for some languages.

If not, then I’d still be interested in seeing the translations for the species I care most about, but I’m fine adding those “manually” (and by manually, I mean I’ll scrape Wikidata into a spreadsheet, review the results for obvious mistakes, and then upload them via the website, with a comment stating that the results are coming from Wikidata).
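
The "scrape Wikidata into a spreadsheet" step could look roughly like the sketch below. It assumes the names are taken from Wikidata item labels via the `wbgetentities` action of the Wikidata API; the helper names (`labels_url`, `fetch_labels`, `labels_to_rows`, `rows_to_csv`) and the User-Agent string are made up for illustration, and the output is only meant as input for manual review, not for direct upload.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def labels_url(qids, languages):
    """Build a wbgetentities URL asking only for labels in the given languages."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(qids),
        "props": "labels",
        "languages": "|".join(languages),
        "format": "json",
    }
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def fetch_labels(qids, languages):
    """Download the JSON response (needs network access)."""
    request = urllib.request.Request(
        labels_url(qids, languages),
        # Hypothetical User-Agent; replace with something identifying you.
        headers={"User-Agent": "common-name-review-sketch/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def labels_to_rows(response, languages):
    """Flatten a wbgetentities response into (qid, language, label) rows."""
    rows = []
    for qid, entity in sorted(response.get("entities", {}).items()):
        labels = entity.get("labels", {})
        for lang in languages:
            if lang in labels:
                rows.append((qid, lang, labels[lang]["value"]))
    return rows


def rows_to_csv(rows):
    """Render rows as CSV text that can be opened in a spreadsheet for review."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["wikidata_id", "language", "label"])
    writer.writerows(rows)
    return buffer.getvalue()
```

For the example above, `rows_to_csv(labels_to_rows(fetch_labels(["Q44448"], ["hu", "sr"]), ["hu", "sr"]))` would give a small table to check by hand before anything is entered on iNat.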

Also note that Wikidata too has some missing translations, but I’m also working on closing some of the gaps there.


Can you change the name of the topic to make it clear you are talking about common names of taxa?
I expected translation of the website…

https://forum.inaturalist.org/t/bulk-adding-of-common-names/23001/16 ( with template )

There are probably more topics about it, and also a feature request:
https://forum.inaturalist.org/t/create-a-system-for-importing-common-names/3513


I am wondering how to keep the common names updated over time. If a common name changes, e.g. because of a taxon split https://www.inaturalist.org/taxon_changes?taxon_id=144242 , how do you keep the common names updated? Also a tool to compare iNaturalist with Wikidata could be helpful. The iNaturalist taxonomy DwC-A export
(https://forum.inaturalist.org/t/using-sql-to-query-inats-dwca-taxonomy-export/29377/17) might be helpful as a source of taxon names.


Also a tool to compare iNaturalist with Wikidata could be helpful.

Oh, that’s a good idea! I can look into something like that.


On that note, how often is inaturalist-taxonomy.dwca.zip updated?

EDIT: I just found on https://www.inaturalist.org/observations/export that it is updated weekly.


I am wondering how to keep the common names updated over time.

I think for starters, we just wouldn’t. If a taxon split happens, the sources of the taxon split would probably contain the English common name, and the scientific name, and the newly introduced taxa would just not have any common names.

Then whoever volunteers to maintain translations for a specific language could optionally sync the translated common names again. This would likely also help with time delays between updates to iNat taxonomy and e.g. Wikidata, or whatever source of information we’d be pulling the data from.

For now, I think a useful start would be a tool that would show the diff between iNat common names in a specific language, and external sources like Wikidata. Sounds like a weekend project! :)
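
The core of such a diff tool could be as small as the sketch below. It assumes both sides have already been reduced to a mapping from taxon (for example the scientific name) to a single common name in one language; the function name `diff_common_names` and that one-name-per-taxon simplification are assumptions made for illustration, since a taxon can carry several common names in the same language.

```python
def diff_common_names(inat_names, external_names):
    """Compare two {taxon: common name} mappings for a single language.

    Returns a dict with three keys:
      missing   - taxa named in the external source but not on iNat
      different - taxa where both sides have a name but the names disagree
      extra     - taxa named on iNat but not in the external source
    The comparison ignores case and surrounding whitespace.
    """
    def norm(name):
        return name.strip().casefold()

    missing, different, extra = {}, {}, {}
    for taxon, external in external_names.items():
        local = inat_names.get(taxon)
        if local is None:
            missing[taxon] = external
        elif norm(local) != norm(external):
            different[taxon] = (local, external)
    for taxon, local in inat_names.items():
        if taxon not in external_names:
            extra[taxon] = local
    return {"missing": missing, "different": different, "extra": extra}
```

The result is only a list of candidates to look at; deciding whether a "missing" or "different" name should actually be added still needs a human and a source.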


@carrieseltzer asked me to comment, so at the risk of getting burned by the flame and squashing updates about this thread ad infinitum, my opinion is that bulk importing common names from Wikidata is a bad idea because of the sourcing issues Cassi described at https://forum.inaturalist.org/t/reliable-sources-for-common-names-on-inaturalist/5579. In general, I support prescriptive sources of common names when available, e.g. a published book with an ISBN or the names published by the American Ornithological Society, for the same reasons we support external authorities for taxonomy: it moves discussion of a contentious issue to institutions more suited to decision-making than iNat. If such authorities do not exist, I support a descriptive approach, i.e. adding names that are actually used by people. Obviously you can’t batch-import names like that, but you can use that heuristic to decide what names you add to iNat by considering, “Do I actually use this name? Have I heard other people use this name?”

Wikidata is not a prescriptive or descriptive source of common names, because there is no editorial oversight that would provide prescriptive authority, nor is there any indication of how frequently a name is used that might provide descriptive authority. If you’re going to use an external source, I would try to restrict yourself to Wikipedia’s own definition of a reliable source… which does not include Wikipedia itself, or Wikidata.

So, to the original question, “Can I bulk-import translations of common names of taxa?”, you can but I’d prefer if you do not. We at iNat will occasionally do this on request, but we require a source, which we generally inspect to see if it’s some kind of authority like a government agency or published work, and frankly I don’t really like doing that either. I would not accept names exported from Wikidata. I’d much prefer that people add names one by one, and think about whether those names are actually used by other people each time.


So if Wikidata used a reliable source for the common names ( https://www.wikidata.org/wiki/Property:P1896 ?), Wikidata could be used. It offers the advantage of one worldwide interface for all languages.

Thank you, I think this puts me in the right direction. So here is what I have so far.

In the interest of speeding things up, I created a small userscript (to be used with Tampermonkey / Greasemonkey) that adds this “WikiSpecies: Sync Translations” link that you’ll see on the right:

(I’ll publish the script once I have fixed some remaining bugs.) When I click the link, it fetches the translations from Wikispecies using the Wikimedia API. These read-only requests don’t even need an API key. Then it compares them with what it sees in the table and suggests fixes in the “Actions” column, and it adds new columns as well, with an action to add them (which just takes me to the pre-filled form).
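
The fetch-and-parse step can be sketched outside the browser as well. The Python stand-in below is not the userscript itself; it assumes the page wikitext is requested through the MediaWiki `action=parse` endpoint on species.wikimedia.org, and that the vernacular names sit in a `{{VN|xx=name|...}}` template on the page. The helper names and the User-Agent string are hypothetical, and the parser is deliberately naive: nested templates or piped links inside a name would confuse it.

```python
import json
import re
import urllib.parse
import urllib.request

WIKISPECIES_API = "https://species.wikimedia.org/w/api.php"


def wikitext_url(page):
    """Build a read-only action=parse URL that returns the page wikitext."""
    params = {
        "action": "parse",
        "page": page,
        "prop": "wikitext",
        "format": "json",
        "formatversion": "2",  # wikitext comes back as a plain string
    }
    return WIKISPECIES_API + "?" + urllib.parse.urlencode(params)


def fetch_wikitext(page):
    """Download the wikitext of a Wikispecies page (needs network access)."""
    request = urllib.request.Request(
        wikitext_url(page),
        # Hypothetical User-Agent; replace with something identifying you.
        headers={"User-Agent": "vernacular-name-sketch/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["parse"]["wikitext"]


def parse_vn_template(wikitext):
    """Extract {language code: name} from the first {{VN|...}} template.

    Assumes simple 'code=name' parameters separated by '|'.
    Parameters with an empty value are skipped.
    """
    match = re.search(r"\{\{\s*VN\s*\|(.*?)\}\}", wikitext,
                      flags=re.DOTALL | re.IGNORECASE)
    if not match:
        return {}
    names = {}
    for part in match.group(1).split("|"):
        lang, sep, value = part.partition("=")
        lang, value = lang.strip(), value.strip()
        if sep and lang and value:
            names[lang] = value
    return names
```

Calling `parse_vn_template(fetch_wikitext("Fabaceae"))` would then give the language-code-to-name mapping that gets compared against the names table on the taxon page.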

This is a lot faster than entering values through the form manually, less prone to copy-paste errors, and I add the source link to the description field. However, nothing is done until I click, and I have a chance to manually review things before entry. For many taxa, these are well established names that I know are widely used in textbooks and other literature.

For others, especially species, I wouldn’t know the name myself, so I’d look it up in Wikipedia. If there is an article already with photos and references, I’d just trust it without spending too much time looking at sources. Sometimes, however, there is no Wikipedia article, but I can still find the name in my own books or in online sources such as scanned books.

In cases where there is no Wikidata entry, I’d prefer to add the translation to Wikidata first, with a reference, and then do the import using my tool. This has two advantages over importing to iNat first:

  1. Other sources might use Wikidata, and I think it has a higher chance of experts stumbling on it and fixing any mistakes I made, or updating the translation as necessary.
  2. I’d like to avoid adding the name to iNat, and then have it appear in Wikidata later, with a reference back to iNat as a source of truth. Like you said, it is better if iNat delegates the naming problem to somewhere else, and I believe Wikidata can be a good place for cases where there’s no “official” source.

I have some old books I plan to go through and check if the names can be imported, but for now I’m just fixing higher level taxa, like Fabaceae in my first example.


A side note: I’d prefer to use iNat, both the website and the app, in English, and have only the common names show up as preferred in some location. Sadly, this doesn’t seem to work for all species, and I haven’t figured out what it depends on. For some species, the Hungarian common name is shown even when the website language is set to English; for other taxa I have to change the website or app locale to Hungarian to see the Hungarian common names. At other times, the name is shown differently in a list view vs. when I click the observation. Those are, I assume, different bugs that need to be dealt with separately, but for now a workaround is to change the site language.


Another side note: iNat “lexicons” don’t seem to use ISO language codes, while the “vernacular names” from Wikidata only use ISO codes, so mapping the two gets tricky. Even worse is that I can’t even map the languages by their English name easily, since e.g. iNat uses “Chinese (traditional)” (and lexicons.chinese_simplified), while Wikidata uses “Traditional Chinese” (and zh-hant), and similar issues.

Not wanting to maintain a lexicon-to-ISO-code mapping myself, for now I’m just abusing the language selector at the bottom of the site, which does use ISO language codes and conveniently maps them to the language as it is displayed in the common names table. It doesn’t cover all lexicons, but it covers the ones I care about, so I call it “good enough” and move on. But it would be nice if (a) the iNat API provided a list of available lexicons, so I wouldn’t need to scrape the dropdown in the form, and (b) the iNat lexicons were mapped to ISO codes for the language that the lexicon is based on. This would be especially useful for languages like Spanish, Portuguese, French, English, etc., which are used in many countries.
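
For reference, a hand-maintained mapping of the kind described above might look like the sketch below. The table entries are illustrative only and are not an official iNaturalist list; `lexicon_key`, `lexicon_to_code` and `same_language_name` are hypothetical helpers. The last one shows one way to treat “Chinese (traditional)” and “Traditional Chinese” as the same language name by comparing the words regardless of order and punctuation.

```python
import re

# Illustrative entries only; NOT an official iNaturalist lexicon list.
LEXICON_TO_CODE = {
    "hungarian": "hu",
    "serbian": "sr",
    "chinese_traditional": "zh-hant",
}


def lexicon_key(display_name):
    """Normalise a display name, e.g. 'Chinese (traditional)' -> 'chinese_traditional'."""
    return re.sub(r"[^a-z0-9]+", "_", display_name.casefold()).strip("_")


def lexicon_to_code(display_name):
    """Return the language code for a lexicon display name, or None if unknown."""
    return LEXICON_TO_CODE.get(lexicon_key(display_name))


def same_language_name(a, b):
    """True if two English language names consist of the same words in any order."""
    def words(name):
        return sorted(re.findall(r"[a-z0-9]+", name.casefold()))
    return words(a) == words(b)
```

A table like this still has to be extended by hand for every lexicon, which is exactly the maintenance burden that an official lexicon list with language codes would remove.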


Lastly, sorry for the long answer, and thanks for the input. I got some useful information from this post, most importantly that I shouldn’t pursue any bulk import, but I would still like to make it easier to add more common names.


I have added probably thousands of names. It’s not really hard to do one by one if you already have a spreadsheet, and even without one it’s pretty easy. iNat lacks many names in many languages. But you mentioned translation coverage, which is separate from names and refers only to the website itself (and the app) on iNat; if you want to help with that, you can do so too.

Maybe 9013 common names, position 19 on the leader board…


The direction I was trying to put you in was

  1. Don’t add names from Wikidata
  2. Don’t automate the process of adding names from Wikidata
  3. Just add names the slow, manual way, and think about whether each name is actually in common use

You seem to have gone in the opposite direction.


I’m sorry if that is how it appears.

  1. Don’t add names from Wikidata

That doesn’t seem to be good blanket advice, for two reasons:

Firstly, many, many common names are missing. Even for taxa like “Primates”, there was no common name in my case. Copying those from Wikidata is just a shortcut: I could enter the name manually, but I could also use a script to do it for me. I still review the form before submitting, correct any mistakes, and fix the case (which is quite well defined for Hungarian: lower-case all unless it contains a proper name).

I do all that before I submit the name. At this point, the use of Wikidata as a source is just convenience: I know what “Primates” are called without having to look it up in Wikidata. But looking up is easier.

The second reason is, in cases where Wikidata doesn’t have the common name either, I would go ahead and enter it both in Wikidata and iNaturalist (citing at least one reference in both cases). Often this is just missing data, and I know the name. Sometimes I don’t know the name and I have to look it up in books, but often it is listed in published papers, on a university website for example. Even if I don’t know the name myself, I trust the source.

  2. Don’t automate the process of adding names from Wikidata

Like I said, I still review the form, and submit each change one by one. I’ll keep the script private and I won’t bulk-import taxa — I mostly care about my own observations anyway. I’m not trying to take over the world here :)

I still think I’m adding names in the slow, manual way, and I’m doing the research where I don’t know the name already.

I hope I have explained my approach and motivation here. The last thing I want is to introduce some sort of data corruption.

I wish you good luck. I am too lazy for it, and also afraid that someone has already invented the wheel. If you like Jupyter notebooks, you can download something here.
https://github.com/andrawaag/arise_hack2022