Species naming conventions and capitalization in non-English languages

The current automatic capitalization of common names works reasonably well for English, and I understand that the previous discussion about correctly formatting English names for certain taxonomic ranks and groups was rather difficult. Still, I’d like to open a discussion about some other languages whose rules are straightforward, should be easy to implement, and would improve the experience for some non-English-speaking users.

In Czech, the best approach would be to show the name with only the first letter in upper case, without formatting the rest of the text, or to use no formatting at all. Czech species names of animals, plants and fungi are mostly binominal, with the genus name first (often shared across several closely related genera) and an adjectival species name second. For example, the name for the violet ground beetle (Carabus violaceus) is stored in iNaturalist as “střevlík fialový” and could be displayed either as “Střevlík fialový” when used in a title, search results and so on, or just as “střevlík fialový” (but not “Střevlík Fialový”!). The second (species) name may contain an uppercase letter when the species is named after a person, for example “Střevlík Linnéův” for Carabus linnaei, so it is not possible to automatically lowercase the rest of the name.

Capitalizing the first letter based on language should be easy to implement (even when dealing with some special characters). The same approach can be applied to Slovak (e.g. “Bystruška fialová”), Polish (e.g. “Biegacz fioletowy”) and probably some other languages, especially Slavic ones. The Czech names in iNaturalist, at least, are mostly imported from a relatively reliable source and are already in a uniform format.
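
To make the proposal concrete, here is a minimal Python sketch of the rule described above. It is only an illustration, not how iNaturalist actually formats names: the function names and the set of locale codes are made up for the example, and the per-word fallback is just a rough stand-in for the current display behaviour.

```python
# Hypothetical set of locales whose names should get first-letter-only
# capitalization; the codes are assumptions for this example.
FIRST_LETTER_ONLY_LOCALES = {"cs", "sk", "pl"}


def capitalize_first(name: str) -> str:
    """Capitalize only the first character and leave the rest untouched."""
    # str.title() on a single character applies Unicode title-casing.
    return name[:1].title() + name[1:]


def capitalize_each_word(name: str) -> str:
    """Rough stand-in for the current display: capitalize every word."""
    return " ".join(capitalize_first(word) for word in name.split(" "))


def display_name(name: str, locale: str) -> str:
    """Pick the formatting rule based on the (hypothetical) locale code."""
    if locale in FIRST_LETTER_ONLY_LOCALES:
        return capitalize_first(name)
    return capitalize_each_word(name)


print(display_name("střevlík fialový", "cs"))  # Střevlík fialový
print(display_name("střevlík Linnéův", "cs"))  # Střevlík Linnéův
print(display_name("střevlík fialový", "en"))  # Střevlík Fialový
```

Because the rest of the string is never touched, a name such as “střevlík Linnéův” keeps its capital “L” without any special handling.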

Best regards
Ondrej

6 Likes

While the question can certainly be debated again, I will note that it does not ‘work well’ even in English; just ask the plant people for their views.

It was a compromise due to the enormous number of combinations of languages, taxonomic families (birds get capitalized, plants do not, etc.), specific rules (dealing with hyphens, proper names such as places, etc.), and the lack of any effective way to enforce standards for data entry. And like any compromise, not everyone got 100 percent of what they wanted.

3 Likes

Well, Russian names also have both (or more) words capitalized, even if you add the name the way you show for Czech names. I never thought about why that is; I just decided it’s how the website shows names. Since searching for a name works fine whether the letters are capitalized or not, I’m okay with it.
I don’t think it’s completely right to leave names in the format in which they were added. I have seen names in all caps and names with extra symbols because they were copied from Wikipedia without changes. So the system should probably stay as it is now, or even become a little stricter about which symbols can be typed into a name, at least until most of the names have been added for that language (or all languages), which won’t happen soon.

3 Likes

I think @ondrejzicha brings up a different point here, which I don’t see covered in previous discussions: capitalization is much more grammatically serious in some languages than in others. In English, capitalizing common names is essentially a stylistic choice, as previous discussions on iNat have shown. This is not the case in German, for example. It sounds like it’s not the case in Czech, either. I would hate for iNat to not catch on in other countries because it has glaring grammatical errors. Do I have that right, @ondrejzicha?

7 Likes

Like @cmcheatle says, the current capitalization scheme was implemented as a compromise after a long and rancorous debate. And I believe nearly all of the participants were English speakers.

This discussion shouldn’t be about the overall capitalization formatting, but just about languages with firm rules about capitalization. I think @ddennism phrased it really well.

1 Like

@ddennism your point is very well taken. I think there is one exception for English, which is that we would never not capitalize an English proper name, even if we were lower-casing everything else. I would put that on par with the rules for German, for example.

The big questions will be how many languages iNat wants to support capitalization rules for, and how much coding will be needed to implement the rules for each language (including brute-force coding for the inevitable one-off exceptions). For some languages, the rules may be fairly straightforward and consistent. For other languages, it may come down to, “if you want them capitalized correctly, enter them correctly, and iNat will display them as-is.”

Maybe one approach would be to have every name carry a “correctly capitalized” Boolean attribute that only curators could set. Setting it would “lock” editing of that name for non-curators, and give that name priority over other capitalization variants of the same spelling in that lexicon. Such names would be displayed as-is, without real-time application of any capitalization code. Maybe language-specific capitalization code could even be applied retroactively to all existing names to jump-start things.
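
As a rough illustration of that idea, here is a small Python sketch. Everything in it is hypothetical: the class, the field name `correctly_capitalized` and the helper functions are invented for the example and do not reflect iNaturalist’s actual data model, and `auto_format` is only a stand-in for whatever real-time capitalization the site applies.

```python
from dataclasses import dataclass


@dataclass
class CommonName:
    text: str
    lexicon: str
    # Hypothetical curator-only flag: True means "display exactly as stored".
    correctly_capitalized: bool = False


def auto_format(text: str) -> str:
    """Stand-in for the site's automatic formatting: capitalize every word."""
    return " ".join(w[:1].upper() + w[1:] for w in text.split(" "))


def display(name: CommonName) -> str:
    """Flagged names are shown as-is; everything else is auto-formatted."""
    return name.text if name.correctly_capitalized else auto_format(name.text)


def can_edit(name: CommonName, is_curator: bool) -> bool:
    """Setting the flag locks the name for non-curators."""
    return is_curator or not name.correctly_capitalized


def preferred_variant(variants: list[CommonName]) -> CommonName:
    """Among capitalization variants of one spelling, prefer a flagged one."""
    for name in variants:
        if name.correctly_capitalized:
            return name
    return variants[0]


locked = CommonName("Střevlík fialový", "Czech", correctly_capitalized=True)
unchecked = CommonName("střevlík fialový", "Czech")
print(display(locked))     # Střevlík fialový (shown as stored)
print(display(unchecked))  # Střevlík Fialový (auto-formatted)
```

The point of the flag is that the display code never has to know any language-specific rule: it either trusts the stored text or falls back to the generic formatting.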

Just thinking out loud here, knowing how complex it may be to implement correct capitalization site-wide. But if it is going to make the difference between whole countries participating or not, I think the issue is worth serious attention.

1 Like

That would be the most reasonable option to me - for languages other than English, just have a guideline for how they should be entered in the first place, and then follow that. The only question would be whether to store them with the first word capitalized (when that’s needed for titles), or whether to store them as they would be used in a sentence (in case we want to do that at some point, like “Observation of wolf by userX”) and just uppercase the first word for display.

1 Like

There is and was a guideline for their entry and it was not followed. Users don’t want to read and plough through lists of documentation about entry ‘rules’ for dozens of languages and families, they just want to put the name in.

Many users are cutting and pasting, since they may be adding names in languages they don’t have a keyboard for, which makes it hard to change accents, specialized letters, etc.

It has been suggested elsewhere that curators police it all, editing and validating every name. The curating group as currently constituted does not have the capacity to manage common names for hundreds of thousands of species in dozens of languages.

For reasons unrelated to this, the site has removed the ability of non-curators to edit common names. Since someone unfamiliar with the history will likely ask why: it was due to endless edit wars about which names should be used (Common Gull vs Mew Gull, or even the spelling Sugar Maple vs Sugar maple vs sugar maple, etc.), which grew very frustrating.

There was also an unfortunate issue of hyper (I will use the word) retentive users removing common names they felt were invalid. I won’t dredge up one of the actual arguments, so this is a made-up example. A user would declare: ‘By convention of biological naming and primacy, the name Robin was first applied to a European flycatcher, thus all subsequent uses of Robin must only be applied to flycatchers, otherwise they are invalid. The North American species erroneously called American Robin is a thrush, not a flycatcher, thus the name is invalid and inaccurate, thus I have removed it and replaced it with a scientifically accurate replacement, Common North American Orange-breasted Thrush.’

And then someone would get rightfully outraged and undo that, and a day later the person would redo it, wash, rinse, repeat.

5 Likes

That’s a cool name for American Robin. :D
Those gull wars touched my inner core. Mew gull should be brachyrhynchus only.

2 Likes

Hello, sorry for the late reply; I had busy holidays.

Implementing special rules for different languages is complicated, and I would advise avoiding that discussion. What I proposed was to leave names as they are stored in the database (or to apply only simple rules, like upper-casing the first letter of the whole name to make it uniform with other titles on the page), and only for those languages where the quality of names is uniform enough. I can speak only for the quality of the Czech names, which were recently imported by Ken-ichi Ueda from the taxonomic database that I maintain. Even though there are still some names added by other users, the majority of Czech names are now stored in the correct format.

I quite like @jdmore’s idea of an attribute that says whether to format the name or not. Something like that could be updated during the import of names from reliable sources, or set manually by administrators.

Is it possible to have administrators on iNaturalist with access only to names in a certain language, who would be able to edit or delete, for example, only Czech names? If so, the Czech National Museum could probably provide one such person to ensure the quality of Czech names in an application that the National Museum recommends for the public to use, not only during the City Nature Challenge but also during its other events and field trips.

3 Likes

Dear all,

I would like to join this topic as an employee of the National Museum in Prague. The City Nature Challenge is held under the auspices of this institution in the Czech Republic.

As was already mentioned, some countries have specific grammatical rules, and it would be really nice and helpful if they could somehow be implemented in the iNaturalist website and mobile app.

As far as I can see, the mobile app is not even consistent.
For example, for Oryctes nasicornis the app shows “nosorožík kapucínek” (Czech). This is fine, but the web version shows “Nosorožík Kapucínek”. I have no problem with the capital “N” (as it starts the first word), but the “K” in the second word should be “k”.
But other names in the app have both words capitalized, which is wrong.
It seems that the web page is consistent, but unfortunately, for our language, it consistently capitalizes both words.

I have no idea how difficult it is to implement rules of this kind, to program them. But as far as I can see, it has already been implemented correctly in the app (at least in some cases), so why not on the web page?

I completely agree with ondrejzicha’s last note.

About the problem of badly spelled names etc., and this paragraph from ondrejzicha:
“Is it possible to have administrators on iNaturalist with access only to names in a certain language, who would be able to edit or delete, for example, only Czech names? If so, the Czech National Museum could probably provide one such person to ensure the quality of Czech names in an application that the National Museum recommends for the public to use, not only during the City Nature Challenge but also during its other events and field trips.”
This could truly help to clean up the “Czech language database”, and we can try it at the National Museum in Prague, but of course we will need a lot of time for that. And I suppose that ordinary users of the app will not be able to change a Czech name or add a new one.

5 Likes

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.

Thanks for reopening this topic!

I can add that the situation with Hungarian names and, as far as I can tell, with Serbian/Croatian/Serbo-Croatian names is the same as what @dominik5 outlined in the previous post.

At least for the website, this should be fairly simple*: a CSS rule like this will capitalise the first letter, and only the first letter, of common names:

.display-name.comname::first-letter {
    text-transform: capitalize;
}

It would take a little extra work to match all occurrences of common names that should be capitalised, plus some convenience wrappers to make this easy to switch on and off per site locale. I’d be really happy to see this in action.

For the various mobile apps this would require a bit more work probably, but it still doesn’t sound too difficult.

However, for this kind of capitalisation to work, the common names should be entered in the lexicon in a way different than what is suggested at the moment. Namely, other than the first character, all characters should be lower-case, except when grammar rules would dictate otherwise, e.g. “Montezuma-zacskósmadár” or “Szent-Péter füve”, but otherwise “orvosi falfű”.

This requires going through existing names in the lexicon to change the capitalisation as necessary, as well as updating the rules displayed next to the “new common name” form, and possibly educating the more active folks who are entering common names.


*: Note that we want to apply the capitalize CSS rule to the first letter, and not the upper-case rule. This is because some letters might be digraphs, which have a distinct capitalised form that is different from the upper-case form.

Serbian, for example, when using the Latin script, has the digraph “dž”, which is usually entered as two separate characters, “D” and “ž”, but some input methods might enter it as a single Unicode code point, U+01C4. This character has three forms:

  • upper-case “DŽ” or U+01C4,
  • title-case “Dž” or U+01C5, and
  • lower-case “dž” or U+01C6.

Therefore, to make this work in cases where CSS rules can’t be used (i.e. the mobile apps), a bit more work would be needed to cover these edge cases. Just upper-casing the first letter might not always yield the correct result.
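
To illustrate the edge case outside of CSS, here is a short Python sketch (Python is used purely as an example language; the helper name `title_case_first` is made up, and the mobile apps would need the equivalent in their own languages). It relies on Python’s built-in `str.upper()` and `str.title()`, which apply the Unicode upper-case and title-case mappings respectively.

```python
def title_case_first(name: str) -> str:
    """Title-case only the first character; leave the rest untouched."""
    return name[:1].title() + name[1:]


single = "\u01c6"  # the digraph as one code point, lower-case form

# Upper-casing and title-casing the single code point give different results.
print(f"U+{ord(single.upper()):04X}")  # U+01C4, the upper-case form
print(f"U+{ord(single.title()):04X}")  # U+01C5, the title-case form

# Entered as two separate characters, "d" + "ž": only the "d" changes,
# so both approaches happen to agree here.
print(title_case_first("d\u017e"))  # Dž
```

So a plain “upper-case the first character” only goes wrong when the name starts with one of the single-code-point digraphs, which is exactly the case that is easy to miss in testing.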

2 Likes

Good point! This applies in other languages as well.

Just to add the Bulgarian case.
In Bulgarian there isn’t a strong consensus if certain names should be capitalized or not but one thing is undoubtedly wrong on iNat - the English title case. In Bulgarian we only capitalize the first letter of the first word of a noun phrase and compound names look really awkward right now. Instead of “Лястовича опашка” we see “Лястовича Опашка”.

1 Like

Not sure if this counts as a bug, but in Polish, common names have both words written in lower case. Example:

Cepaea nemoralis would be written wstężyk gajowy, not Wstężyk Gajowy, which is how it is currently displayed and is an orthographic error.

Something to consider. :)

1 Like

Sadly, for now it’s impossible to make the first word appear with a lower-case letter, so consider it a title. You can add names the way you want, though (at the bottom of the taxon page they’re shown the way they were added); maybe one day it will be changed.

Because of the massive complexity of dealing with all the rules, exceptions etc across hundreds of languages, the site has decided upon a standard display format of capital case for all common names.

So this is not a bug, but rather intentional behaviour.

Great idea. See https://www.inaturalist.org/pages/curator+guide#names

https://forum.inaturalist.org/t/common-name-capitalized-on-the-website-but-not-on-the-app/22224
https://groups.google.com/g/inaturalist/c/Pn5ZJqFMtjM?pli=1