Translating articles for gendered nouns, vowel / consonant rules, etc.

kueda · September 20, 2019, 5:51pm

Those of you who have experience translating software: how have you seen others handle these issues? Specifically, I’m referring to situations where we have a source string like “a %{fruit}”, where %{fruit} is a variable that could take a number of values. If the value is “grapefruit” in English, the result should be “a grapefruit”, but the article depends on whether the value begins with a vowel or a consonant. So we would either have to write the string as “a(n) %{fruit}” or ask translators to use a bit of code like “@vow_or_con{vow:an|con:a} %{fruit}” to handle both “a grapefruit” and “an apple”.
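To make the idea concrete, here is a minimal Python sketch of how a selector like that might be expanded at render time. The @vow_or_con syntax is only the hypothetical one from this post, the function and variable names are made up for illustration, and the check looks at spelling rather than pronunciation, so it would get “an hour” or “a unicorn” wrong.

```python
import re

# Hypothetical selector syntax from the post: @vow_or_con{vow:X|con:Y} %{name}
SELECTOR = re.compile(r"@vow_or_con\{vow:([^|}]*)\|con:([^}]*)\}\s+%\{(\w+)\}")


def render(template: str, values: dict) -> str:
    """Expand each selector by inspecting the first letter of the inserted value."""

    def expand(match):
        vow_form, con_form, name = match.groups()
        value = values[name]
        # Simplification: tests the first written letter, not the first sound.
        starts_with_vowel = value.lower().startswith(tuple("aeiou"))
        article = vow_form if starts_with_vowel else con_form
        return f"{article} {value}"

    return SELECTOR.sub(expand, template)


template = "I ate @vow_or_con{vow:an|con:a} %{fruit}."
print(render(template, {"fruit": "apple"}))       # I ate an apple.
print(render(template, {"fruit": "grapefruit"}))  # I ate a grapefruit.
```

The translator controls both article forms, but the rule for choosing between them lives in code, which is the part that would need a separate implementation per language.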

For languages with gendered nouns we have a similar problem, e.g. in French where we want to handle “un pamplemousse” and “une pomme”. These situations get even more complex in languages like Russian and German.

So my question: how have you seen other software translation efforts handle these issues? Is having a bit of code in the translations like @gender{m:un|f:une} %{fruit} pretty conventional? Are there better solutions?
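For illustration, a gender selector like the one above could be sketched in Python roughly as follows. Everything here is an assumption: the @gender syntax is just the hypothetical one from this post, and the noun table with a gender label per translated value is an invented data layout, not something any existing i18n library provides.

```python
import re

# Hypothetical noun table: each translated value carries its grammatical gender.
FRUITS_FR = {
    "grapefruit": {"text": "pamplemousse", "gender": "m"},
    "apple": {"text": "pomme", "gender": "f"},
}

# Hypothetical selector syntax from the post: @gender{m:un|f:une} %{name}
GENDER_SELECTOR = re.compile(r"@gender\{([^}]*)\}\s+%\{(\w+)\}")


def render(template: str, values: dict) -> str:
    """Pick the article form that matches the gender label of the inserted noun."""

    def expand(match):
        # Parse "m:un|f:une" into {"m": "un", "f": "une"}.
        forms = dict(pair.split(":", 1) for pair in match.group(1).split("|"))
        noun = values[match.group(2)]
        return f"{forms[noun['gender']]} {noun['text']}"

    return GENDER_SELECTOR.sub(expand, template)


template_fr = "Voici @gender{m:un|f:une} %{fruit}."
print(render(template_fr, {"fruit": FRUITS_FR["grapefruit"]}))  # Voici un pamplemousse.
print(render(template_fr, {"fruit": FRUITS_FR["apple"]}))       # Voici une pomme.
```

Unlike the vowel / consonant case, the gender cannot be derived from the word itself, so someone has to maintain the label for every value in every language.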

wouterkoch · September 20, 2019, 7:44pm

Not the implementation you’re asking for, but I find https://developer.mozilla.org/en-US/docs/Mozilla/Localization/Localization_content_best_practices a useful overview of things to keep in mind.

JeremyHussell · September 20, 2019, 7:48pm

Try to avoid inserting tiny fragments into longer text, wherever possible. Try to translate whole sentences and paragraphs as a unit. In cases where it can’t be avoided, try to make the variable contain a whole noun phrase, e.g. use a list of fruits like [‘a grapefruit’, ‘an apple’] and insert the whole noun phrase into the template. That won’t help for languages where there’s mandatory verbal agreement with one (or more) of the noun phrases, which is why it’s often simplest to have a list of translated sentences, one for each possible value of the variable. Note that it’s a trade-off: you gain simplicity at the cost of having a lot of repeated content in the strings. Depending on your requirements, it may be worth it to label each value with number, gender, class, etc. and use bits of code in the templates.
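A rough Python sketch of the two simpler options, with invented data and function names. Note it uses Python's {fruit} placeholders rather than the %{fruit} style from the original question, and the French strings are only illustrative.

```python
# Option 1: the variable holds a whole noun phrase, article included.
NOUN_PHRASES = {
    "en": {"grapefruit": "a grapefruit", "apple": "an apple"},
    "fr": {"grapefruit": "un pamplemousse", "apple": "une pomme"},
}
TEMPLATES = {
    "en": "You picked {fruit}.",
    "fr": "Vous avez choisi {fruit}.",
}


def render_phrase(locale: str, fruit_key: str) -> str:
    """Insert a pre-translated noun phrase into a translated template."""
    return TEMPLATES[locale].format(fruit=NOUN_PHRASES[locale][fruit_key])


# Option 2: one fully translated sentence per value, no interpolation at all.
# Needed when other words must agree with the noun (here: "mûr" vs "mûre").
SENTENCES = {
    "en": {"grapefruit": "The grapefruit is ripe.", "apple": "The apple is ripe."},
    "fr": {"grapefruit": "Le pamplemousse est mûr.", "apple": "La pomme est mûre."},
}


def render_sentence(locale: str, fruit_key: str) -> str:
    """Look up the complete sentence for this locale and value."""
    return SENTENCES[locale][fruit_key]


print(render_phrase("en", "apple"))         # You picked an apple.
print(render_phrase("fr", "grapefruit"))    # Vous avez choisi un pamplemousse.
print(render_sentence("fr", "apple"))       # La pomme est mûre.
```

Option 2 repeats a lot of text, but translators see complete sentences and no selector code is needed in any language.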

iorek · September 20, 2019, 8:24pm

Agree with what Jeremy says - tokenizing and translating single words tends to be difficult because of agreement and differences in word order. It often makes sense to repeat text rather than over-tokenize, since whole strings give the translators more flexibility.

kueda · November 19, 2019, 8:24pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
