Translating articles for gendered nouns, vowel/consonant rules, etc.

For those of you with experience translating software: how have you seen others handle these issues? Specifically, I’m referring to situations where we have a source string like “a %{fruit}”, where %{fruit} is a variable that can take a number of values. If the value is “grapefruit”, the English output should be “a grapefruit”, but the article depends on whether %{fruit} begins with a vowel or a consonant. So we would either have to write the string as “a(n) %{fruit}” or ask translators to use a bit of code like “@vow_or_con{vow:an|con:a} %{fruit}” to handle both “a grapefruit” and “an apple”.

For languages with gendered nouns we have a similar problem, e.g. in French where we want to handle “un pamplemousse” and “une pomme”. These situations get even more complex in languages like Russian and German.

So my question: how have you seen other software translation efforts handle these issues? Is having a bit of code in the translations like @gender{m:un|f:une} %{fruit} pretty conventional? Are there better solutions?
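To make the idea concrete, here is a minimal Python sketch of how selectors like these could be resolved at render time. Everything in it is hypothetical: the @name{key:value|...} syntax is just the one from my examples above, and the FRUITS table, classify and render are names I made up for illustration, not any existing library's API.

```python
import re

# Hypothetical per-locale metadata for every value %{fruit} can take.
FRUITS = {
    "en": {
        "grapefruit": {"text": "grapefruit"},
        "apple": {"text": "apple"},
    },
    "fr": {
        "grapefruit": {"text": "pamplemousse", "gender": "m"},
        "apple": {"text": "pomme", "gender": "f"},
    },
}

# Matches the made-up selector syntax, e.g. @gender{m:un|f:une}.
SELECTOR = re.compile(r"@(\w+)\{([^}]*)\}")


def classify(kind, entry):
    """Return the option key that applies to this value."""
    if kind == "gender":
        return entry["gender"]
    if kind == "vow_or_con":
        # Spelling-based heuristic only: it gets "an hour" and
        # "a university" wrong, since the real rule depends on sound.
        return "vow" if entry["text"][0].lower() in "aeiou" else "con"
    raise ValueError(f"unknown selector: {kind}")


def render(template, locale, key):
    """Resolve every selector, then substitute %{fruit}."""
    entry = FRUITS[locale][key]

    def pick(match):
        kind, body = match.group(1), match.group(2)
        options = dict(opt.split(":", 1) for opt in body.split("|"))
        return options[classify(kind, entry)]

    resolved = SELECTOR.sub(pick, template)
    return resolved.replace("%{fruit}", entry["text"])


print(render("@vow_or_con{vow:an|con:a} %{fruit}", "en", "apple"))
print(render("@gender{m:un|f:une} %{fruit}", "fr", "grapefruit"))
```

Note that the gender selector only works if someone has tagged every value with its gender in every locale, so the cost is not just the extra syntax in the translated strings.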


Not the implementation you’re asking for, but here is an overview of things I find useful to keep in mind.

Try to avoid inserting tiny fragments into longer text wherever possible, and translate whole sentences and paragraphs as a unit. Where that can’t be avoided, try to make the variable contain a whole noun phrase, e.g. use a list of fruits like ["a grapefruit", "an apple"] and insert the whole noun phrase into the template.

That won’t help for languages where there’s mandatory verbal agreement with one (or more) of the noun phrases, which is why it’s often simplest to have a list of translated sentences, one for each possible value of the variable. Note that it’s a trade-off: you gain simplicity at the cost of a lot of repeated content in the strings. Depending on your requirements, it may be worth labelling each value with number, gender, class, etc. and using bits of code in the templates.
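As a rough illustration of the first two options, here is a small Python sketch. The tables and function names are invented for the example and the sentences are ones I picked myself; a real project would keep these strings in its translation catalogue rather than in code.

```python
# Option 1: the variable holds a whole noun phrase, translated per locale.
NOUN_PHRASES = {
    "en": {"grapefruit": "a grapefruit", "apple": "an apple"},
    "fr": {"grapefruit": "un pamplemousse", "apple": "une pomme"},
}
TEMPLATES = {
    "en": "You picked %{fruit}.",
    "fr": "Vous avez choisi %{fruit}.",
}

# Option 2: one fully translated sentence per value. Needed when other
# words must agree with the noun, e.g. the French adjective "mûr"/"mûre".
SENTENCES = {
    "en": {
        "grapefruit": "The grapefruit is ripe.",
        "apple": "The apple is ripe.",
    },
    "fr": {
        "grapefruit": "Le pamplemousse est mûr.",
        "apple": "La pomme est mûre.",
    },
}


def with_noun_phrase(locale, key):
    """Insert a pre-translated noun phrase into the locale's template."""
    return TEMPLATES[locale].replace("%{fruit}", NOUN_PHRASES[locale][key])


def whole_sentence(locale, key):
    """Look up the complete sentence; no string assembly at all."""
    return SENTENCES[locale][key]


print(with_noun_phrase("fr", "apple"))
print(whole_sentence("fr", "apple"))
```

The second table repeats most of each sentence for every value, which is the repeated-content cost mentioned above, but translators see complete sentences and never have to write selector code.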

Agree with what Jeremy says: tokenizing and translating single words tends to be difficult because of agreement and differing word order. It often makes sense to repeat text rather than over-tokenize, because whole strings give the translators more flexibility.