Related to the request to require approval for a new lexicon, I would like to request a cleanup of duplicate/wrong lexicons already in the system. Obviously this only applies to duplicates and wrong lexicons (e.g. “Lexicon 1”). No common names for taxa would be thrown out, just merged into a single consistent lexicon where appropriate.
For an idea of what I’m talking about, here’s the lexicon list I posted to the request for lexicon approval thread:
Australia
Australian
Indigenous Australian
[note that Australia has many indigenous languages]
Creole (English)
English (Creole)
German
Deutsch
Spanish
Spanish (Chile)
Spanish (Perú)
Español (Argentina)
Español (Chile)
Español (Costa Rica)
Español (Ecuador)
Español Chileno
French
Français
Greek
Greek (Modern)
Modern Greek (1453 )
Ju|'hoan
Juǀ’hoan
Oshi Kwanyama
Oshikwanyama
Portuguese
Português
Ru Kwangali
Rukwangali
Setswapong
Setswapng
Slovak
Slovakian
Sotho ( Northern)
Sotho (Northern)
Swahili
Kiswahili
Teenek
Tenek
Tonga
Tongan
Manx
Manx English Dialect
Navaho
Navajo
Zeelandic
Zealandic
Scientific Names
Nombres Científicos
Nomi Scientifici
Noms Scientifiques
Wetenschappelijke Namen
Wissenschaftliche Namen
I guess 141 names have an incorrect language ‘und’ and lexicon ‘Русский’. The update below will probably violate unique constraints…
I think that the records with
language=‘und’ and lexicon=‘Русский’
should be
language=‘ru’ and lexicon=‘Russian’
Seems to be 141 records.
UPDATE VernacularNames
SET language = 'ru',
    lexicon = 'Russian'
WHERE language = 'und'
  AND lexicon = 'Русский';
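To see how the unique-constraint worry could be checked before running that update, here is a minimal sketch in Python with an in-memory SQLite database. Only the table name and the language and lexicon columns come from the post; the taxon_id and name columns, the UNIQUE constraint, and the helper names are assumptions for illustration, not the real schema.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory table (hypothetical schema, demo rows only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE VernacularNames ("
        "taxon_id INTEGER, name TEXT, language TEXT, lexicon TEXT, "
        "UNIQUE (taxon_id, name, language, lexicon))"
    )
    conn.executemany(
        "INSERT INTO VernacularNames VALUES (?, ?, ?, ?)",
        [
            (1, "Лиса", "und", "Русский"),  # would collide with the next row
            (1, "Лиса", "ru", "Russian"),
            (2, "Волк", "und", "Русский"),  # safe to update
        ],
    )
    return conn


def find_collisions(conn):
    """List (taxon_id, name) pairs that already exist as ru/Russian."""
    return conn.execute(
        """
        SELECT bad.taxon_id, bad.name
        FROM VernacularNames AS bad
        JOIN VernacularNames AS good
          ON good.taxon_id = bad.taxon_id AND good.name = bad.name
        WHERE bad.language = 'und' AND bad.lexicon = 'Русский'
          AND good.language = 'ru' AND good.lexicon = 'Russian'
        ORDER BY bad.taxon_id, bad.name
        """
    ).fetchall()


def fix_russian(conn):
    """Update only the non-colliding rows; return how many were changed."""
    cur = conn.execute(
        """
        UPDATE VernacularNames
        SET language = 'ru', lexicon = 'Russian'
        WHERE language = 'und' AND lexicon = 'Русский'
          AND NOT EXISTS (
            SELECT 1 FROM VernacularNames AS good
            WHERE good.taxon_id = VernacularNames.taxon_id
              AND good.name = VernacularNames.name
              AND good.language = 'ru' AND good.lexicon = 'Russian'
          )
        """
    )
    return cur.rowcount
```

Whatever find_collisions returns would still need a manual decision, most likely deleting the ‘und’ copy since the ru/Russian name already exists.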
These records are dated around 2019-04-01T14:50:08Z, so the tap may have been closed later in 2019. You can find the records in the lexicon-is-not-in-English file: VernacularNames-.csv
Also, the lexicon ‘other’ is remarkable… it is often related to (imports of) VernacularNames.
There may be a problem when different languages share the same name but are quite different and are spoken in different countries. Luo in Cameroon may be completely unrelated to Luo in Kenya. This problem needs some attention. The same goes for Tonga in Africa and Tonga in the Pacific. Then there is Yao in China and Yao in East Africa, and Ata in the Philippines and Ata in Papua New Guinea. There must be many other examples of duplicate names. Perhaps the country or continent needs to be written after the language name, at least in some cases!
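As a rough illustration of the “country after the language” idea, here is a small Python sketch that adds the region only when a bare name is ambiguous. The (name, region) pairs and the function name are hypothetical; the site does not necessarily store a region for each lexicon.

```python
from collections import defaultdict


def display_labels(lexicons):
    """Append the region only to names shared by more than one region."""
    regions_by_name = defaultdict(set)
    for name, region in lexicons:
        regions_by_name[name].add(region)
    return [
        f"{name} ({region})" if len(regions_by_name[name]) > 1 else name
        for name, region in lexicons
    ]


labels = display_labels([
    ("Luo", "Kenya"),
    ("Luo", "Cameroon"),
    ("Swahili", "East Africa"),
])
# labels == ["Luo (Kenya)", "Luo (Cameroon)", "Swahili"]
```

Unambiguous names stay as they are, so most lexicons would not change at all.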
Another (recently added?) duplicate lexicon that needs to be merged into Scientific Names is Scientific Name (the existence of which makes it take a little bit longer than it would otherwise to tab through the new taxon name page when adding scientific name synonyms)
I’m looking into this today. I have a script that does a lot of this stuff, so I’m trying to update that script and to automate it so it runs once a month or something.
However, it’s not at all clear to me what should be considered the canonical lexicon for any given set of synonyms. Is it reasonable to just choose the first name for the language on the English Wikipedia page for that language? For example, that would mean “Kwangali,” “RuKwangali,” and “Ru Kwangali” would all become “Kwangali.”
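To make the synonym question concrete, here is a minimal Python sketch of one way such a mapping could work. It is not the actual script: the CANONICAL table is a hand-picked illustration, and which variant counts as canonical in each set (other than the “Kwangali” example) is an assumption.

```python
import re
import unicodedata

# Hypothetical, hand-maintained table: canonical lexicon -> known variants.
CANONICAL = {
    "Kwangali": ["Kwangali", "RuKwangali", "Ru Kwangali", "Rukwangali"],
    "Scientific Names": ["Scientific Names", "Scientific Name"],
    "Sotho (Northern)": ["Sotho (Northern)", "Sotho ( Northern)"],
}


def lexicon_key(lexicon):
    """Fold case and drop spaces/punctuation so near-duplicates compare equal."""
    text = unicodedata.normalize("NFKC", lexicon).casefold()
    return re.sub(r"[\W_]+", "", text)


LOOKUP = {
    lexicon_key(variant): canonical
    for canonical, variants in CANONICAL.items()
    for variant in variants
}


def canonical_lexicon(lexicon):
    """Return the canonical lexicon, or the input unchanged if unknown."""
    return LOOKUP.get(lexicon_key(lexicon), lexicon)
```

Anything not listed in the table, such as “Tonga” or “Luo”, passes through unchanged, so ambiguous lexicons are left for manual attention.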
I’m not going to do anything about “Luo” if there are multiple languages with that name. There are only ~12 names with that Lexicon. It seems like people have been using “Tongan” for the Pacific island language and “Tonga” for the African language, so I don’t think there’s anything to do there.
Lexicons like “Und” and “Indigenous” will just be left alone since they need manual attention. I guess I could synonymize “Creole (English)” and “English (Creole)” but they’re both equally useless, right? There are a bunch of English creoles out there and presumably they could have different names for the same taxa.
Had a quick look at the script, may have missed it.
Occasionally there are names that don’t have a lexicon, and as far as I’m aware there is no way to search for them. Would it be possible to assign them to something like ‘Und’ or ‘Other’ so they can be found in the DwC-A export for correction where possible?
It should be impossible to create taxon names without a lexicon, but there are some that exist from before we instituted that rule. I don’t want to give them a lexicon of “Und” or something because that basically violates the rule, but I can include them in the DwC-A under the language und (a real ISO 639 code!). That would meet your need, right?
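Roughly, the export side of that could look like the Python sketch below. The column names, the LANGUAGE_CODES table, and the function names are assumptions for illustration, not the real export code; treating unrecognized lexicons as ‘und’ is also an assumption.

```python
import csv
import io

# Illustrative subset of a lexicon -> ISO 639 code table (hypothetical).
LANGUAGE_CODES = {"Russian": "ru", "German": "de"}


def export_language(lexicon):
    """Use 'und' when the lexicon is missing or has no known code."""
    if not lexicon or not lexicon.strip():
        return "und"
    return LANGUAGE_CODES.get(lexicon.strip(), "und")


def write_vernacular_csv(rows):
    """Write (taxon_id, name, lexicon) rows as CSV text for the archive."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "vernacularName", "language", "lexicon"])
    for taxon_id, name, lexicon in rows:
        # The stored lexicon is left blank; only the exported language is 'und'.
        writer.writerow([taxon_id, name, export_language(lexicon), lexicon or ""])
    return out.getvalue()
```

Names with no lexicon then show up in the export with language und and an empty lexicon column, which makes them easy to filter without inventing a lexicon for them.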
The first/top name in the language infobox on English Wikipedia pages should be the exonym, though I’m not sure how reliable this is. Other options would be to use the name given in ISO 639 or Glottolog.
Yes, the few I can recall were imports from a while back.
I would like that discussion to be on a flag, so local people can say which is the correct name for their language. Or at least a locally based scientist.