Clean up currently available lexicons

I’m looking into this today. I have a script that already handles a lot of this cleanup, so I’m updating it and trying to automate it to run roughly once a month.

However, it’s not at all clear to me what should be considered the canonical lexicon for any given set of synonyms. Is it reasonable to just choose the first name for the language on the English Wikipedia page for that language? For example, that would mean “Kwangali,” “RuKwangali,” and “Ru Kwangali” would all become “Kwangali.”
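
To make the idea concrete, here is a rough sketch of what that mapping could look like in Python. The synonym table and the names `SYNONYMS` and `canonical_lexicon` are made up for illustration, not taken from the existing script, and the canonical names would really come from whatever source gets agreed on.

```python
# Hypothetical synonym table: each variant spelling maps to the name
# chosen as canonical (here, the first name on English Wikipedia).
SYNONYMS = {
    "Kwangali": "Kwangali",
    "RuKwangali": "Kwangali",
    "Ru Kwangali": "Kwangali",
}


def _key(name: str) -> str:
    """Build a lookup key: collapse runs of whitespace and ignore case."""
    return " ".join(name.split()).casefold()


# Precompute the normalized lookup once so matching is case-insensitive.
_LOOKUP = {_key(variant): canonical for variant, canonical in SYNONYMS.items()}


def canonical_lexicon(name: str) -> str:
    """Return the canonical lexicon, or the name unchanged if it is unknown."""
    return _LOOKUP.get(_key(name), name)
```

Anything not in the table passes through untouched, so names like “Tongan” and “Tonga” stay as they are unless someone adds them deliberately.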

I’m not going to do anything about “Luo” if there are multiple languages with that name. Only ~12 names use that lexicon anyway. It seems like people have been using “Tongan” for the Pacific island language and “Tonga” for the African language, so I don’t think there’s anything to do there.

Lexicons like “Und” and “Indigenous” will just be left alone since they need manual attention. I guess I could synonymize “Creole (English)” and “English (Creole)” but they’re both equally useless, right? There are a bunch of English creoles out there and presumably they could have different names for the same taxa.
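
A minimal sketch of how the script could set those aside, assuming a hand-maintained skip list. The set contents and the function name `partition_lexicons` are hypothetical and only for illustration; the creole lexicons are left out of the list because that question is still open.

```python
# Hypothetical skip list: lexicons that need a human decision and
# should never be renamed automatically.
NEEDS_MANUAL_ATTENTION = {"Und", "Indigenous", "Luo"}


def partition_lexicons(lexicons):
    """Split lexicon names into (automatic, manual) lists, preserving order."""
    automatic, manual = [], []
    for lexicon in lexicons:
        if lexicon.strip() in NEEDS_MANUAL_ATTENTION:
            manual.append(lexicon)
        else:
            automatic.append(lexicon)
    return automatic, manual
```

The monthly run would then only touch the first list and could report the second one for someone to review by hand.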
