Treat hyphens as hyphens and underscores as spaces.

Please change the function that automatically IDs observations based on the source filename so that it treats hyphens as non-separator characters. Currently, the system treats hyphens the same as spaces, which leads to unexpected results. With this change, an observation loaded from a file named Orange-crownedWarblerIMG_7385.jpg would be loaded without an ID. Under the current system, the hyphen is treated as a space and the name “Orange” is prefilled, which is unexpected and clearly wrong. Standard usage calls for an underscore, not a hyphen, to be parsed as a space.
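
To make the difference concrete, here is a minimal Python sketch of the two ways of splitting a filename into words. The function name `filename_words` and both separator sets are assumptions for illustration (the "current" set is inferred from the outcomes reported later in this thread); this is not iNaturalist's actual code.

```python
import os
import re

# Assumed separator sets. CURRENT is inferred from reported behaviour
# (hyphen and plus split words like a space); REQUESTED is the change asked
# for here (only underscores and spaces split words).
CURRENT_SEPARATORS = r"[ _\-+]+"
REQUESTED_SEPARATORS = r"[ _]+"


def filename_words(filename: str, separators: str) -> list:
    """Drop the file extension and split the remaining stem into words."""
    stem, _ext = os.path.splitext(filename)
    return [w for w in re.split(separators, stem) if w]


print(filename_words("Orange-crownedWarblerIMG_7385.jpg", CURRENT_SEPARATORS))
# ['Orange', 'crownedWarblerIMG', '7385']
print(filename_words("Orange-crownedWarblerIMG_7385.jpg", REQUESTED_SEPARATORS))
# ['Orange-crownedWarblerIMG', '7385']
```

Under the requested rule, the first word no longer collapses to "Orange", and hyphenated scientific epithets such as uva-ursi survive as single words.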

I approved this, but I think the best thing to do would be to use scientific names where possible. Trying to support the various common-name constructions is always going to be difficult.

When we added this feature, here’s what I wrote:

We can’t promise it will work in all situations and with all formatting schemes. Our advice:

* use scientific names
* only exact matches with iNaturalist’s taxonomy will work
* hyphens might not work

Just curious… how difficult would it be to make the automatic ID based on filename something that could be toggled off when loading? It seems like a toggle might help in cases like this.

When I’m trying to type a common name, the fact that the machine ignores hyphens is useful, because the same name is sometimes spelled with a hyphen, sometimes as one run-together word, and sometimes as two words. So I won’t be voting for this, unless further information changes my mind.
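
The tolerance described here can be had by comparing names on a key that ignores separators entirely. A minimal Python sketch; `spelling_key` is a hypothetical name, and this is not how iNaturalist's search is actually implemented.

```python
import re


def spelling_key(name: str) -> str:
    """Case-fold a name and drop hyphens, underscores and whitespace, so that
    hyphenated, two-word and run-together spellings produce the same key."""
    return re.sub(r"[\s_-]+", "", name.casefold())


variants = ["Orange-crowned Warbler", "orange crowned warbler", "Orangecrowned Warbler"]
print({spelling_key(v) for v in variants})
# {'orangecrownedwarbler'}
```

How names are compared is a separate decision from which characters split a filename into words; a key like this could coexist with treating hyphens as non-separators during parsing.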

Just to clarify a bit: I believe this feature request is specifically about what the system does when translating filenames into automatic IDs during upload, not about other situations, such as looking up taxa when making IDs or looking up a taxon to get more information about it.

I don’t use the feature that looks at filenames, so I don’t have much of an opinion on this, but I hope people realize that scientific names can contain hyphens (Arctostaphylos uva-ursi and Symphyotrichum novae-angliae are two well-known examples).

They can, but I’m told hyphenated names are not valid, so our salvias were renamed.
Salvia africana has lost its -caerulea, and with it the blue African sky we had before. Likewise Salvia aurea (which lost -lutea). No one dares approach Euphorbia caput-medusae? Beware snakes!

@pisum if you put in a feature request for a toggle-off option, I would support you. I prefer to sort my photos into an elaborate hierarchy of folders (my space, my rules for me). But we have had many forum threads from people who missed the announcement, or don’t understand that it is working as intended, or want to change it to suit them.

I probably won’t put in such a feature request, since I never use the automatic-ID-from-filename feature myself. I just mentioned it because I think a toggle would be one way to approach this issue while also covering other use cases that haven’t come up yet.

Thank you for making this request! The current situation is one of the most frustrating features of iNat. I’ve never understood why it would make sense to programmers for iNat to recognize a hyphen if I type it in, but not in an imported filename.

But you didn’t vote for it?

The system doesn’t recognize it when you type it in; it is treated as a space.

I think what you’re saying has merit, but I think this is technically a little different from what is being requested in this thread. It’s a very subtle but important difference.

The request as written asks for Orange-crownedWarblerIMG to be treated as a single string rather than parsed as two separate strings, Orange + crownedWarblerIMG.

I think you’re asking for a filename string like Orange-crowned_Warbler to be matched to a taxon named “Orange-crowned Warbler”, which is currently not possible. Currently, a filename like Orange-crowned_Warbler could be matched only to a taxon named “Orange crowned Warbler” (without the hyphen in the taxon name).

I support the kind of request I think you’re asking for, though, and if the original request is implemented, I think it would make sense to incorporate this other request at the same time.

Here’s a little more detail about what currently happens if you try to match on some variation of Orange-crowned Warbler in the filename:

If you have a file named any of the variations below, the observation will get loaded as an Orange:

  • orange crowned warbler.jpg
  • orange-crowned-warbler.jpg
  • orange_crowned_warbler.jpg
  • orange+crowned+warbler.jpg

If you have a file named leiothlypis celata.jpg (or any variation of that), the observation will get loaded as an Orange-crowned Warbler.

If you have a file named orangecrownedwarbler.jpg, the observation will get loaded without an automatic taxon.

If you have a file named orange balsam.jpg (or any variation of that), the observation will get loaded as Common Jewelweed (aka Orange Balsam).

So currently, because there’s a hyphen in the taxon’s common name, there’s no combination of words based on that common name that you can include in a filename to trigger an automatic Orange-crowned Warbler ID.
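
The outcomes above are consistent with a simple model: split the filename into words on spaces, hyphens, underscores and plus signs, then look up the longest run of leading words, shortening it one word at a time until something matches. This is only a guess that happens to reproduce the reported results, sketched in Python with a hypothetical four-entry name table; it is not iNaturalist's code. It shows why the hyphenated common name is unreachable: once the hyphen is a separator, no candidate string can contain it.

```python
import os
import re
from typing import Optional

# Hypothetical name table (lower-cased name -> taxon), limited to names from
# this thread; the real feature matches against iNaturalist's full taxonomy.
TAXA = {
    "orange": "Orange",
    "orange-crowned warbler": "Orange-crowned Warbler",
    "leiothlypis celata": "Orange-crowned Warbler",
    "orange balsam": "Common Jewelweed",
}

CURRENT_SEPARATORS = r"[ _\-+]+"  # inferred: hyphen and plus split words too
REQUESTED_SEPARATORS = r"[ _]+"   # the request: hyphens stay inside words


def auto_id(filename: str, separators: str = CURRENT_SEPARATORS) -> Optional[str]:
    """Split the file stem into words, then try the longest run of leading
    words first, dropping one word at a time until a name matches."""
    stem, _ext = os.path.splitext(filename)
    words = [w.lower() for w in re.split(separators, stem) if w]
    for n in range(len(words), 0, -1):
        taxon = TAXA.get(" ".join(words[:n]))
        if taxon is not None:
            return taxon
    return None


for name in ["orange-crowned-warbler.jpg", "leiothlypis celata.jpg",
             "orangecrownedwarbler.jpg", "orange balsam.jpg"]:
    print(name, "->", auto_id(name))
print(auto_id("Orange-crowned_Warbler.jpg", REQUESTED_SEPARATORS))
```

With the current separators, orange-crowned-warbler.jpg falls back to Orange. Passing the requested separator set leaves Orange-crownedWarblerIMG_7385.jpg without an ID and lets Orange-crowned_Warbler.jpg reach the hyphenated name, which is the combination of the two requests discussed above.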

Thanks so much for breaking this down. Your analysis makes clear that it actually is a bug and not a feature! Much appreciated.

Apparently, hyphens are permitted by the ICBN only in certain circumstances:

60.9. The use of a hyphen in a compound epithet is treated as an error to be corrected by deletion of the hyphen, unless the epithet is formed of words that usually stand independently or the letters before and after the hyphen are the same, when a hyphen is permitted (see Art. 23.1 and 23.3).

Ex. 20. Hyphen to be omitted: Acer pseudoplatanus L. (1753), not A. “pseudo-platanus”; Eugenia costaricensis O. Berg (1856), not E. “costa-ricensis”; Ficus neoëbudarum Summerh. (1932), not F. “neo-ebudarum”; Lycoperdon atropurpureum Vittad. (1842), not L. “atro-purpureum”; Croton ciliatoglandulifer Ortega (1797), not C. “ciliato-glandulifer”; Scirpus sect. Pseudoëriophorum Jurtzev (in Byull. Moskovsk. Obshch. Isp. Prir., Otd. Biol. 70(1): 132. 1965), not S. sect. “Pseudo-eriophorum”.

Ex. 21. Hyphen to be maintained: Aster novae-angliae L. (1753), Coix lacryma-jobi L. (1753), Arctostaphylos uva-ursi (L.) Spreng. (1825), Veronica anagallis-aquatica L. (1753; Art. 23.3), Athyrium austro-occidentale Ching (1986).

Note 3. Art. 60.9 refers only to epithets (in combinations), not to names of genera or taxa in higher ranks; a generic name published with a hyphen can be changed only by conservation (Art. 14.11).

Ex. 22. Pseudo-salvinia Piton (1940) may not be changed to “Pseudosalvinia”; whereas by conservation “Pseudo-elephantopus” was changed to Pseudelephantopus Rohr (1792).

In the current botanical code, the ICNafp, it’s renumbered as 60.11:

60.11. The use of a hyphen in a compound epithet is treated as an error to be corrected by deletion of the hyphen. A hyphen is permitted only when the epithet is formed of words that usually stand independently, or when the letters before and after the hyphen are the same (see also Art. 23.1 and 23.3).

Ex. 40. Hyphen to be deleted: Acer pseudoplatanus L. (Sp. Pl.: 1024. 1753, ‘pseudo-platanus’); Croton ciliatoglandulifer Ortega (Nov. Pl. Descr. Dec.: 51. 1797, ‘ciliato-glandulifer’); Eugenia costaricensis O. Berg (in Linnaea 27: 213. 1856, ‘costa-ricensis’); Eunotia rolandschmidtii Metzeltin & Lange-Bert. (Iconogr. Diatomol. 18: 117. 2007, ‘roland-schmidtii’), in which the given name and surname do not stand independently because the former is not separately latinized; Ficus neoebudarum Summerh. (in J. Arnold Arbor. 13: 97. 1932, ‘neo-ebudarum’); Lycoperdon atropurpureum Vittad. (Monogr. Lycoperd.: 42. 1842, ‘atro-purpureum’); Mesospora vanbosseae Børgesen (in Skottsberg, Nat. Hist. Juan Fernandez 2: 258. 1924, ‘van-bosseae’); Peperomia lasierrana Trel. & Yunck. (Piperac. N. South Amer.: 530. 1950, ‘la-sierrana’); Scirpus sect. Pseudoeriophorum Jurtzev (in Byull. Moskovsk. Obshch. Isp. Prir., Otd. Biol. 70(1): 132. 1965, ‘Pseudo-eriophorum’).

Ex. 41. Hyphen to be maintained: Athyrium austro-occidentale Ching (in Acta Bot. Boreal.-Occid. Sin. 6: 152. 1986); Enteromorpha roberti-lamii H. Parriaud (in Botaniste 44: 247. 1961), in which the given name and surname stand independently because they are separately latinized; Piper pseudo-oblongum McKown (in Bot. Gaz. 85: 57. 1928); Ribes non-scriptum (Berger) Standl. (in Publ. Field Mus. Nat. Hist., Bot. Ser. 8: 140. 1930); Solanum fructu-tecto Cav. (Icon. 4: 5. 1797); Vitis novae-angliae Fernald (in Rhodora 19: 146. 1917).

Ex. 42. Hyphen to be inserted: Arctostaphylos uva-ursi (L.) Spreng. (Syst. Veg. 2: 287. 1825, ‘uva ursi’); Aster novae-angliae L. (Sp. Pl.: 875. 1753, ‘novae angliae’); Coix lacryma-jobi L. (l.c.: 972. 1753, ‘lacryma jobi’); Marattia rolandi-principis Rosenst. (in Repert. Spec. Nov. Regni Veg. 10: 162. 1911, ‘rolandi principis’); Veronica anagallis-aquatica L. (Sp. Pl.: 12. 1753, ‘anagallis ’), (see Art. 23.3); Veronica argute-serrata Regel & Schmalh. (in Trudy Imp. S.-Peterburgsk. Bot. Sada 5: 626. 1878, ‘argute serrata’).

Ex. 43. Hyphen not to be inserted: Synsepalum letestui Aubrév. & Pellegr. (in Notul. Syst. (Paris) 16: 263. 1961, ‘Le Testui’), not ‘le-testui’.

Note 6. Art. 60.11 refers only to epithets (in combinations), not to names of genera (for names of fossil-genera see Art. 60.12) or taxa at higher ranks; a non-fossil generic name published with a hyphen can be changed only by conservation (Art. 14.11; see also Art. 20.3; but see Art. H.6.2).

Ex. 44. Pseudo-fumaria Medik. (Philos. Bot. 1: 110. 1789) may not be changed to ‘Pseudofumaria’; whereas by conservation ‘Pseudo-elephantopus’ was changed to Pseudelephantopus Rohr (in Skr. Naturhist.-Selsk. 2: 214. 1792).

Thanks! I should have realized that my Google search might turn up an outdated version.

The meaning of “formed of words that usually stand independently” is so opaque that one suspects its authors didn’t know what they meant, either.

The intended meaning is apparently closer to: “formed of words that are separately inflected so as to be grammatically independent”. Based on the examples, at least, the intent is clearly syntactic rather than semantic. We can hardly doubt that the two words in “Costa Rica” usually stand independently, but “costa-ricensis” is corrected to “costaricensis”. Should anyone have published an abomination like “costensis-ricensis”, though, the hyphen would have to be maintained. Similarly, “Symphyotrichum novae-angliae” keeps its hyphen, while had it been “neo-angliae” or “nov-angliae” the hyphen would be dropped. But, just to make this example confusing, “nova-angliae” would have kept its hyphen because of the separate “repeated letter” rule.
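
Of the two conditions in Art. 60.11, only the repeated-letter one can be read off the spelling. A minimal Python sketch of that single check; the function name is hypothetical, and it illustrates one clause rather than serving as a nomenclature validator.

```python
def repeated_letter_hyphen(epithet: str) -> bool:
    """Return True if every hyphen in `epithet` sits between two identical
    letters, the spelling-only condition under which Art. 60.11 permits one.

    The other condition (words that "usually stand independently") is
    grammatical, so False does NOT mean the hyphen must be deleted:
    novae-angliae keeps its hyphen on those grounds.
    """
    parts = epithet.casefold().split("-")
    if len(parts) < 2:
        return False  # no hyphen at all
    return all(left and right and left[-1] == right[0]
               for left, right in zip(parts, parts[1:]))


for epithet in ["nova-angliae", "pseudo-oblongum", "novae-angliae", "costa-ricensis"]:
    print(epithet, repeated_letter_hyphen(epithet))
```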

I was curious about this case and now regret my curiosity. :-) In case you would like to regret it, as well:

Linnaeus published those two in 1753 as “[Salvia] afr.cærulea” and “[Salvia] afr.lutea”. Then in 1762 he called them “[Salvia] africana” and “[Salvia] aurea”.

The abbreviations (“afr.”) are very weird, but not disallowed. ICNafp Article 60.14:

60.14. Abbreviated names and epithets are to be expanded in conformity with nomenclatural tradition (see also Art. 23 *Ex. 23 and Rec. 60C.4(d)).

Ex. 49. In Allium ‘a.-bolosii’ P. Palau (in Anales Inst. Bot. Cavanilles 11: 485. 1953), dedicated to Antonio de Bolòs y Vayreda, the epithet is spelled antonii-bolosii.

The original context makes it hard to see “afr.” as anything but “africana”, and that seems to be how everyone’s interpreted it. Of course, just expanding “afr.” to “africana” gives us “Salvia africana caerulea” and “Salvia africana lutea”. There are a couple of similar examples under ICNafp Article 60.11:

Ex. 42. Hyphen to be inserted: Arctostaphylos uva-ursi (L.) Spreng. (Syst. Veg. 2: 287. 1825, ‘uva ursi’); Aster novae-angliae L. (Sp. Pl.: 875. 1753, ‘novae angliae’); Coix lacryma-jobi L. (l.c.: 972. 1753, ‘lacryma jobi’); Marattia rolandi-principis Rosenst. (in Repert. Spec. Nov. Regni Veg. 10: 162. 1911, ‘rolandi principis’); Veronica anagallis-aquatica L. (Sp. Pl.: 12. 1753, ‘anagallis -’), (see Art. 23.3); Veronica argute-serrata Regel & Schmalh. (in Trudy Imp. S.-Peterburgsk. Bot. Sada 5: 626. 1878, ‘argute serrata’).

So, Article 60.14 tells us to expand the abbreviation and Article 60.11 tells us to insert a hyphen. We get “Salvia africana-caerulea” and “Salvia africana-lutea”. No problem! Alas, no. This entry from Kew’s World Checklist of Selected Plant Families pointed me in the right direction. ICNafp Article 23.6:

23.6. The following designations are not to be regarded as species names:
[…]
(c) Designations of species consisting of a generic name followed by two or more adjectival words in the nominative case.

Ex. 19. “Salvia africana caerulea” (Linnaeus, Sp. Pl.: 26. 1753) and “Gnaphalium fruticosum flavum” (Forsskål, Fl. Aegypt.-Arab.: cxix. 1775) are generic names followed by two adjectival words in the nominative case. They are not to be regarded as species names.

Ex. 20. Rhamnus ‘vitis idaea’ Burm. f. (Fl. Ind.: 61. 1768) is to be regarded as a species name because the generic name is followed by a noun and an adjective, both in the nominative case; these words are to be hyphenated (R. vitis-idaea) under the provisions of Art. 23.1 and 60.11. In Anthyllis ‘Barba jovis’ L. (Sp. Pl.: 720. 1753) the generic name is followed by a noun in the nominative case and a noun in the genitive case, and they are to be hyphenated (A. barba-jovis). Likewise, Hyacinthus ‘non scriptus’ L. (Sp. Pl.: 316. 1753), where the generic name is followed by a negative particle and a past participle used as an adjective, is corrected to H. non-scriptus, and Impatiens ‘noli tangere’ L. (Sp. Pl.: 938. 1753), where the generic name is followed by two verbs, is corrected to I. noli-tangere.

Ex. 21. In Narcissus ‘Pseudo Narcissus’ L. (Sp. Pl.: 289. 1753) the generic name is followed by a prefix (a word that cannot stand independently) and a noun in the nominative case, and the name is to be corrected to N. pseudonarcissus under the provisions of Art. 23.1 and 60.11.

Lucky us, example 19 mentions Salvia africana caerulea!

So, the short version is: “Salvia africana-caerulea” and “Salvia africana-lutea” are, indeed, incorrect according to the botanical code. The problem is that the specific epithets consist of two adjectives in the nominative case. The hyphens aren’t the problem, and in a different grammatical context the code would specifically tell us to insert them.

A fun question—by which I mean a question I prefer not to think about—is whether “Salvia africana-caerulea” would be allowed under the ICNafp if it had been published in that form, with the hyphen. Are those still “two words”? And… would we keep the hyphen?

I want my blue African sky ;~)

Euphorbia caput-medusae is safe then. Medusa and her head are not adjectives.
