New Vision Model Training Started

I do think that if the CV only has enough training for one species within a genus, ID suggestions should be at genus. Two or more, at species. With an exception for genera with only one species. That may require a whole different approach, or it may be something that can be tacked on right at the end. Idk.

8 Likes

Ah, right, I missed the reference to ‘genus with one species with enough observations’ above.

I’m still not sure if this would always be a good idea. The reasons for only having one species with enough observations could be very different. For example

a) Only one species in the genus can be identified from images alone. All the others, several of which are very frequent, can’t.

b) The vast majority of observations belong to that one species, all others in the same genus are super-rare.

In case a), training only against the genus would be very useful; I doubt many would disagree on that. In case b), you would give less information in order to avoid a rare mistake; whether that is a good idea is much more open for debate.

1 Like

This could be a criterion. If more than some threshold of observations within a genus are identified as one species (e.g. 50% of total, or 90% of RG) then it might be reasonable to include that species. But if there are 200 observations of species A, and 2000 others only identified to genus, then it would be better to include the genus in the model.
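To make the criterion above concrete, here is a minimal sketch. The function name `choose_training_rank` and the default 50% threshold are just illustrative assumptions taken from the numbers in this post, not anything iNat actually uses.

```python
def choose_training_rank(species_obs: int, genus_total_obs: int,
                         min_share: float = 0.5) -> str:
    """Return 'species' if one species dominates the genus, else 'genus'.

    species_obs:     observations identified as the single candidate species
    genus_total_obs: all observations in the genus, including genus-only IDs
    min_share:       hypothetical threshold (the 50% figure from the post)
    """
    if genus_total_obs <= 0:
        raise ValueError("genus_total_obs must be positive")
    share = species_obs / genus_total_obs
    return "species" if share > min_share else "genus"


# The example from the post: 200 obs of species A, 2000 others at genus only.
# The share is 200 / 2200, roughly 9%, so the genus would be trained instead.
print(choose_training_rank(200, 200 + 2000))
```

With a different threshold (say the 90%-of-RG variant) you would pass RG counts and `min_share=0.9` instead.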

7 Likes

I agree - I also would have thought it would be significantly better if the CV model would only be trained on species where there are sufficient obs to cover at least two members of the genus.

I could imagine if this was one of the basic criteria (alongside 50 RG and 100 obs), we would see a very positive shift in the type of work identifiers have to do in complex taxa.
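Roughly, the combined rule could look like the sketch below. It assumes the per-species thresholds are the 50 RG / 100 obs figures quoted here, and the function name, input layout and placeholder taxon names are all made up for illustration.

```python
from collections import defaultdict

MIN_RG = 50    # research-grade threshold quoted in the post
MIN_OBS = 100  # total-observation threshold quoted in the post


def trainable_species(counts):
    """counts maps (genus, species) -> (rg_obs, total_obs).

    Returns the set of (genus, species) pairs that pass the per-species
    thresholds AND belong to a genus with at least two passing species.
    """
    passing = defaultdict(list)
    for (genus, species), (rg, total) in counts.items():
        if rg >= MIN_RG and total >= MIN_OBS:
            passing[genus].append(species)
    return {
        (genus, sp)
        for genus, spp in passing.items()
        if len(spp) >= 2
        for sp in spp
    }


# Placeholder data: GenusA has two qualifying species, GenusB only one.
counts = {
    ("GenusA", "sp1"): (60, 150),
    ("GenusA", "sp2"): (55, 120),
    ("GenusA", "sp3"): (10, 30),
    ("GenusB", "sp1"): (500, 900),
}
print(sorted(trainable_species(counts)))
```

Here GenusB's only qualifying species is left out, so that genus would be handled at genus rank. Monotypic genera would need a separate exception that this sketch does not cover.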

2 Likes

Presumably monotypic genera would get their single species included too? But how would iNat determine whether a genus is truly monotypic, or only has had one species imported into the iNat taxonomy? Maybe that’s enough of an edge case that it wouldn’t be an issue. (or, at least would be less of an issue than the one we’ve got now!)

3 Likes

Not complaining, but can you please, please, please make it country specific!
The “old” one is an amazing bit of tech, but in Australia we are constantly having to lift IDs out of US species that don’t occur here. Which of course annoys the hell out of newbies (the last thing we want to do).
If there was just a filter that said “recorded in Australia” at the top level, it’d become really useful here; rather than being a right pain in the proverbial.
Perhaps as part of its training you can monitor how often the image match is agreed to/disagreed to by others?
Cheers
Brett

Hi Brett,

In March the team released changes to our suggestions UI to exclude suggestions of taxa that do not occur nearby by default. (See https://forum.inaturalist.org/t/better-use-of-location-in-computer-vision-suggestions/915/47?u=loarie for more info). Users now have to choose “show suggestions that do not occur nearby” in order to even see non-nearby suggestions.

Have you not seen a change in this regard?

Thanks,
alex

6 Likes

The “pretty sure” suggestions still don’t take location into account, so you can get a “top” suggestion of a genus that doesn’t occur on the same continent. I live in the US, where frequently this feature works well for me - if the top species is a European one, the genus is often still right for the US - but not always! And I can only imagine how it is in other places in the world.

2 Likes

I think that the species-level suggestions are filtered to “seen nearby”, but that higher level suggestions such as genus are not. Seems like ideally they should be!

1 Like

Hi Chris,

We are working on further improvements from the UI & algorithm side of things as well as the modeling side, but we don’t have anything to share yet.

Cheers,
alex

btw - I made and then deleted an earlier draft of this post because I realized I’m not 100% confident in my knowledge of how we’re using geofrequency right now, and I didn’t want to misrepresent anything. So someone else from the team will have to chime in on that.

4 Likes

Is the number of photos per taxon still capped at 1000? It seems like for some taxa that’s plenty, while for some there could easily be enough combinations of completely different appearance based on life stage/habitat/sex that 1000 might not be sufficient to adequately sample the range of intrinsic variation. I wonder if the cap could be adaptive somehow?

tbh if a taxon has that much variability someone needs to make a new species concept… or else the CV cannot reasonably be expected to ID it.

1 Like

I mean like various combinations of sex+egg/larva/pupa/juvenile/adult+‘evidence of presence’ observations, which can all look nothing whatsoever alike, could mean you get less out of 1000 pictures for some species vs something like a bison where it’s just different sizes but they all look more or less the same.
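One way an "adaptive" cap could work is to cap per appearance category rather than per taxon. This is only a sketch of that idea: it assumes each photo already carries a label like life stage or sex (many don't), and `stratified_sample` is a hypothetical helper, not how iNat picks training photos.

```python
import random
from collections import defaultdict


def stratified_sample(photos, per_stratum_cap, seed=0):
    """photos: list of (photo_id, stratum) pairs, where stratum is any
    hashable label such as ("larva", None) or ("adult", "female").

    Returns up to per_stratum_cap photo ids from each stratum, so a taxon
    with more distinct appearances ends up with a larger total sample.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for photo_id, stratum in photos:
        by_stratum[stratum].append(photo_id)
    sample = []
    for ids in by_stratum.values():
        # Take everything when a stratum has fewer photos than the cap.
        sample.extend(rng.sample(ids, min(per_stratum_cap, len(ids))))
    return sample


# 10 adult photos and 3 larva photos with a cap of 5 per stratum: 5 + 3 = 8.
photos = [(i, "adult") for i in range(10)] + [(100 + i, "larva") for i in range(3)]
print(len(stratified_sample(photos, per_stratum_cap=5)))
```

A bison-like taxon with one stratum would stay at the base cap, while a taxon with many life stages would get proportionally more photos.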

2 Likes

The ‘evidence of presence’ type observations are a pretty big wild card. Just curious, does the model look for variation in the photos it chooses? For example - pulling out feces, footprints, terrain/plant damage, bones, etc for elephants as well as a lot of clear-through-blurry images of the animals themselves?

When the model is finished will there be any kind of post with a list of the newly included genera?

It does work, you just need to clear out all the wrong IDs first, which is not going that well or fast from what I can see. Really, IDers in e.g. the US would need to spend a day or two just looking for wrong IDs and re-IDing them or marking plants as cultivated, which for some reason isn’t happening, while these plants keep contributing to the “seen nearby” pool.

3 Likes

Agreed. I didn’t know about this whole leaves thing with the CV before, and it explains probably the majority of invertebrate CV problems I know of.

3 Likes

It’s more than a complaint. It’s a fatal design flaw.

1 Like

You can find it for a taxon separately if it is added to the CV model, but as far as I know there is no filter to see a taxon list with only CV-added taxa, e.g. here: https://www.inaturalist.org/taxa.

But iNaturalist can return species which are in the database and not in the model. I have no idea if this label is shown for these cases.
https://www.inaturalist.org/taxa/13858-Passer-domesticus

If there is no default wiki text for a taxon, the default text says how many obs there are for this species.
How can we see how many obs there are for a Pending species with wiki text?

Since the line you’re referring to just means number of observations on iNaturalist and isn’t strictly related to computer vision, that number is also listed on the top right of the taxon page above the graph.

This number is of Needs ID/RG observations (aka “verifiable” - bad name) and excludes Casual ones.


To view the total number of observations of any species, from anywhere on iNat, search for it in the top right search bar, click View Observations, then uncheck the box next to “Verifiable” in the Filters.

https://www.inaturalist.org/observations?taxon_id=635324&verifiable=any
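If you want to build that link for other taxa without clicking through the filters, something like this works. It only uses the two query parameters visible in the URL above; the helper name `observations_url` is made up, and leaving out `verifiable` simply falls back to whatever the site does by default.

```python
from urllib.parse import urlencode


def observations_url(taxon_id: int, include_casual: bool = True) -> str:
    """Build an observations search URL for one taxon.

    include_casual=True adds verifiable=any, which matches unchecking
    the "Verifiable" box in the Filters.
    """
    params = {"taxon_id": taxon_id}
    if include_casual:
        params["verifiable"] = "any"
    return "https://www.inaturalist.org/observations?" + urlencode(params)


# Reproduces the link from the post above.
print(observations_url(635324))
```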


3 Likes