Use computer vision to annotate observations?

Scott - this sounds supportive, but I'm not sure what the implication is. There are 5.3 million insect records today, around 20% of which have life stage annotations. The percentage for Lepidoptera is slightly better, at 26% of 2.3 million observations. Is that enough data to start with? If not, what is the threshold?

As I update this, I realize there have been 300,000 new Lepidoptera observations since I started this topic on May 16, roughly 225,000 of which have no life stage information. This problem grows daily.
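For anyone who wants to check the arithmetic behind these figures, here's a small Python sketch. The counts are the approximate ones quoted above, and the variable and function names are purely illustrative.

```python
# Approximate counts quoted in this thread (not live iNaturalist data).
INSECT_OBS = 5_300_000
INSECT_PCT_ANNOTATED = 20
LEP_OBS = 2_300_000
LEP_PCT_ANNOTATED = 26
NEW_LEP_OBS = 300_000
NEW_LEP_UNANNOTATED = 225_000


def split(total, pct_annotated):
    """Return (annotated, unannotated) counts for a whole-number percentage."""
    annotated = total * pct_annotated // 100
    return annotated, total - annotated


insects = split(INSECT_OBS, INSECT_PCT_ANNOTATED)
leps = split(LEP_OBS, LEP_PCT_ANNOTATED)
# Share of the new Lepidoptera observations that lack a life stage.
new_share_unannotated = NEW_LEP_UNANNOTATED / NEW_LEP_OBS

print(f"Insects: {insects[0]:,} annotated, {insects[1]:,} without life stage")
print(f"Lepidoptera: {leps[0]:,} annotated, {leps[1]:,} without life stage")
print(f"New Lepidoptera observations without life stage: {new_share_unannotated:.0%}")
```

That works out to roughly 1.06 million annotated insect records (about 4.24 million without a life stage) and about 598,000 annotated Lepidoptera (about 1.7 million without). The 75% unannotated share among the new Lepidoptera observations is in line with the overall 74%.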

This was posted on another thread, but the first step of this process would be made so much easier if the CV could assign life stages (emphasis added):

@dkaposi has raised some interesting ideas, particularly around computer vision and annotations. I hope this discussion has helped allay his concerns about the growing number of unannotated observations.

As long as the raw data (the image) is stored, classifying that data remains achievable, whatever methods are used to do it. I would include annotations in this, whether there are a million or a kajillion observations without annotations.

The goal of training the AI to recognize annotation categories in incoming data seems less like a feature request and more like a project (IMHO).

In my experience, projects struggle when their boundaries are not well defined. The glorious vision of the AI solving our very human annotation problem has a bit of that undefined glow to my eye.

This is by no means a reason not to begin; it is more a question of how to begin.

As you may have noticed in the forums, there are quite a few differing views on which annotation categories are valid and/or useful. This, coupled with the wide variety of images included in observations, creates challenges. The scope of the problem is then multiplied by the number of organisms that iNaturalist covers.

The threshold for 'enough data to start with' is always a matter of opinion :) Good data is like money in your pocket: an asset whose value grows with its size. It also helps to focus your data improvements in some way, so that the work you put in has more effect.

For example, look at the organisms that are the center of your expertise.

  • Which annotated stages work for that group?

  • Are there images of all those annotated stages for the species common in your locality?

Even a small number of observers who take a consistent approach to documenting and annotating a single group, or a few closely related groups, of organisms can reasonably build a higher-quality annotation data set within the iNaturalist data. Such a data set could then feed a pilot project exploring AI identification of life stages.

I do think I see some specific feature proposals hidden within the discussion here:

  1. a way to add annotations to one's own observations en masse. Batch Edit does allow observation fields and tags to be edited this way, but not annotations.

  2. a way to add annotations to others' observations with fewer clicks

Both proposals share the same goal: increasing the number of human-made annotations in the data. Feature requests have their own forum section and protocols; here's an overview of how feature requests work.

Thanks for the opportunity to think and write about this.

That is excellent news. I'm glad there is some interest and traction on this. Obviously, I'm interested in learning how this can move forward.

Every annotation is automatically linked to at least one observation field. Adding the appropriate observation field automatically adds the linked annotation.

For example, adding the Insect Life Stage observation field automatically populates the 'Life Stage' annotation.

So you can already do this on your own records. Note that the reverse direction does not work: adding the annotation does not populate the observation field. You just have to find the correct linked observation field.
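To make the one-way link concrete, here's a toy Python model. It is a hypothetical illustration, not the iNaturalist API or its internals: the `Observation` class, its method names, and the single field-to-annotation mapping are all assumptions, and how a linked field interacts with an annotation that already exists isn't modeled.

```python
from dataclasses import dataclass, field

# Hypothetical mapping: observation field name -> annotation attribute it fills.
# Only the example pair from this thread is included.
LINKED_FIELDS = {"Insect Life Stage": "Life Stage"}


@dataclass
class Observation:
    """Toy observation holding observation fields and annotations (hypothetical)."""
    observation_fields: dict = field(default_factory=dict)
    annotations: dict = field(default_factory=dict)

    def add_observation_field(self, name, value):
        self.observation_fields[name] = value
        # Forward direction: a linked observation field also sets its annotation.
        attribute = LINKED_FIELDS.get(name)
        if attribute is not None:
            self.annotations[attribute] = value

    def add_annotation(self, attribute, value):
        # Reverse direction: an annotation never writes back to any observation field.
        self.annotations[attribute] = value


via_field = Observation()
via_field.add_observation_field("Insect Life Stage", "Larva")
print(via_field.annotations)          # the annotation was filled in

via_annotation = Observation()
via_annotation.add_annotation("Life Stage", "Adult")
print(via_annotation.observation_fields)  # still empty
```

The practical upshot is the same as described above: if you want both the field and the annotation, add the linked observation field rather than the annotation.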

