Use computer vision to annotate observations?

Depends on your definition of ‘recent’: annotations were introduced in the summer of 2017, so we are coming up on two years. I wouldn’t say that is recent. My point about the old observations is that the unannotated list is large, but if 75% of all new observations are also unannotated, and the number of observations is growing, then you will have a growing backlog. I bet if you could rewind to 2017, the list of unannotated leps was 500,000 or something, and a million last summer (I’m making up the figures, but the growth curve is the point). So even with annotations around, the number keeps multiplying.

Chris - if we wait, I fear that nothing will get done and hundreds of thousands, if not millions, more observations will be entered without the benefit of useful data supplements in the form of annotations. So if the male/female thing for birds is an issue, maybe that is a second phase rollout. But hopefully we can work on the unambiguous issues first, possibly to get a process sorted out.

As mentioned in the Let’s Talk Annotations and Data Use Cases threads, the ability to download annotations is also a requested feature. Speaking from a data user’s perspective, iNat’s insect observations are problematic because of the inconsistent application of life stage categorization, so a development like this could really address what some believe is a big problem.

No doubt it’s a huge job, but isn’t a technology-and-workflow solution better than any other option? And if I recall correctly, the CV is retrained on the full dataset, meaning each new training set is much larger than any earlier version.

I guess my concern from a technologist perspective is that rule #1 of any work is to not make the problem worse, or to not simply introduce a new problem which replaces the one you are trying to resolve.

1 Like

I see your point, but the math would seem to indicate that you could hugely reduce the absolute number of unannotated observations by dealing with the simple, single-variable observations. The current solution of hoping people will deal with it manually is so lacking in efficacy that a new direction is needed.

If you want to fill a bucket by catching rain in it, you don’t have to catch every single drop. Annotations “add value to an observation”… but it’s not a requirement for anything.

1 Like

Mark - the data that annotations can provide is valuable to many people (but possibly not to you). Please don’t diminish other peoples’/groups’ interest in better data.

2 Likes

The joyful news is that the main treasure of iNaturalist is the data set itself. The fact that it is growing in size and complexity means that it is becoming a richer and richer resource for exploring the capacity of computer vision to solve problems - not only problems related to the interests of iNaturalist users but also problems facing those interested in advancing the sophistication of computer vision. As long as the data set exists, it can be reanalyzed and re-categorized. I have been very grateful for this particular ability while the current winds of change are blowing so strongly through the world of taxonomy. :)

This discussion is doing a good job of outlining some of the problems of interest to the computer vision side of things.

  • how to decide which images are most useful to use in training sets and how to predict how many images are needed for effective training
  • what to do when images contain multiple subjects that are the focus of the training goals
  • how to balance multiple training goals

As computer vision continues to develop in sophistication, I expect that more ideas about how to solve these more technical problems will emerge. Here’s a link to the iNat Challenge on the website of the Sixth Workshop on Fine-Grained Visual Categorization https://sites.google.com/view/fgvc6/competitions/inaturalist-2019 (warning: this link will contain computer jargon, similar but not identical to the scientific jargon you may already be accustomed to). Part of the attraction of participating in these competitions is access to portions of large, complex datasets like iNat’s, to test out approaches and theories for improving the accuracy and efficiency of machine learning and visual categorization.

I agree that the annotations discussion is a valuable one - and that part of the reward of working on the annotations problem will be the eventual prize: a long slog of updating annotations supported by computer-vision-aided suggestions.

4 Likes

This could also be a good task for new users/non-experts. I bet there are a lot of people who would like to help improve the data but don’t have the expertise to do species ID. If you encouraged people to help with this (maybe on a project page) and included a link to some basic instructions, I would think even users with little or no experience could correctly fill in this data.

2 Likes

Yes, see this related tutorial about how to add annotations:
https://forum.inaturalist.org/t/using-identify-to-annotate-observations/1417

It’s listed as one of the ways to help out on iNat too. :)

Insect life stages seem pretty straightforward, and computer vision would likely have much higher confidence distinguishing them than distinguishing species. Run it automatically and allow users to upvote or downvote the computer vision suggestions.
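The suggest-and-vote idea could look roughly like this sketch, where a suggestion is only surfaced when the model’s confidence clears a threshold (all names here are hypothetical, not part of any actual iNaturalist code or API):

```python
# Sketch: surface a life-stage suggestion only when the classifier is
# confident; otherwise leave the observation for human annotators.
# Hypothetical names - not iNaturalist's actual implementation.

LIFE_STAGES = ["egg", "larva", "pupa", "adult"]
CONFIDENCE_THRESHOLD = 0.9  # only suggest when the model is quite sure

def suggest_life_stage(probabilities):
    """Return (stage, score) if the top class clears the threshold, else None."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best] >= CONFIDENCE_THRESHOLD:
        return LIFE_STAGES[best], probabilities[best]
    return None  # too ambiguous - better no annotation than a wrong one

# A confident "larva" prediction vs. an ambiguous one:
print(suggest_life_stage([0.02, 0.95, 0.02, 0.01]))  # ('larva', 0.95)
print(suggest_life_stage([0.30, 0.40, 0.20, 0.10]))  # None
```

Users’ upvotes and downvotes on the surfaced suggestions would then double as fresh training labels, which is what makes the loop attractive.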

Observations that show an egg, larva, and then adult - or a male and then a female - in separate photos fail the criteria of what an observation is anyway, but running it only on observations with a single photo would be one step toward limiting the generation of additional issues.

2 Likes

To me this is still illogical or poorly thought out. When I go to the provincial park where I will be tomorrow, I will likely see 200+ Yellow Warblers. I’m not recording 200+ observations, or even one for every separate individual I photograph.

Of course if I record 1 record and enter a count of 200, I’m breaking a different ‘rule’, because not all 200 were seen at those exact, six-decimal-place coordinates.

Further discussion about what an observation is (or should be) can be hosted here:
https://forum.inaturalist.org/t/what-is-an-observation/3367


1 Like

A post was split to a new topic: Find observations without annotations?

This has gone quiet, and before it closes I would really appreciate hearing from @alez and/or @loarie.

Thanks
David

1 Like

We could certainly train a computer vision model to predict annotations like life stage rather than species. But remember, to train the model we’ll still need lots of training data, so the more life stage annotations you can add, the better the model will be.
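In miniature, training such a model is the same exercise as species classification, just with life-stage labels as the targets. A rough sketch with NumPy (this is not iNaturalist’s actual pipeline - the features here are synthetic stand-ins for what a deep network would extract from photos, and the labels stand in for human annotations):

```python
# Minimal sketch: a softmax classifier over image feature vectors,
# labeled with life stages instead of species. Synthetic data only.
import numpy as np

rng = np.random.default_rng(0)
STAGES = ["egg", "larva", "pupa", "adult"]
n_per_class, dim = 50, 8

# Fake "features": each life stage clusters around its own mean vector.
means = rng.normal(size=(len(STAGES), dim))
X = np.vstack([rng.normal(m, 0.5, size=(n_per_class, dim)) for m in means])
y = np.repeat(np.arange(len(STAGES)), n_per_class)

# One-layer softmax regression trained by gradient descent.
W = np.zeros((dim, len(STAGES)))
onehot = np.eye(len(STAGES))[y]
for _ in range(300):
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W -= 0.1 * X.T @ (p - onehot) / len(X)

accuracy = (np.argmax(X @ W, axis=1) == y).mean()
print(f"training accuracy: {accuracy:.2f}")  # high, since the classes separate
```

The point the post makes carries over directly: `n_per_class` is the lever. More human-added life stage annotations means more labeled examples per class, which is what improves the model.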

2 Likes

A post was merged into an existing topic: Let’s Talk Annotations

Scott - this sounds supportive, but I’m not sure what the implication is. There are 5.3 million insect records today, around 20% of which have life stage annotations. The percentage for lepidoptera is slightly better, at 26% of 2.3 million observations. Is that enough data to start with? If not, what is the threshold?

As I update this, I realize that’s 300,000 new lep observations since I started this topic on May 16, roughly 225,000 of which do not have life stage information. This problem grows daily.

1 Like

This was posted on another thread, but the first step of this process would be made much easier if the CV were able to assign life stages (emphasis added):

1 Like

A post was merged into an existing topic: What is an observation?

@dkaposi has lifted up some interesting ideas around computer vision and annotations particularly. I am hopeful that this discussion has helped allay his concerns about the increasing number of unannotated observations.

As long as the raw data (the image) is stored, the classification of that data remains attainable, no matter what methods are used. I would include annotations in this - no matter if there are a million or a kajillion observations without annotations.

The goal of training the AI to recognize annotation categories in incoming data seems to me less like a feature request and more like a project (IMHO).

In my experience, projects struggle when their boundaries are not well defined. The glorious vision of the AI solving our very human annotation problem has a bit of that undefined glow to my eye.

This is not by any means a reason not to begin - it is more a discussion of how to begin.

As you may have noted in the forums, there are quite a few differing views on what annotation categories are valid and/or useful. This, coupled with the wide variety of images included in observations, creates challenges. The scope of the problem is made exponential by the number of organisms that iNaturalist includes.

The threshold for ‘enough data to start with’ is always a matter of opinion :) Good data is like money in your pocket - an asset whose value grows with its size. It is also useful to focus your data improvements in some way, so that the work put in has more effect.

For example, look at the organisms that are the center of your expertise.

  • Which annotated stages work for that group?

  • Are there images of all those annotated stages for the species common in your locality?

Even a small number of observers who focus on a consistent approach for documenting and annotating a single group (or closely related groups) of organisms could reasonably build a higher-quality annotation data set within the iNaturalist data. Such a data set could then feed a pilot project for exploring AI identification of life stages.

Within the discussion here, I do think I see some specific feature proposals hidden:

  1. a way to add annotations to one’s own observations en masse - the Batch Edit tool does allow observation fields and tags to be edited this way, but not annotations

  2. a way to add annotations to others’ observations with fewer clicks

Both these proposals have the same goal - to increase the number of human-made annotations in the data. Feature proposals have their own forum section and protocols - here’s an overview of how feature requests work: https://forum.inaturalist.org/t/about-the-feature-requests-category/69

Thanks for the opportunity to think and write about this.

1 Like

Bringing this text over to keep the thread up-to-date:

That is excellent news, I’m glad there is some interest and traction on this. Obviously, I’m interested in learning how this can move forward.

Every annotation is linked automatically to at least 1 observation field. Adding the appropriate observation field means the linked annotation is automatically added.

For example adding the Insect Life Stage observation field automatically populates the annotation ‘Life Stage’.

So you can already do this (on your own records). Please note the reverse direction does not work - adding the annotation does not populate the observation field. You just have to find the correct observation field that is linked.

1 Like