I see your point, but the math would seem to indicate that you could hugely reduce the absolute number of unannotated observations by dealing with the simple, single-variable observations. The current solution of hoping people will manually deal with it is so lacking in efficacy that a new direction is needed.
If you want to fill a bucket by catching rain in it, you don’t have to catch every single drop. Annotations “add value to an observation”… but it’s not a requirement for anything.
Mark - the data that annotations can provide is valuable to many people (but possibly not to you). Please don’t diminish other people’s/groups’ interest in better data.
The joyful news is that the main treasure of iNaturalist is the data set itself. The fact that it is growing in size and complexity means that it is becoming a richer and richer resource for exploring the capacity of computer vision to solve problems - not only problems related to the interests of iNaturalist users but also problems facing those interested in advancing the sophistication of computer vision. As long as the data set exists, it can be reanalyzed and re-categorized. I have been very grateful for this particular ability while the current winds of change are blowing so strongly through the world of taxonomy. :)
This discussion is doing a good job of outlining some of the problems of interest to the computer vision side of things.
- how to decide which images are most useful to use in training sets and how to predict how many images are needed for effective training
- what to do when images contain multiple subjects that are the focus of the training goals
- how to balance multiple training goals
As computer vision continues to develop in sophistication, I expect that more ideas about how to solve these more technical problems will emerge. Here’s a link to the iNat Challenge on the website of the Sixth Workshop on Fine Grained Visual Categorization https://sites.google.com/view/fgvc6/competitions/inaturalist-2019 (warning: this link will contain computer jargon, similar but not identical to the scientific jargon you may already be accustomed to). Part of the attraction to participating in these competitions is the access to portions of large complex datasets like iNat to test out approaches and theories for improving the accuracy and efficiency of machine learning and visual categorization.
I agree that the annotations discussion is a valuable one - and that part of the reward of working on the annotations problem is the eventual prize: the long slog of updating annotations could be supported by computer-vision-aided suggestions.
This could also be a good task for new users/non-experts. I bet there are a lot of people who would like to help improve the data but don’t have the expertise to do species ID. If you encouraged people to help with this (maybe on a project page) and included a link to some basic instructions, I would think even users with little or no experience could correctly fill in this data.
Yes, see this related tutorial about how to add annotations:
It’s listed as one of the ways to help out on iNat too. :)
Insect life stages seem pretty straightforward, and computer vision would have much higher confidence distinguishing between them than distinguishing species. Run it automatically and allow users to upvote or downvote the computer vision suggestions.
Observations that show an egg, larva, then adult, or male, then female, in separate photos fail the criteria of what an observation is anyway, but only running it on observations with a single photo would be one step to take in limiting the generation of additional issues.
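As a rough illustration of that "single photo only" restriction, here is a minimal sketch of how candidate observations might be selected. The `Observation` class, its fields, and the sample records are all hypothetical and made up for this example; this is not iNaturalist's data model or API.

```python
from dataclasses import dataclass, field

@dataclass
class Observation:
    # Hypothetical, simplified record: not iNaturalist's real schema.
    obs_id: int
    photos: list = field(default_factory=list)
    annotations: dict = field(default_factory=dict)

def life_stage_candidates(observations):
    """Keep observations with exactly one photo and no Life Stage annotation."""
    return [
        o for o in observations
        if len(o.photos) == 1 and "Life Stage" not in o.annotations
    ]

# Made-up sample data for illustration only.
sample = [
    Observation(1, photos=["a.jpg"]),                                       # candidate
    Observation(2, photos=["b.jpg", "c.jpg"]),                              # multiple photos: skipped
    Observation(3, photos=["d.jpg"], annotations={"Life Stage": "Adult"}),  # already annotated
]

candidate_ids = [o.obs_id for o in life_stage_candidates(sample)]
print(candidate_ids)
```

Only the first record passes both checks, so multi-photo observations never reach the suggestion step at all.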
To me this is still illogical or poorly thought out. When I go to the provincial park where I will be tomorrow, I will likely see 200+ Yellow Warblers. I’m not recording 200+ observations, or even one for every separate individual I photograph.
Of course, if I enter one record with a count of 200, I’m breaking a different ‘rule’, because not all 200 were seen at those exact coordinates, precise to six decimal places.
Further discussion about what an observation is (or should be) can be hosted here:
A post was split to a new topic: Find observations without annotations?
This has gone quiet and before it closes I would really appreciate hearing from @alez and/or @loarie
We could certainly train a computer vision model to predict annotations like life stage rather than species. But remember, to train the model we’ll still need lots of training data, so the more life stage annotations you can add, the better the model will be.
A post was merged into an existing topic: Let’s Talk Annotations
Scott - this sounds supportive, but I’m not sure what the implication is. There are 5.3 million insect records today, around 20% of which have life stage annotations. The percentage for Lepidoptera is slightly better, at 26% of 2.3 million observations. Is that enough data to start with? If not, what is the threshold?
As I update this, I realize that’s 300,000 new lep observations since I started this topic on May 16, roughly 225,000 of which do not have life stage information. This problem grows daily.
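For anyone who wants to see those figures side by side, here is a back-of-the-envelope sketch. It uses only the approximate numbers quoted in this post; the variable and function names are made up for illustration, and the results are rough estimates rather than live counts.

```python
def unannotated(total: int, annotated_pct: int) -> int:
    """Approximate count of records lacking a life stage annotation."""
    return total * (100 - annotated_pct) // 100

# Approximate figures quoted in the post above.
insect_records = 5_300_000   # about 20% have life stage annotations
lep_records = 2_300_000      # about 26% have life stage annotations

insects_missing = unannotated(insect_records, 20)
leps_missing = unannotated(lep_records, 26)

# New Lepidoptera observations since May 16, per the post.
new_lep_records = 300_000
new_lep_missing = 225_000
new_lep_missing_share = new_lep_missing / new_lep_records

print(f"Insect records without life stage: ~{insects_missing:,}")
print(f"Lepidoptera records without life stage: ~{leps_missing:,}")
print(f"Share of new Lepidoptera records without life stage: {new_lep_missing_share:.0%}")
```

On these numbers, roughly three out of four new Lepidoptera observations arrive without a life stage, which is about the same proportion as in the existing backlog.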
This was posted on another thread, but the first step of this process would be made so much easier if the CV was able to assign life stages (emphasis added):
A post was merged into an existing topic: What is an observation?
@dkaposi has lifted up some interesting ideas around computer vision and annotations particularly. I am hopeful that this discussion has helped allay his concerns about the increasing number of unannotated observations.
As long as the raw data (the image) is stored, the classification of that data remains an attainable task–no matter what methods are used to attain it. I would include annotations in this - no matter if there are a million or a kajillion observations without annotations.
The goal of training the AI to recognize annotation categories in incoming data seems to me to be less like a feature request and more like a project (IMHO).
In my experience, projects struggle when their boundaries are not well defined. The glorious vision of the AI solving our very human annotation problem has a bit of that undefined glow to my eye.
This is not by any means a reason not to begin - it is more a discussion of how to begin.
As you may have noted in the forums, there are quite a few differing views on what annotation categories are valid and/or useful. This, coupled with the wide variety of images that are included in observations, creates challenges. The scope of the problem is multiplied by the number of organisms that iNaturalist includes.
The threshold for ‘enough data to start with’ is always a matter of opinion :) Good data is like money in your pocket - an asset whose value grows with its size. It is also useful to focus your data improvements in some way, so that the work put in has more effect.
For example, look at the organisms that are the center of your expertise.
Which annotated stages work for that group?
Are there images of all those annotated stages for the species common in your locality?
Even a small number of observers who focus on a consistent approach for documenting and annotating a single group of organisms, or a few closely related groups, can reasonably build a higher-quality annotation data set within the iNaturalist data. Such a data set could then feed a pilot project for exploring AI identification of life stages.
Within the discussion here, I do think I see some specific feature proposals hidden:
- a way to add annotations to one’s own observations en masse. The Batch Edit does allow observation fields and tags to be edited in this way, but not annotations
- a way to add annotations to others’ observations with fewer clicks
Both these proposals have the same goal - to improve the number of human made annotations in the data. Feature proposals have their own forum section and protocols - here’s an overview of how feature requests work https://forum.inaturalist.org/t/about-the-feature-requests-category/69
Thanks for the opportunity to think and write about this.
Bringing this text over to keep the thread up-to-date:
That is excellent news, I’m glad there is some interest and traction on this. Obviously, I’m interested in learning how this can move forward.
Every annotation is linked automatically to at least 1 observation field. Adding the appropriate observation field means the linked annotation is automatically added.
For example adding the Insect Life Stage observation field automatically populates the annotation ‘Life Stage’.
So you can already do this (on your own records). Please note the reverse direction does not work - adding the annotation does not populate the observation field. You just have to find the correct observation field that is linked.
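The one-way behaviour described above can be pictured with a toy model. This is only a sketch of the behaviour as described in this post, not iNaturalist's actual implementation or API: the `Observation` class, the helper functions, and the link table are all hypothetical, and the table lists only the one pairing named above.

```python
from dataclasses import dataclass, field

# Hypothetical link table: observation field name -> annotation name.
FIELD_TO_ANNOTATION = {"Insect Life Stage": "Life Stage"}

@dataclass
class Observation:
    # Simplified stand-in for a record; not the real schema.
    observation_fields: dict = field(default_factory=dict)
    annotations: dict = field(default_factory=dict)

def add_observation_field(obs, name, value):
    """Adding a linked observation field also fills in its annotation."""
    obs.observation_fields[name] = value
    linked = FIELD_TO_ANNOTATION.get(name)
    if linked is not None:
        obs.annotations[linked] = value

def add_annotation(obs, name, value):
    """Adding an annotation does NOT fill in any observation field."""
    obs.annotations[name] = value

via_field = Observation()
add_observation_field(via_field, "Insect Life Stage", "Larva")

via_annotation = Observation()
add_annotation(via_annotation, "Life Stage", "Larva")

print(via_field.annotations)              # annotation populated via the field
print(via_annotation.observation_fields)  # stays empty: reverse link does not exist
```

The asymmetry is the point: the field route ends up with both pieces of data, while the annotation route leaves the observation field untouched.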
Before more specifically responding to this topic (which I did vote for although I suspect it will never come to pass), I need help understanding the value iNat staff places on Annotations. Perhaps @tiwane or @loarie can help.
In the post https://forum.inaturalist.org/t/batch-adding-annotations/3450, @tiwane specifically states: “Correct, it [annotation] has to be done one at a time. Which I think is good, IDs and annotations should be important enough that you should take some time to do them.”
And yet @tiwane in this thread “hearted” a dismissive comment by kiwifergus:
So does the entity iNaturalist truly believe Annotations are as important as IDing a taxon, or are they an afterthought “feature” that is useful only if volunteers want to spend hours updating 420,000 lep obs from just the US and Canada?
Thanks if you can assist in delineating iNat’s position on Annotations in general.
Hang on… please don’t mis-quote me… or Tony for that matter…
“Hearting” a comment can mean many things besides agreement. You can “heart” a comment that you disagree with because you like the way it was phrased, or because the author used much better language than in a previous comment, or because they kept their response shorter than they normally do…
Was my comment “dismissive”? I think it was actually supportive of annotations, in that they add value to an observation… but it just pointed out that they are not a requirement… it’s still a valid observation without an annotation. In fact, it’s still a valid observation without an ID. It’s even a valid observation without evidence (e.g. a photo).