I’ve read the (closed) thread https://forum.inaturalist.org/t/casual-observations-in-the-cv-training-set/35558
and understand that some captive plants are changed (via selective breeding) beyond recognition.
Now this produces a question:
Some plants, particularly biennial ones like Digitalis sp., are hard to recognize in the first year but easy to recognize later. E.g. D. purpurea has 40k+ observations, almost all of them showing plants in their second year with flowers. In the first year I’m unable to tell them apart from e.g. Verbascum.
I used jiffies to grow seedlings of that plant for my garden; they range from several mm to a few cm in size. Since they aren’t in soil, contamination by other species can be excluded. I’d love to upload them somewhere in order to help early recognition if somebody encounters baby plants in nature.
You can upload the photos to iNat; just remember to tick captive/cultivated.
It’s fine to upload these observations marked as captive/cultivated. But if your goal is to expand the recognition capabilities of iNat’s computer vision, you may want to consider that the images used for training are essentially a random subset (capped at 1,000) of those available for a particular taxon.
You mention that there are 40,000 observations of D. purpurea. If we assume an average of 1.5 images per observation, then there are likely to be more than 60,000 images eligible for the iNat training dataset. If you uploaded 60 images of your seedling foxgloves, you might have a 50/50 chance of getting at least one of them included in CV training (probability is complex, but it’s at about this level). I’m not sure what proportion of images would be necessary for CV to have a reasonable ability to distinguish a seedling Digitalis from a seedling Verbascum, but I doubt that including one or two relevant images in a training dataset of 1,000 will make much difference.
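For anyone who wants to check that estimate, here is a rough Python sketch. It assumes the training photos are a uniform random sample of exactly 1,000 images drawn without replacement from exactly 60,000 existing images plus the 60 new ones; the real selection process may differ, and the function name and parameters are made up for illustration.

```python
def prob_at_least_one_sampled(existing_images, new_images, sample_size=1000):
    """Chance that at least one of the new images ends up in the sample.

    Assumption: a uniform random sample of `sample_size` images, drawn
    without replacement from all images of the taxon.
    """
    total = existing_images + new_images
    if total <= sample_size:
        # Everything is used, so any new image is certain to be included.
        return 1.0 if new_images > 0 else 0.0
    # P(no new image chosen) = C(total - new, sample) / C(total, sample),
    # written as a product to avoid huge binomial coefficients.
    p_none = 1.0
    for i in range(new_images):
        p_none *= max(total - sample_size - i, 0) / (total - i)
    return 1.0 - p_none


if __name__ == "__main__":
    print(prob_at_least_one_sampled(60_000, 60))
```

Under those assumptions the result is a little above 0.6, so in the same ballpark as the 50/50 figure; with more than 60,000 existing images it drops further toward one half.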
That’s not to say that there’s no value in adding cultivated observations to consciously assist the CV process. Let’s say you’re working to better identify wild populations of Plantus inconspicuous, a rare and not-at-all-showy plant that hasn’t been studied much. There are already 63 observations on iNaturalist, with 97 images. If you can add a bunch of new observations with images of the plant (from either wild or cultivated populations), there’s a good chance that P. inconspicuous will make the cut for a future CV training run. At that point it will become a whole lot easier for inexperienced observers to correctly identify unknown populations of the plant, which could lead to better understanding of its distribution and habitat.
If P. inconspicuous is hard to distinguish from Shrubbus generalis when not in flower, you might intentionally add a bunch of images of cultivated plants that show other parts and stages of growth. And so long as there are fewer than 1,000 images already on iNat, every image you add should be included in the next training dataset. I would guess that in that scenario, adding 15 images of seedling plants might result in a reasonable ability to identify P. inconspicuous seedlings.
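To put rough numbers on the contrast between the two scenarios, the sketch below compares the expected share of seedling images in the training set. It assumes uniform random sampling with a cap of 1,000 images per taxon, takes the image counts from the examples above (97 existing images for the made-up P. inconspicuous, about 60,000 for D. purpurea), and uses a hypothetical helper name.

```python
def expected_share(existing_images, new_images, cap=1000):
    """Return (expected number of new images used, their share of the set).

    Assumption: training images are a uniform random sample, capped at `cap`.
    """
    total = existing_images + new_images
    if total == 0:
        return 0.0, 0.0
    training_size = min(total, cap)
    # Each image is kept with probability training_size / total.
    expected_new = new_images * training_size / total
    return expected_new, expected_new / training_size


if __name__ == "__main__":
    for name, existing, new in [("P. inconspicuous", 97, 15),
                                ("D. purpurea", 60_000, 60)]:
        count, share = expected_share(existing, new)
        print(f"{name}: ~{count:.1f} seedling images, {share:.1%} of training set")
```

For the rare plant all 15 seedling images are used and make up roughly 13% of the set; for the common one about a single seedling image is expected among 1,000, or about 0.1%.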
Are Casual observations included in the training for the CV? I don’t think they are; I think only Research Grade observations are, to limit errors. Either way, adding lots of photos of captive individuals isn’t really the purpose of iNat, and some other resource would probably be a better repository for those images.
The current FAQ says:
Which taxa are included in the computer vision suggestions?
This has changed over time, but as of the model released in March 2020, taxa included in the computer vision training set must have at least 100 observations, at least 50 of which must have a community ID. Photos for training are randomly selected from among the qualifying iNaturalist observations (that is, it is not only the first image of an observation that may be used for training). Related species are sometimes inserted into the suggestions based on being seen nearby. When using computer vision, only the first image is assessed. As more observations are added and more identifications made, additional taxa can be added to the computer vision suggestions. This means your observations and IDs work to make better models!
i.e. photos do not need to be from research grade observations to be included.
It can definitely be helpful to add observations of cultivated organisms to provide example photos at different stages of life, but they’ll be less likely to be seen by people because Casual observations are hidden by default in most parts of the website until you change the filters.
I’ve just taken a few pictures of my Digitalis babies and found a few interesting details: the lower side of the leaves has hairs only along the veins, while the top side has them over the whole area (maybe along finer veins I didn’t see yet). They will be uploaded later today.
PS: Done. https://www.inaturalist.org/observations/174825834
The more I look for young plants, the more I feel the need for iNat’s CV somehow to learn what different rosettes of biennial plants look like. The screenshot is a typical result; stuff like that keeps coming up over and over (the observation in the example was without location; for European rosettes the list is usually like “Boraginaceae, Digitalis, Verbascum, Lapsana communis”).
I have not the slightest idea why L. communis is included (its leaves have a totally different shape, and it is an edible wild vegetable that should not be confused with other, poisonous species). Maybe it’s because people keep uploading rosettes (L.c. is usually eaten before it sprouts the stalk with flowers) and CV thinks “it’s a rosette in Europe, so it must be one of the four because nobody has ever uploaded other rosettes with ID into me”.
Maybe we need manpower or special measures to improve that. It is hard to do with wild plants (Verbascum and Digitalis are similar, so the observer would need to be very qualified to say which is which in the first place). Somehow manually injecting images of verified plants (ideally homegrown, i.e. captive, to be 100% sure of the species) might help.
I agree about the difficulty of training the CV to accurately recognize plant species at early growth stages – observations that can be successfully ID’d are likely to get buried among the more numerous observations of these plants at the stage when they tend to attract the most attention of observers (i.e., when blooming).
There’s a project that might be relevant in this context:
Also, it is possible even as a regular user to edit the photos on taxon pages, so if you have good verified observations showing rosettes of particular plants, you could add them to the selection of photos displayed for that species. This wouldn’t necessarily help the CV, but it would provide an easily accessible reference for other users/IDers.