Providing captive pictures in special cases to help CV

ralfmuschall · July 24, 2023, 10:44am

I’ve read the (closed) thread https://forum.inaturalist.org/t/casual-observations-in-the-cv-training-set/35558
and understand that some captive plants are changed (via selective breeding) beyond recognition.

This raises a question:
Some plants, particularly biennial ones like Digitalis sp., are hard to recognize in the first year but easy to recognize later. E.g. D. purpurea has 40k+ observations, almost all of them of second-year plants in flower. In the first year I’m unable to tell them apart from e.g. Verbascum.

I used jiffies to grow seedlings of that plant for my garden; they currently range from several mm to a few cm. Since they aren’t in soil, contamination by other species can be excluded. I’d love to upload them somewhere in order to help early recognition if somebody encounters baby plants in nature.

thebeachcomber · July 24, 2023, 11:18am

You can upload the photos to iNat, just remember to tick captive/cultivated

rupertclayton · July 24, 2023, 5:18pm

It’s fine to upload these observations marked as captive/cultivated. But if your goal is to expand the recognition capabilities of iNat’s computer vision, you may want to consider that the images used for training are essentially a random subset (capped at 1,000) of those available for a particular taxon.

You mention that there are 40,000 observations of D. purpurea. If we assume an average of 1.5 images per observation, then there are likely to be more than 60,000 images eligible for the iNat training dataset. If you uploaded 60 images of your seedling foxgloves, you might have a 50/50 chance of getting at least one of them included in CV training (probability is complex, but it’s at about this level). I’m not sure what proportion of images would be necessary for CV to have a reasonable ability to distinguish a seedling Digitalis from a seedling Verbascum, but I doubt that including one or two relevant images in a training dataset of 1,000 will make much difference.
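To put a rough number on that estimate, here is a minimal sketch in Python. It assumes the training photos are a uniform random draw of 1,000 images without replacement from everything available for the taxon, which is a simplification of the description above and not a confirmed detail of iNat's pipeline. The function name and parameters are made up for illustration.

```python
from math import prod


def prob_at_least_one_sampled(existing_images: int, new_images: int, cap: int = 1000) -> float:
    """Chance that at least one of `new_images` ends up in the training sample.

    Assumption (not confirmed by iNat): the sample is a uniform random draw of
    `cap` images, without replacement, from existing + new images of the taxon.
    """
    if new_images <= 0:
        return 0.0
    total = existing_images + new_images
    # If everything fits under the cap, every image is used.
    if total <= cap:
        return 1.0
    # If there are more new images than unsampled slots, one must be drawn.
    if new_images > total - cap:
        return 1.0
    # Hypergeometric "no hits": each new image in turn has to land among the
    # images that are NOT sampled.
    p_none = prod((total - cap - j) / (total - j) for j in range(new_images))
    return 1.0 - p_none


if __name__ == "__main__":
    # The foxglove case: about 60,000 existing images plus 60 new seedling photos.
    print(prob_at_least_one_sampled(60_000, 60))
    # A rare taxon with 97 images: 15 new ones are all included.
    print(prob_at_least_one_sampled(97, 15))
```

Under these assumptions the foxglove case comes out at a little over 60%, so somewhat better than 50/50 but in the same ballpark, and the expected number of seedling photos in the sample is still only about one.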

That’s not to say that there’s no value in adding cultivated observations to consciously assist the CV process. Let’s say you’re working to better identify wild populations of Plantus inconspicuous, a rare and not-at-all-showy plant that hasn’t been studied much. There are already 63 observations on iNaturalist, with 97 images. If you can add a bunch of new observations with images of the plant (from either wild or cultivated populations), there’s a good chance that P. inconspicuous will make the cut for a future CV training run. At that point it will become a whole lot easier for inexperienced observers to correctly identify unknown populations of the plant, which could lead to better understanding of its distribution and habitat.

If P. inconspicuous is hard to distinguish from Shrubbus generalis when not in flower, you might intentionally add a bunch of images of cultivated plants that show other parts and stages of growth. And so long as there are fewer than 1,000 images already on iNat, every image you add should be included in the next training dataset. I would guess that in that scenario, adding 15 images of seedling plants might result in a reasonable ability to identify P. inconspicuous seedlings.

david99 · July 24, 2023, 7:27pm

Are Casual observations included in the training for the CV? I don’t think they are; I think only Research Grade observations are, to limit errors. Either way, adding lots of photos of captive individuals isn’t really the purpose of iNat, and some other resource would probably be a better repository for those images.

bouteloua · July 24, 2023, 7:41pm

The current FAQ says:

Which taxa are included in the computer vision suggestions?

This has changed over time, but as of the model released in March 2020, taxa included in the computer vision training set must have at least 100 observations, at least 50 of which must have a community ID. Photos for training are randomly selected from among the qualifying iNaturalist observations (that is, it is not only the first image of an observation that may be used for training). Related species are sometimes inserted into the suggestions based on being seen nearby. When using computer vision, only the first image is assessed. As more observations are added and more identifications made, additional taxa can be added to the computer vision suggestions. This means your observations and IDs work to make better models!

i.e. photos do not need to be from research grade observations to be included.

It can definitely be helpful to add observations of cultivated organisms to provide example photos at different stages of life, but they’ll be less likely to be seen by people due to how Casual observations are by default hidden in most parts of the website until you change the filters.
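As a small illustration of the FAQ thresholds quoted above (at least 100 observations, at least 50 of them with a community ID), here is a hedged sketch. The constants come from the FAQ text for the March 2020 model. The function names are hypothetical, and the real selection logic may involve criteria the FAQ does not mention.

```python
# Thresholds as stated in the FAQ for the model released in March 2020.
MIN_OBSERVATIONS = 100
MIN_WITH_COMMUNITY_ID = 50


def meets_faq_thresholds(n_observations: int, n_with_community_id: int) -> bool:
    """True if a taxon meets both counts named in the FAQ (illustrative only)."""
    return (n_observations >= MIN_OBSERVATIONS
            and n_with_community_id >= MIN_WITH_COMMUNITY_ID)


def observations_still_needed(n_observations: int) -> int:
    """How many more observations are needed to reach the observation threshold."""
    return max(0, MIN_OBSERVATIONS - n_observations)


if __name__ == "__main__":
    # The made-up rare plant from earlier in the thread: 63 observations so far.
    print(meets_faq_thresholds(63, 63))
    print(observations_still_needed(63))
```

Note that the check has no Research Grade parameter, which matches the point that photos do not need to come from Research Grade observations.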

ralfmuschall · July 25, 2023, 7:26am

I’ve just taken a few pictures of my Digitalis babies and found a few interesting details: the lower side of the leaves has hairs only along the veins, while the top side has them over the whole area (maybe along finer veins I haven’t seen yet). They will be uploaded later today.

PS: Done. https://www.inaturalist.org/observations/174825834

ralfmuschall · July 26, 2023, 5:59pm

The more I look for young plants, the more I feel the need for iNat’s CV to somehow learn what different rosettes of biennial plants look like. The screenshot is a typical result; stuff like that keeps coming up over and over (the observation in the example was without location; for European rosettes the list is usually like “Boraginaceae, Digitalis, Verbascum, Lapsana communis”).

I have not the slightest idea why L. communis is included (its leaves have a totally different shape, and it is an edible wild vegetable that should not be confused with other, poisonous species). Maybe it’s because people keep uploading rosettes (L. communis is usually eaten before it sprouts the stalk with flowers) and CV thinks “it’s a rosette in Europe, so it must be one of the four, because nobody has ever uploaded other rosettes with an ID into me”.

Maybe we need manpower or special measures to improve that. It is hard to do with wild plants (Verbascum and Digitalis are similar, so the observer would need to be very qualified to say which is which in the first place). Somehow manually injecting images of verified plants (ideally homegrown, i.e. captive, to be 100% sure of the species) might help.

spiphany · July 26, 2023, 6:22pm

I agree about the difficulty of training the CV to accurately recognize plant species at early growth stages – observations that can be successfully ID’d are likely to get buried among the more numerous observations of these plants at the stage when they tend to attract the most attention of observers (i.e., when blooming).

There’s a project that might be relevant in this context:
https://www.inaturalist.org/projects/rosettes-of-dicots

Also, it is possible even as a regular user to edit the photos on taxon pages, so if you have good verified observations showing rosettes of particular plants, you could add them to the selection of photos displayed for that species. This wouldn’t necessarily help the CV, but it would provide an easily accessible reference for other users/IDers.

system · September 24, 2023, 6:23pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
