Seeking insights on training iNaturalist AI for Arctic fish identification

areinholdt · December 5, 2024, 7:44pm

Hi iNaturalist friends,

We’re working on a pilot project called “Assessing Arctic biodiversity and integrating artificial intelligence.” The goal is to improve how we identify Arctic fish fauna by training iNaturalist’s AI with a large number of validated photos from the Arctic region.

https://www.inaturalist.org/projects/pilot-project-assessing-artic-biodiversity

Questions for the Community:

1. How many photos per species to see improvement?
   How many photos of a species should we upload before seeing noticeable improvement in the AI’s species suggestions? So far, we’ve been noting the top three ID suggestions and scoring them for every photo we upload, but the AI still struggles to recognize most species.
2. How does the algorithm learn?
   Does the AI improve with every uploaded photo, daily updates, or only during larger regional updates?
3. Photo orientation and quality
   Does the orientation of a photo (e.g., upside down or vertical) affect the AI’s ability to learn and make accurate identifications?

If you don’t have the answers but can point us to someone who does, that would be greatly appreciated!

Thank you in advance for any insights or suggestions.

upupa-epops · December 5, 2024, 7:49pm

Here’s a quote from the help page about what gets taxa initially included in the model:

Why specialists should do identifications etc

Which taxa are included in the computer vision suggestions?

This has changed over time and may change before this FAQ is updated again, as we are continually working on improving the training process. But basically, here’s what’s needed for a species to be included in the Computer Vision model:

There must be a [sic] least 100 photos of the species and 60 observations of the species, and we don’t choose more than 5 photos from an observation to train the model. Observations do not need to be Research Grade in order to be used in training, but observations with a matching Community ID will be prioritized.

Some photos that are not included in the training phase are used to test and validate the model. These must have a Community ID.

Because of this, not every species with at least 100 photos and 60 observations will meet the requirements to be included in a training run. It’s dependent on how many photos there are per observation, and whether the randomly chosen group of observations meets the requirements.

If no species within a broader taxon like genus or family meets the requirements, we may train the model on that genus or family, based on those photos.

After that, the model is updated monthly. Here’s the latest update on the blog: New Computer Vision Model (v2.17) with over 1,000 new species! · iNaturalist
There are other blog updates with more information than that one. Last fall the geographic model was changed, but it didn’t help marine species much.
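
For anyone tracking their own uploads against the thresholds quoted above, here is a rough Python sketch. It is not iNaturalist code: the `Observation` record and both function names are made up for illustration, and applying the 5-photo cap before counting toward the 100-photo minimum is my assumption about how the FAQ should be read. It also ignores the random selection and the photos held out for testing and validation, so passing this check does not guarantee inclusion in a training run.

```python
from dataclasses import dataclass

# Thresholds as quoted from the FAQ; the FAQ itself says they may change.
MIN_PHOTOS = 100
MIN_OBSERVATIONS = 60
MAX_PHOTOS_PER_OBSERVATION = 5


@dataclass
class Observation:
    """Hypothetical record: one observation and how many photos it has."""
    photo_count: int


def usable_photo_count(observations):
    """Count photos, taking at most 5 from any single observation (assumed reading)."""
    return sum(min(o.photo_count, MAX_PHOTOS_PER_OBSERVATION) for o in observations)


def meets_thresholds(observations):
    """Rough check of the observation and photo minimums for one species."""
    return (
        len(observations) >= MIN_OBSERVATIONS
        and usable_photo_count(observations) >= MIN_PHOTOS
    )


# Example: 10 observations with 20 photos each is 200 photos in total,
# but only 50 count after the cap, and 10 observations is below the minimum.
few_big_observations = [Observation(photo_count=20) for _ in range(10)]
print(usable_photo_count(few_big_observations), meets_thresholds(few_big_observations))
```

The practical point is that many separate observations help more than many photos of the same individual in one observation.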

zakqary · December 5, 2024, 7:52pm

I did a similar thing for some local freshwater fishes that weren’t yet recognized by the AI. My recommendation is the more diverse the photos, the better.

AI does not know what a fish is. If every picture you submit has a white measuring board in the background (as is often the case for on-vessel fish photos) the AI WILL associate that board with the species in question when learning.

The more angles, lightings, and unique individuals the better.

cthawley · December 5, 2024, 9:06pm

I agree that more diverse photos will generally be better. Models like this will definitely learn about any characteristics of photos, even words written on labels, the types of cameras the pics were taken with, etc. They have no idea that the pics are of focal organisms per se.

I don’t think rotation of the photos should matter. I believe that the algorithm rotates photos/pieces of the photos when it does its learning. But you can download the documentation/read the CV paper for details.
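
To illustrate what rotating photos during learning could look like, here is a generic Python sketch of rotation augmentation on a tiny image stored as a list of pixel rows. This is only an illustration of the general technique, not iNaturalist's actual pipeline, which I have not confirmed; the function names are hypothetical and real training code would work on image tensors instead.

```python
import random


def rotate90(image):
    """Rotate a 2D grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def random_rotation(image, rng):
    """Return the image rotated by a random multiple of 90 degrees."""
    turns = rng.randrange(4)  # 0, 1, 2, or 3 quarter turns
    for _ in range(turns):
        image = rotate90(image)
    return image


tiny_image = [[1, 2],
              [3, 4]]
print(rotate90(tiny_image))  # [[3, 1], [4, 2]]

# Each training pass would see the same photo in a possibly different orientation.
rng = random.Random(0)
augmented = [random_rotation(tiny_image, rng) for _ in range(3)]
```

If training does something like this, an upside-down upload is just one more orientation the model would have seen anyway.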

wildskyflower · December 5, 2024, 9:07pm

1.) It varies a bit based on some detailed conditions the staff use for selecting photos, but approximately 100 separate observations has been the benchmark.
2.) It typically updates everything all at once, with updates coming once every month or two.
3.) Orientation/rotation of the same photo should have almost no effect at all, as far as I know. However, other kinds of diversity in photos are very helpful: for example, varying the lighting, the background, the angle of the photo with respect to the fish, which part of the fish is shown, and the camera used to take the photo. Ideally, different people should take the photos; there can be subtle idiosyncrasies in the way that each of us holds a camera that are nearly impossible for us to control.

AdamWargon · December 5, 2024, 9:24pm

Annika, you might want to fill out your profile a bit — both here on the iNat Forum, and on iNaturalist — as more transparency can foster greater engagement.

Here is a page about the Research Vessel (AKA RV) that is hosting Annika’s project in Arctic biodiversity:

https://natur.gl/facilities/skibe/new-ship/?lang=en

system · February 3, 2025, 9:24pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
