Seeking insights on training inaturalist AI for arctic fish identification

Hi iNaturalist friends,

We’re working on a pilot project called “Assessing Arctic biodiversity and integrating artificial intelligence.” The goal is to improve how we identify Arctic fish fauna by training iNaturalist’s AI with a large number of validated photos from the Arctic region.

https://www.inaturalist.org/projects/pilot-project-assessing-artic-biodiversity

Questions for the Community:

  1. How many photos per species to see improvement?
    How many photos of a species should we upload before seeing noticeable improvement in the AI’s species suggestions? So far, we’ve been noting the top three ID suggestions and scoring them for every photo we upload, but the AI still struggles to recognize most species.

  2. How does the algorithm learn?
    Does the AI improve with every uploaded photo, daily updates, or only during larger regional updates?

  3. Photo orientation and quality
    Does the orientation of a photo (e.g., upside down or vertical) affect the AI’s ability to learn and make accurate identifications?

If you don’t have the answers but can point us to someone who does, that would be greatly appreciated!

Thank you in advance for any insights or suggestions.


Here’s a quote from the help page about how taxa initially get included in the model:

After that, the model is updated monthly; here’s the latest update on the blog: New Computer Vision Model (v2.17) with over 1,000 new species! · iNaturalist
There are other blog updates with more information than that one. Last fall the geographic model was changed, but it didn’t help marine species much.


I did a similar thing for some local freshwater fishes that weren’t yet recognized by the AI. My recommendation: the more diverse the photos, the better.

The AI does not know what a fish is. If every picture you submit has a white measuring board in the background (as is often the case for on-vessel fish photos), the AI WILL learn to associate that board with the species in question.

The more angles, lightings, and unique individuals the better.
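To make the background-association point concrete, here is a toy sketch — not iNaturalist’s actual model, and all names here are made up. Each “photo” is reduced to two binary features, and a simple nearest-centroid classifier is trained on data where one species was always photographed on a measuring board:

```python
# Toy illustration of a spurious background feature.
# Each "photo" is (has_fin_pattern, has_white_board).
# species_a was always photographed on a white measuring board,
# so the classifier learns the board, not the fish.

def centroid(samples):
    """Mean feature vector of a list of 2-feature samples."""
    n = len(samples)
    return tuple(sum(s[i] for s in samples) / n for i in range(2))

train = {
    "species_a": [(1, 1), (1, 1), (0, 1)],  # board in every photo
    "species_b": [(1, 0), (0, 0), (1, 0)],  # no board
}
centroids = {label: centroid(pts) for label, pts in train.items()}

def classify(photo):
    """Assign the label whose centroid is closest (squared distance)."""
    return min(
        centroids,
        key=lambda lab: sum((photo[i] - centroids[lab][i]) ** 2 for i in range(2)),
    )

# An empty measuring board with no fish at all is still called species_a:
print(classify((0, 1)))  # -> species_a
```

Varying the backgrounds in the training photos breaks this shortcut, which is exactly why diverse photos help.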


I agree that more diverse photos will generally be better. Models like this will learn any characteristics of the photos — words written on labels, the types of cameras the pics were taken with, etc. They have no idea that the pics are of focal organisms per se.

I don’t think rotation of the photos should matter. I believe that the algorithm rotates photos/pieces of the photos when it does its learning. But you can download the documentation/read the CV paper for details.
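For intuition, rotation augmentation of the kind described can be sketched in a few lines — this is a toy on a 2D pixel grid, not iNaturalist’s actual training code (see their CV paper for that):

```python
import random

def rotate90(img):
    """Rotate a 2D pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def augment(img, rng=random):
    """Return the image rotated by a random multiple of 90 degrees,
    so the model sees every orientation during training."""
    for _ in range(rng.randrange(4)):
        img = rotate90(img)
    return img

img = [[1, 2],
       [3, 4]]
# One clockwise rotation:
assert rotate90(img) == [[3, 1], [4, 2]]
```

Because the training pipeline applies transforms like this on the fly, the orientation of the uploaded photo itself shouldn’t matter much.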


1.) It varies a bit based on some detailed conditions the staff use for selecting photos, but approximately 100 separate observations have been the benchmark.
2.) It typically updates everything all at once, with updates coming once every month or two.
3.) Orientation/rotation of the same photo should have almost no effect at all, as far as I know. However, other kinds of diversity in photos are very helpful: varying lighting, backgrounds, the angle of the photo relative to the fish and which part of the fish is shown, and different cameras used to take the photos. Ideally, different people would take the photos; there can be subtle idiosyncrasies in the way each of us holds a camera that are nearly impossible to control.


Annika, you might want to fill out your profile a bit — both here on the iNat Forum, and on iNaturalist — as more transparency can foster greater engagement.

Here is a page about the Research Vessel (AKA RV) that is hosting Annika’s Arctic biodiversity project:

https://natur.gl/facilities/skibe/new-ship/?lang=en
