Secondary data extraction

Wondering what people’s thoughts are on how to most efficiently extract secondary data from observations, and how to put this data to use in e.g. research.

As an example: I uploaded this observation https://www.inaturalist.org/observations/26045826 a while back. When I tried the auto-suggest at first, it suggested all birds, with the top suggestion being southern boobook (i.e. the correct ID). However, when I manually added ‘flies’ as an ID, and then tried the auto-suggest again, all the options were flies. So it’s obvious that the auto-suggest takes notice of existing IDs. Is there a way to expedite this process/apply it to large batches of data?

I’m co-writing a paper with a number of authors at UNSW, and one of the future frontiers of citizen science we wanted to address was how to take advantage of secondary data, e.g. look at all butterfly observations --> extract IDs of plants in photos --> develop knowledge/build database of host plant interactions.

Another possibility is extracting phenological data, e.g. take a batch of plant observations --> extract data on whether flowers are present --> understand flowering times. Unsure if something like this is possible. Would love to hear people’s thoughts.


i think you have 3 ideas here that may be only loosely related in terms of how you would actually approach them. (how to most efficiently extract secondary data depends on what you’re trying to do, i think.)

the first is related to leveraging the computer vision algorithm for custom applications. i don’t think they make the iNaturalist computer vision available to others to use in that way, though i suppose you could write the staff with a specific proposal, and maybe they might be able to accommodate your request. there are other people building other computer vision models for different applications, too, and you could also approach them to see what they’re doing. here’s a discussion that talks about another such project, i think, including some discussion on what datasets that project is using: https://forum.inaturalist.org/t/how-to-download-taxa/3542. it seems like students and researchers training AIs for different applications (ex. https://www.inaturalist.org/blog/16807-mountain-goat-molts-inat-photos-and-climate-change) is a fairly common thing nowadays. so you might even go knock on some doors at your university to see if anyone’s working on something like that.

your second idea is related to organism interactions. your proposal here is to try to use the computer vision to figure out other organisms in a particular observation. i think conceptually you could do that, but i bet the chances of that producing good data for most observations is low. i think better data is to be found in cases where people have already used observation fields to explicitly record interactions. the staff did build a prototype to see if they could display that information in a friendly way, and that’s described here: https://forum.inaturalist.org/t/add-interactions-to-species-pages/433/2.
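as a rough illustration of querying that kind of explicitly recorded interaction data, here’s a minimal sketch against the public iNaturalist v1 API. the field name “Eating” is just a hypothetical example — observation fields are user-created, so you’d want to check `/v1/observation_fields` (or browse the site) for the fields actually in use, and the v1 search is documented as accepting `field:<name>` parameters, though exact behavior is worth verifying:

```python
# Sketch (not an official recipe): search iNaturalist observations that carry
# a given observation field, e.g. one recording an organism interaction.
# NOTE: the field name "Eating" below is a placeholder/assumption -- check
# /v1/observation_fields for real field names before relying on this.
import json
import urllib.parse
import urllib.request

API = "https://api.inaturalist.org/v1/observations"

def interaction_query_url(field_name, value=None, taxon_id=None, per_page=50):
    """Build a v1 observation-search URL filtering on an observation field.

    With value=None the query just requires the field to be present.
    """
    params = {
        "field:" + field_name: "" if value is None else value,
        "per_page": per_page,
    }
    if taxon_id is not None:
        params["taxon_id"] = taxon_id
    return API + "?" + urllib.parse.urlencode(params)

def fetch_interactions(field_name, **kwargs):
    """Fetch matching observations and return the raw results list."""
    with urllib.request.urlopen(interaction_query_url(field_name, **kwargs)) as r:
        return json.load(r)["results"]
```

from there you could walk the returned observations and read each record’s `ofvs` (observation field values) entries to build up an interaction table — still subject to the data-quality caveats above.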

your last idea is related to phenology data. specifically, you want to leverage the computer vision to automatically capture that data. theoretically that’s possible, but based on this discussion (https://forum.inaturalist.org/t/use-computer-vision-to-annotate-observations/3331), i don’t think they’re currently training the iNaturalist computer vision to do that, and there may be a chicken-and-egg problem there as well. (to train the AI, first there must be phenology data to train on.) my thought is that flowering is one of the easier annotations to make. so if you’re really interested in that sort of data, it wouldn’t be too difficult now to manually go in and make annotations on a bunch of observations. once you’ve made the annotations, the taxon page will give you a nice graph of flowering by month (or you can extract the summarized numbers via the API, or get the per-observation details via CSV download).
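to make the API route concrete, here’s a minimal sketch that pulls a month-by-month flowering histogram from the public v1 API. the annotation IDs used (term_id 12 for “Plant Phenology”, term_value_id 13 for “Flowering”) match what the site uses at the time of writing, but they’re an assumption you should confirm against `/v1/controlled_terms`:

```python
# Sketch: month-of-year histogram of "Flowering"-annotated observations
# for one taxon, via the iNaturalist v1 API.
# ASSUMPTION: term_id=12 ("Plant Phenology") and term_value_id=13
# ("Flowering") -- verify these against /v1/controlled_terms.
import json
import urllib.parse
import urllib.request

API = "https://api.inaturalist.org/v1/observations/histogram"

def flowering_histogram_url(taxon_id, place_id=None):
    """Build the histogram query URL for flowering annotations on a taxon."""
    params = {
        "taxon_id": taxon_id,
        "term_id": 12,           # Plant Phenology (assumed ID)
        "term_value_id": 13,     # Flowering (assumed ID)
        "date_field": "observed",
        "interval": "month_of_year",
    }
    if place_id is not None:
        params["place_id"] = place_id
    return API + "?" + urllib.parse.urlencode(params)

def flowering_by_month(taxon_id, place_id=None):
    """Return {month_number: observation_count} for flowering observations."""
    with urllib.request.urlopen(flowering_histogram_url(taxon_id, place_id)) as r:
        return json.load(r)["results"]["month_of_year"]
```

restricting by `place_id` matters for a question like flowering times, since the same species can flower in different months in different regions (or hemispheres).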

Cheers @pisum, really appreciate the reply and links. The mountain goat one is especially useful.

ah, ok… if that’s the kind of thing you’re interested in, then you may be interested in Wildbook, too. this is the project website: https://www.wildbook.org, and here’s an article about one or two of the (many) projects using it: https://www.oreilly.com/ideas/from-binoculars-to-big-data-citizen-scientists-use-emerging-technology-in-the-wild.

also, this is not in that vein, but it looks at the carbon cost of creating all these AIs for all these different things: https://www.technologyreview.com/s/613630/training-a-single-ai-model-can-emit-as-much-carbon-as-five-cars-in-their-lifetimes/.

that whole angle is truly odd and i’m not sure how i feel about it, but it does seem that computer banks like that are uniquely suited for renewable energy use given their highly localized nature and consistent power pull.

yes, here’s a Vice article on the same subject, and they do go into that a bit towards the end: https://www.vice.com/en_in/article/bj95qm/training-one-ai-model-produces-as-much-emissions-as-a-cross-country-flight-study-finds
