Linking BOLD and iNat to populate iNat observations?

I am wondering what the best strategy is for linking iNat and BOLD, for both legacy and future observations, in order to populate iNat with verified observations and to help the computer vision model.

For instance, we can duplicate pictures in both systems, but that would not be very efficient.
If this is an option, is batch import from BOLD feasible? Please note that I am talking about transferring someone’s own pictures that (s)he is sure of, not pictures from other folks.

Another possibility is to cross-link so that the same picture appears in both systems (the picture would be hosted in only one place and embedded in the other, as in GBIF). Is that feasible? I suppose, though, that iNat needs to host the pictures itself.

If cross linking is feasible, would it be better for future observations, to first put the obs on iNat and make a link in BOLD to the picture, or the reverse?

I hope my questions are clear,
Thanks

Welcome to the iNat forum!

I think that there’s an existing feature request that addresses your question here:
https://forum.inaturalist.org/t/allow-gbif-and-bold-photos-to-be-used-for-species-pages-lacking-photos/30348

You could vote and contribute thoughts there.

There’s also a recent discussion on DNA sequence data here: https://forum.inaturalist.org/t/integration-of-dna-sequence-data/37814
that may be applicable.

In general, duplicating records from other databases on iNat wouldn’t be good practice, but (personal opinion) it seems like allowing images from BOLD in the same way as for other sources as outlined in the feature request above would be a useful feature.


Weird thought came to mind. Dating/hookup apps when you live in a city with few prospects. So you join all the apps to cast a wider net, only to find that the same few prospects have also joined all the apps, and you see the same people no matter which one you sign in to.

I have seen several of these threads recently; link iNat to this or that other site. But doesn’t that imply that the sites are redundant? If two sites serve different, or even complementary purposes, they wouldn’t need the same data.

The usual common denominator of fields for describing species occurrence data is: what, where, when and by whom. Beyond that basic set of fields there are many different specialist domains with their own extended data. For example, iNat focusses on images and citizen science, GenBank on sequence data, herbaria/fungaria on vouchered specimens etc. For good reasons there is no single biodata repository that serves all purposes, but they do share a common core of attributes. So no, I don’t think multiple sites holding a subset of the same kind of data are redundant.

GBIF aggregates this diversity of biodata and relies on a standardised common-core set of fields (the Darwin Core). Many of us contribute data to several specialist repositories, and it makes sense to cross-link records rather than try to squash data into a single repository that inevitably won’t do a good job for everything.

It does mean that GBIF may receive different ‘views’ of the same data from different sources, but then de-duplication of GBIF records is one of many steps necessary to make any data downloaded from GBIF fit-for-purpose.
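To make the de-duplication step concrete, here is a minimal sketch that groups records sharing the same what/where/when/who key. The field names are Darwin Core terms; the sample records, the `occurrenceID` values and the coordinate rounding precision are invented for illustration, and this is not a description of how GBIF itself handles duplicates.

```python
from collections import defaultdict

def core_key(rec, precision=3):
    """Build a what/where/when/who key from Darwin Core terms.

    Rounding coordinates to `precision` decimals is an assumption made
    here so that near-identical points compare equal.
    """
    return (
        rec["scientificName"].strip().lower(),
        round(float(rec["decimalLatitude"]), precision),
        round(float(rec["decimalLongitude"]), precision),
        rec["eventDate"][:10],  # keep only the YYYY-MM-DD part
        rec["recordedBy"].strip().lower(),
    )

def group_duplicates(records):
    """Return lists of occurrenceIDs that share the same core key."""
    groups = defaultdict(list)
    for rec in records:
        groups[core_key(rec)].append(rec["occurrenceID"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Invented sample data: the same sighting seen through two sources,
# plus one unrelated record.
records = [
    {"occurrenceID": "inat:1", "scientificName": "Danaus plexippus",
     "decimalLatitude": "45.5017", "decimalLongitude": "-73.5673",
     "eventDate": "2022-07-14T10:30:00", "recordedBy": "A. Observer"},
    {"occurrenceID": "bold:A", "scientificName": "danaus plexippus ",
     "decimalLatitude": "45.50171", "decimalLongitude": "-73.56731",
     "eventDate": "2022-07-14", "recordedBy": "a. observer"},
    {"occurrenceID": "inat:2", "scientificName": "Vanessa cardui",
     "decimalLatitude": "45.5017", "decimalLongitude": "-73.5673",
     "eventDate": "2022-07-14", "recordedBy": "A. Observer"},
]

duplicate_groups = group_duplicates(records)
```

A key like this is only a heuristic: two genuinely different specimens collected at the same place and time by the same person would be grouped together, and points that fall on either side of a rounding boundary would be missed.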


The information is definitely complementary and there is also necessary overlap (metadata). These sites are for researchers at all levels, and researchers could definitely use the links.

Thanks everybody for your answers, and thanks @cthawley for the welcome and links to the other threads! [this is my first post on the forum].

It seems that we all agree that duplicating observations is not recommended. Using a voucher number or other specimen ID may be a workaround for identifying duplicates, if there is no other choice than to duplicate. And I agree with @cooperj that each site has its own purpose and that’s good.
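As a rough illustration of the voucher idea, the sketch below pairs records from the two systems whose voucher numbers match after normalisation. The record layout (plain dicts with `id` and `voucher` keys), the sample IDs and the normalisation rule are all hypothetical; neither iNat nor BOLD is assumed to export data in this shape.

```python
import re

def normalize_voucher(voucher):
    """Uppercase and drop spaces, hyphens and underscores (assumed rule)."""
    return re.sub(r"[\s\-_]+", "", voucher).upper()

def match_by_voucher(inat_obs, bold_records):
    """Return (iNat id, BOLD id) pairs that share a normalised voucher."""
    bold_index = {
        normalize_voucher(rec["voucher"]): rec["id"]
        for rec in bold_records
        if rec.get("voucher")
    }
    pairs = []
    for obs in inat_obs:
        voucher = obs.get("voucher")
        if not voucher:
            continue  # observation without a voucher cannot be matched
        key = normalize_voucher(voucher)
        if key in bold_index:
            pairs.append((obs["id"], bold_index[key]))
    return pairs

# Invented sample data.
inat_obs = [
    {"id": "obs-1", "voucher": "abc 123"},
    {"id": "obs-2", "voucher": None},
    {"id": "obs-3", "voucher": "XYZ-9"},
]
bold_records = [
    {"id": "rec-A", "voucher": "ABC-123"},
    {"id": "rec-B", "voucher": "QRS-1"},
]

matches = match_by_voucher(inat_obs, bold_records)
```

This only works if the same voucher number is recorded on both sides, so it helps with flagging duplicates after the fact rather than preventing them.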

Perhaps the feature request mentioned by @cthawley may do the job, but my case is a bit different because my goal is to help train iNat’s CV engine, not just to show the pictures on iNat’s species pages. In other words, would iNat’s engine be able to learn from images that are not hosted on iNat?

After rereading the thread, I realize that it might actually be more appropriate to ask BOLD for the ability to frame pictures from other sites. That would solve the duplicated-pictures and CV-training issues, but still not the duplicated obs. Tricky!

A colleague and I will ask people from BOLD for their opinion, and I’ll let you know their thoughts (unless there are some BOLD people on this forum who can answer, I don’t know).


I think this sounds like a good idea. Pics for an observation uploaded on iNat can definitely be used to train the CV model, and, if they are licensed properly for sharing, they’d be accessible to BOLD. Most photos with licenses that allow sharing are in the Amazon storage bucket (or whatever it’s called), so I don’t think using those would put a strain on iNat.