Show Observations Which Share the Same Media

Platform(s): All

URLs (aka web addresses) of any pages, if relevant:
https://www.inaturalist.org/observations/236245670
https://www.inaturalist.org/observations/236284462
https://www.inaturalist.org/observations/234494612

Description of need:
It is relatively common for the same photo to contain multiple species. A user may see an observation of one species but be more interested in another. However, finding the other observation is cumbersome: users have to navigate to the photo's page to see related observations, rather than seeing them directly on the observation page.

Feature request details:

I propose automatically displaying, on each observation page, the observations that share at least one photo with it. This is similar to what I did manually in the links above.

Related observations already appear on the photos page. For instance, this photo is linked to two observations: one for a plant and one for a fungus. My goal is to find all related observations for each image, remove duplicates, and display them on the observation page.

Consider the following observations:

  • ob1: photo1, photo2, photo3
  • ob2: photo2, photo4
  • ob3: photo2, photo3, photo5
  • ob4: photo3, photo6

For ob1's page, the first step is identifying related observations for each image:

  • photo1 is linked to ob1 only
  • photo2 is linked to ob1, ob2, and ob3
  • photo3 is linked to ob1, ob3, and ob4

The combined related observations are: ob1, ob1, ob2, ob3, ob1, ob3, ob4.

After removing the current observation (ob1) and duplicates, we get: ob2, ob3, ob4.

Thus, observations ob2, ob3, and ob4, which share at least one photo with the current observation, should be listed in the "related observations" section.
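The lookup described above can be sketched in a few lines of Python. The dict-based data layout and function name here are hypothetical stand-ins for whatever the real iNaturalist data model provides:

```python
def related_observations(current_ob, photos_by_ob):
    """Return observations sharing at least one photo with current_ob."""
    # Invert the mapping: photo ID -> set of observations using it.
    obs_by_photo = {}
    for ob, photos in photos_by_ob.items():
        for photo in photos:
            obs_by_photo.setdefault(photo, set()).add(ob)

    # Union the observations linked to each of current_ob's photos;
    # using sets makes the deduplication step implicit.
    related = set()
    for photo in photos_by_ob[current_ob]:
        related |= obs_by_photo[photo]

    # Remove the current observation itself.
    related.discard(current_ob)
    return sorted(related)

# The worked example from above:
photos_by_ob = {
    "ob1": ["photo1", "photo2", "photo3"],
    "ob2": ["photo2", "photo4"],
    "ob3": ["photo2", "photo3", "photo5"],
    "ob4": ["photo3", "photo6"],
}

print(related_observations("ob1", photos_by_ob))  # ['ob2', 'ob3', 'ob4']
```

This matches the hand-computed result: ob2, ob3, and ob4 each share a photo with ob1.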

I changed the title of the request to reference the key element.


this kind of request would likely depend on the implementation of another request: https://forum.inaturalist.org/t/photo-detail-endpoint-for-api/40481.

With a checksum stored in the database for each uploaded image (this may already be implemented internally, but it is not exposed on iNat's public platforms), it would be easy to search for duplicate photos and related observations.
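As a rough illustration of the idea, here is a minimal sketch of checksum-based duplicate detection. The function names and the in-memory "database" are purely hypothetical, not part of any actual iNaturalist code:

```python
import hashlib

# Stand-in for a database table mapping checksum -> observation IDs.
uploads = {}

def checksum(data: bytes) -> str:
    """SHA-256 digest of the raw image bytes."""
    return hashlib.sha256(data).hexdigest()

def register_upload(ob_id, image_bytes):
    """Record an upload and return observations that already used
    this exact file (i.e. byte-identical duplicates)."""
    digest = checksum(image_bytes)
    duplicates = list(uploads.get(digest, []))  # copy before mutating
    uploads.setdefault(digest, []).append(ob_id)
    return duplicates

print(register_upload("ob1", b"fake image bytes"))  # []
print(register_upload("ob2", b"fake image bytes"))  # ['ob1']
```

Note this only catches byte-identical files; as discussed below in the thread, re-encoded or resized copies of the same photo would hash differently.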


Waiting since 2019
https://forum.inaturalist.org/t/duplicate-prevention-notify-observers-if-their-image-checksums-match-others-on-the-site/258

Also 2019
https://forum.inaturalist.org/t/notify-user-when-new-observation-matches-time-stamp-of-pre-existing-observation/253/13


I'd love to see something like this. I've been adding notes that link related obs of mine which share an image, but it's quite time-consuming to do manually: you can't do it as part of a batch submission (since you don't have observation numbers yet), so you have to go back and edit each observation individually after it has been submitted.


As far as I know, different observations mostly reference the exact same image, with the same image ID on the iNaturalist server, rather than having the same photo uploaded twice. So this feature does not require the checksum as a prerequisite.

Of course, I totally agree that checksums are a much more important feature and would make the feature I proposed more usable.


i could be wrong, but i don't think checksums (even with a sufficiently large hash size) would work the way you think on iNat images, because iNat modifies images during upload (to resize them and remove metadata, among other things).

the ideal workflow for observations loaded or duplicated via the website should lead to a single photo record tied to multiple observation records, but the last time i checked this is not usually the case when loading via the apps, which instead tie each observation to its own photo record.

however, this doesn't negate the usefulness of your proposed functionality.

There's also the possible case where two obs share the same photo but not the same image (e.g. one only has a zoomed-in, cropped view of one or more of the images in the other observation).

They are still the same photo, but not the same digital image, and won't share the same checksum (as would also be the case if I uploaded a .jpg of the same original photo to one observation and a .png to the other).

So it would be nice to (also) have a way to manually indicate observations are linked together in time and space rather than only doing it indirectly by sharing a bit-exact image between them.


This related feature request has posts on using observation fields to link observations:
https://forum.inaturalist.org/t/link-observations/1367

Still, removing metadata and compressing should produce the same file if the exact "same media" (per the OP's title) was uploaded… Anyway, it might be possible to calculate the checksum first, before these changes are made during upload.

TBH I usually don't use the same media for different observations, or only after reframing or resizing. I understand it might happen more often for those using the app, who mostly upload without editing their images, but I felt this could be an opportunity to fill two needs with one deed, i.e. the requests about duplicates mentioned by DianaStuder here!


should it? if the filenames are different and the timestamps of the files are different, are those the same file? (if you convert A into B and then A into C, is B = C?)

if you change the resizing codec or if you run it on a different hardware / software setup, do you end up with the same file?

what do you do for the images for the 200+ million existing observations?

Thanks for the pointer to that. I have a different set of use cases that are covered by the discussion there :) We have a project doing long-term tracking of individuals, using sightings by many observers. For my own obs I've been tagging them with the identifier for each individual, but I can't add tags to other people's observations, so we've been noting that in comments as the current best/only way to add an immutable 'curated' (as opposed to 'anyone can edit for any reason') tag.

We've talked about creating a field for that, but there's a bit more work to do behind the scenes in reconciling the identifications made by all identifiers to arrive at a consensus on which individual an observation is of. And without an external reference record, fields have the 'vulnerability' of not being curated or having a change history, so it's a balance between the feature of 'anyone could add them' and the problem of 'anyone could change them without that change being reviewed'.

It would be nice to have something like 'per-user' fields that let each user create and curate their own personal annotations for observations, though for this case a 'per-project' field would be more ideal, where the field content would be curated by the project curators.

That's actually not quite true, at least over the long term. Image compression isn't necessarily deterministic, even for lossless compression methods, and even when a newer compressor version hasn't improved the amount of compression it can achieve for a given quality setting. Many libraries will also add metadata describing the version used to do the compression, etc.

So a matching checksum could be a way to flag copies of the same image, but it shouldn't be the only way, and there needs to be a manual way to specify a true relationship (and possibly its nature).

With my tech hat on (:

If thatā€™s the only difference, those are just links to, or aliases of, the Same File.

The devil is in the detail of what operation "convert" actually is. Maybe yes, maybe no, maybe maybe :)

Change codec, no. Change hardware/software, see the devil above.

The backend storage for all the images of these obs is almost certainly already doing this, even if the identifying hashes of those images aren't available to the iNat front end. Content-addressable filesystems and deduplication aren't anything new.

I'm not saying that's the answer which works best for our problem, but retrofitting something like it is not an intractable problem. 200 million images isn't a 'large' dataset in the space of Big Data anymore :)
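For readers unfamiliar with content-addressable storage, here is a toy sketch of the deduplication idea: blobs are keyed by their own SHA-256, so storing the same bytes twice keeps only one copy. This is purely illustrative, not how iNat's backend actually works:

```python
import hashlib

class BlobStore:
    """Toy content-addressable store: key = SHA-256 of the content."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Idempotent: the same bytes always map to the same key,
        # so duplicates are deduplicated for free.
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self):
        return len(self._blobs)

store = BlobStore()
k1 = store.put(b"same image bytes")
k2 = store.put(b"same image bytes")  # second store is a no-op
print(k1 == k2, len(store))  # True 1
```

The flip side, as noted above, is that any re-encoding of the image produces different bytes and therefore a different key.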

if it is, it would be doing it on the converted files, right? my earlier point here was that you likely couldn't base your hash on the original (pre-conversion) files, as suggested by s-e.

… regardless, i'm thinking this whole checksum tangent should probably stay in its own separate feature request.

Yes, on the content that is stored.

Oh, I missed that distinction, sorry. There are so many reasons it would make no sense to base a 'checksum' or other deterministic identifier on some transient state that couldn't later be reproduced to verify it, that I missed you meant the impossibility of doing it for existing observations, not the amount of work needed to apply it to 200+ million of them.