Show Observations Which Share the Same Media

Platform(s): All

URLs (aka web addresses) of any pages, if relevant:
https://www.inaturalist.org/observations/236245670
https://www.inaturalist.org/observations/236284462
https://www.inaturalist.org/observations/234494612

Description of need:
It is relatively common for the same photo to contain multiple species. A user may see an observation of one species but be more interested in another. However, finding the other observation is cumbersome: users have to navigate to the photo's page to see related observations, rather than seeing them directly on the observation page.

Feature request details:

I propose automatically displaying, on each observation page, the observations that share at least one photo with it. This is similar to what I did manually in the links above.

Related observations already appear on the photos page. For instance, this photo is linked to two observations: one for a plant and one for a fungus. My goal is to find all related observations for each image, remove duplicates, and display them on the observation page.

Consider the following observations:

  • ob1: photo1, photo2, photo3
  • ob2: photo2, photo4
  • ob3: photo2, photo3, photo5
  • ob4: photo3, photo6

For ob1's page, the first step is identifying related observations for each image:

  • photo1 is linked to ob1 only
  • photo2 is linked to ob1, ob2, and ob3
  • photo3 is linked to ob1, ob3, and ob4

The combined related observations are: ob1, ob1, ob2, ob3, ob1, ob3, ob4.

After removing the current observation (ob1) and duplicates, we get: ob2, ob3, ob4.

Thus, observations ob2, ob3, and ob4, which share at least one photo with the current observation, should be listed in the "related observations" section.
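The lookup described above can be sketched in a few lines of Python. The dict-based data layout and function name here are hypothetical stand-ins for whatever the real iNaturalist data model provides:

```python
def related_observations(current_ob, photos_by_ob):
    """Return observations sharing at least one photo with current_ob."""
    # Invert the mapping: photo ID -> set of observations using it.
    obs_by_photo = {}
    for ob, photos in photos_by_ob.items():
        for photo in photos:
            obs_by_photo.setdefault(photo, set()).add(ob)

    # Union the observations linked to each of current_ob's photos;
    # using sets makes the deduplication step implicit.
    related = set()
    for photo in photos_by_ob[current_ob]:
        related |= obs_by_photo[photo]

    # Remove the current observation itself.
    related.discard(current_ob)
    return sorted(related)

# The worked example from above:
photos_by_ob = {
    "ob1": ["photo1", "photo2", "photo3"],
    "ob2": ["photo2", "photo4"],
    "ob3": ["photo2", "photo3", "photo5"],
    "ob4": ["photo3", "photo6"],
}

print(related_observations("ob1", photos_by_ob))  # ['ob2', 'ob3', 'ob4']
```

This matches the hand-computed result: ob2, ob3, and ob4 each share a photo with ob1.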

I changed the title of the request to reference the key element.


this kind of request would likely depend on the implementation of another request: https://forum.inaturalist.org/t/photo-detail-endpoint-for-api/40481.

With a checksum stored in the database for each uploaded image (this may already be implemented internally, but it is not exposed on iNat's public platforms), it would be easy to search for duplicate photos and related observations.
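As a rough illustration of the idea, here is a minimal sketch of checksum-based duplicate detection. The function names and the in-memory "database" are purely hypothetical, not part of any actual iNaturalist code:

```python
import hashlib

# Stand-in for a database table mapping checksum -> observation IDs.
uploads = {}

def checksum(data: bytes) -> str:
    """SHA-256 digest of the raw image bytes."""
    return hashlib.sha256(data).hexdigest()

def register_upload(ob_id, image_bytes):
    """Record an upload and return observations that already used
    this exact file (i.e. byte-identical duplicates)."""
    digest = checksum(image_bytes)
    duplicates = list(uploads.get(digest, []))  # copy before mutating
    uploads.setdefault(digest, []).append(ob_id)
    return duplicates

print(register_upload("ob1", b"fake image bytes"))  # []
print(register_upload("ob2", b"fake image bytes"))  # ['ob1']
```

Note this only catches byte-identical files; as discussed below in the thread, re-encoded or resized copies of the same photo would hash differently.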


Waiting since 2019
https://forum.inaturalist.org/t/duplicate-prevention-notify-observers-if-their-image-checksums-match-others-on-the-site/258

Also 2019
https://forum.inaturalist.org/t/notify-user-when-new-observation-matches-time-stamp-of-pre-existing-observation/253/13


I'd love to see something like this. I've been adding notes that link related obs of mine which share an image, but it's quite time-consuming to do manually: you can't do it as part of a batch submission (since you don't have observation numbers yet), so you have to go back and edit each observation individually after it has been submitted.


As far as I know, different observations mostly reference the exact same image, with the same image ID on the iNaturalist server, rather than having the same photo uploaded twice. So this feature does not require the checksum as a prerequisite.

Of course, I totally agree that checksums are a much more important feature and would make the feature I proposed more usable.


i could be wrong, but i don't think checksums (even with a sufficiently large hash size) would work the way you think on iNat images, because iNat modifies images during upload (to resize them and remove metadata, among other things).

the ideal workflow for observations loaded or duplicated via the website should lead to a single photo record tied to multiple observation records, but the last time i checked this is not usually the case when loading via the apps, which instead tie each observation to its own photo record.

however, this doesn't negate the usefulness of your proposed functionality.

There's also the possible case where two obs share the same photo but not the same image (e.g. one only has a zoomed-in, cropped view of one or more of the images in the other observation).

They are still the same photo, but not the same digital image, and won't share the same checksum (as would also be the case if I uploaded a .jpg of the same original photo to one observation and a .png to the other).

So it would be nice to (also) have a way to manually indicate observations are linked together in time and space rather than only doing it indirectly by sharing a bit-exact image between them.


This related feature request has posts on using observation fields to link observations:
https://forum.inaturalist.org/t/link-observations/1367

Still, removing metadata and compressing should produce the same file if the exact "same media" (per the OP's title) was uploaded… Anyway, it might be possible to calculate the checksum first, before these changes are made during upload.

TBH I usually don't use the same media for different observations, or only after reframing or resizing. I understand it might happen more often for those using the app, who mostly upload without editing their images, but I felt this could be an opportunity to fill two needs with one deed, i.e. the requests about duplicates mentioned by DianaStuder here!


should it? if the filenames are different and the timestamps of the files are different, are those the same file? (if you convert A into B and then A into C, is B = C?)

if you change the resizing codec or if you run it on a different hardware / software setup, do you end up with the same file?

what do you do for the images for the 200+ million existing observations?

Thanks for the pointer to that. I have a different set of use cases that are covered by the discussion there :) We have a project doing long-term tracking of individuals, using sightings by many observers. For my own obs I've been tagging them with the identifier for each individual, but I can't add tags to other people's observations, so we've been noting that in comments as the current best/only way to add an immutable 'curated' (as opposed to 'anyone can edit for any reason') tag.

We've talked about creating a field for that, but there's a bit more work to do behind the scenes in reconciling the identifications made by all identifiers to arrive at a consensus on which individual an observation is of. And without an external reference record, fields have the 'vulnerability' of not being curated or having a change history, so it's a balance between the feature of 'anyone could add them' and the problem of 'anyone could change them without that change being reviewed'.

It would be nice to have something like 'per-user' fields that let each user create and curate their own personal annotations for observations, though for this case a 'per-project' field would be more ideal, where the field content would be curated by the project curators.

That's actually not quite true, at least over the long term. Image compression isn't necessarily deterministic, even for lossless compression methods, and even when a newer compressor version hasn't improved the amount of compression it can achieve for a given quality setting. Many libraries will also add metadata describing the version used to do the compression, etc.

So a matching checksum could be a way to flag copies of the same image, but it shouldn't be the only way, and there needs to be a manual way to specify a true relationship (and possibly its nature).

With my tech hat on (:

If thatā€™s the only difference, those are just links to, or aliases of, the Same File.

The devil is in the detail of what operation "convert" actually is. Maybe yes, maybe no, maybe maybe :)

Change codec, no. Change hardware/software, see the devil above.

The backend storage for all the images of these obs is almost certainly already doing this, even if the identifying hashes of those images aren't available to the iNat front end. Content-addressable filesystems and deduplication aren't anything new.

I'm not saying that's the answer which works best for our problem, but retrofitting something like it is not an intractable problem. 200 million images isn't a 'large' dataset in the space of Big Data anymore :)
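For readers unfamiliar with content-addressable storage, here is a toy sketch of the deduplication idea: blobs are keyed by their own SHA-256, so storing the same bytes twice keeps only one copy. This is purely illustrative, not how iNat's backend actually works:

```python
import hashlib

class BlobStore:
    """Toy content-addressable store: key = SHA-256 of the content."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Idempotent: the same bytes always map to the same key,
        # so duplicates are deduplicated for free.
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self):
        return len(self._blobs)

store = BlobStore()
k1 = store.put(b"same image bytes")
k2 = store.put(b"same image bytes")  # second store is a no-op
print(k1 == k2, len(store))  # True 1
```

The flip side, as noted above, is that any re-encoding of the image produces different bytes and therefore a different key.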

if it is, it would be doing it on the converted files, right? my earlier point here was that you likely couldn't base your hash on the original (pre-conversion) files, as suggested by s-e.

… regardless, i'm thinking this whole checksum tangent should probably stay in its own separate feature request.

Yes, on the content that is stored.

Oh, I missed that distinction, sorry. There are so many reasons it would make no sense to base a 'checksum' or other deterministic identifier on some transient state that couldn't later be reproduced to verify it, that I missed you meant the impossibility of doing it for existing observations, not the amount of work needed to apply it to 200+ million of them.