API full-text search of body tags in comments & identifications and return observations

Platform(s):
API (observations search)

Description of need:
I couldn’t find a way to use the API to search for observations’ comments (provided by any users, not just the one who created the observation).
https://api.inaturalist.org/v1/docs

Feature request details:
I am talking about full-text search of users’ “explanations”, meaning the text “body” fields you can see next to any of the identifications, and also in separate comments.

I am particularly interested in searching for users tagging other @users, as in the example below. I think those (identification-body & comment-body) are the usual places for tagging other users.
But searching those bodies would be useful in many different situations.
A couple of examples, followed by an illustrative API json content sample:

  • Many users working together in a project (not necessarily an iNaturalist project) who need to get a list of messages tagging any of them. The API would make it possible to track unanswered mentions.
  • Searching for observations which might be useful for given projects (e.g., somebody tracking albinism could be interested in finding words like “white”, “whitish” … in comments or identifications)
[
  {"id":"12345678",
  "identifications": [
    {"user":{"login":"user1","created_at": "2022-04-16T10:10:00"},
     "taxon_id":987654,
     "body":"early flowering and a bit whitish petals @user2 what do you think?"
    },
    {"user":{"login":"user3","created_at": "2022-04-18T20:20:00"},
     "taxon_id":987655,
     "body":"petals also shorter than usual; might be the other subspecies"
    }
  ],
  "comments":[
    {"user":{"login":"user3","created_at": "2022-04-17T15:15:00"},
     "body":"@user1 I think user2 is busy this month; maybe @user4 can answer?"
    }
  ]
  }
]

This is possible already, isn’t it? …unless I’m misunderstanding.

At least… I have harvested my own and other users’ comments from observations before without issue. I pulled in the relevant observations, then searched the comments within them:

For example, if I run this Notebook to include this observation, I can see Martina’s comment:

Thanks @sbushes but you misunderstood my request: that’s not what I am asking.

I want to find observations from their comments, not get comments from an observation.

Following your example, Martina’s comment body says “it is probably P. formosum”.
What I want is to be able to find this observation by searching for text in comments’ bodies all across iNaturalist observations (e.g., searching for “formosum” should return this and other observations containing that word in any comment’s body).

Similarly, searching for either “user1”, “user2”, “user4” or “whitish” should return the observation in my example above (because they all are words present in at least one comment body).
But not “user3”, because it’s not mentioned in any comments (although that user has made a comment).
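
To make the intended matching rule concrete, here is a minimal Python sketch applied locally to the sample above. The helper names `bodies` and `body_matches` are hypothetical, and the whole-word, case-insensitive matching is an assumption about how such a search might behave; this only illustrates the requested semantics, it is not an existing API feature.

```python
import re

# Sample observation mirroring the illustrative JSON above (not real data).
observation = {
    "id": "12345678",
    "identifications": [
        {"user": {"login": "user1"},
         "body": "early flowering and a bit whitish petals @user2 what do you think?"},
        {"user": {"login": "user3"},
         "body": "petals also shorter than usual; might be the other subspecies"},
    ],
    "comments": [
        {"user": {"login": "user3"},
         "body": "@user1 I think user2 is busy this month; maybe @user4 can answer?"},
    ],
}


def bodies(obs):
    """Yield every non-empty identification and comment body of one observation."""
    for key in ("identifications", "comments"):
        for item in obs.get(key, []):
            body = item.get("body")
            if body:
                yield body


def body_matches(obs, term):
    """Return True if `term` occurs as a whole word in any body (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return any(pattern.search(text) for text in bodies(obs))


# Only the bodies are searched, never the author's login.
hits = {term: body_matches(observation, term)
        for term in ("user1", "user2", "user4", "whitish", "user3")}
print(hits)
```

The author of a comment is deliberately ignored, which is why “user3” does not match even though that user wrote two of the bodies.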

Ahh right, I see. There are over 100,000,000 observations (and even more individual comments)…so to scan across all of them like this would be heavy/time-consuming wouldn’t it? …at least in comparison with running a search under an already delineated part of the structure…
But ok… in any case, yes… I misunderstood.

Yes indeed.
My main current interest in this feature would be tracking comments which may be relevant for a project where we have involved some specialists who are not frequent iNaturalist users (although they have user accounts, they only log in when I tell them there is an interesting observation or comment to see).
So I want to use the API to do the hard work of tracking comments and creating a weekly report of observations which are worth their review.

The problem is that I don’t see a proper way to do this other than asking for a new feature to search inside comments & identifications.

The only other option would be to parse EVERY single observation and look at the comments inside.
This would take ages and be an insane waste of bandwidth and CPU hours.
But even if I did it, once finished (if ever) I would need to start over again … because a new comment or identification could be added at any time to any observation. So this would be an infinite and crazy loop.

On the other hand, if the feature is available I could just launch a daily query which would probably last just a few seconds.

How many observations do you expect to collect in your project?

why wouldn’t you want to use an iNaturalist project here?

@pisum:
How many observations do you expect to collect in your project?

It’s the opposite thing: we don’t want to collect observations, but to use iNaturalist’s current observations to search for certain words (in their comments or identifications) which might be relevant to a particular research interest.

why wouldn’t you want to use an iNaturalist project here?

I don’t think an iNaturalist project is very useful for this, except for putting together all the observations where we have found comments which make them relevant to the project.

But we still need to use the API to find them, and then store their observation ids (and a local csv would be just fine; we don’t need to create a project for this).
We can of course create a traditional project and dump all collected observations into it, but that would be a further unnecessary step: more work, not less, at least for our use case.
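
As a rough illustration of the "local csv" idea, the sketch below writes observation ids and the term that matched them using Python's standard `csv` module. The function name `write_ids_csv` and the column names are hypothetical, chosen only for this example.

```python
import csv
import io


def write_ids_csv(rows, fileobj):
    """Write (observation_id, matched_term) pairs as CSV with a header row."""
    writer = csv.writer(fileobj)
    writer.writerow(["observation_id", "matched_term"])
    for obs_id, term in rows:
        writer.writerow([obs_id, term])


# An in-memory buffer stands in for a real file opened with open(path, "w", newline="").
buffer = io.StringIO()
write_ids_csv([("12345678", "whitish")], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```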

i don’t really understand your anticipated workflow.

presumably if you have people working together on something, they have shared expertise. it seems like if they have shared expertise, they would be looking at particular taxa and/or maybe particular places, not just any observation.

so to me, it seems like it would make the most sense to look for /v1/observations?updated_since=[last time you ran your report]&taxon_id=[taxa of interest]&place_id=[places_of_interest].

for most taxa, that shouldn’t be a huge number of records, if you’re doing this weekly. so then you can just parse through the entire set of observations to look at not just identification notes and comments, but also observation descriptions, tags, etc…
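
A minimal Python sketch of that incremental approach might look like the following. It assumes each result carries `identifications` and `comments` arrays with a `body` field, as in the sample earlier in the thread; the helper names (`build_url`, `fetch_page`, `matching_ids`) are hypothetical, and paging, rate limiting and error handling are left out.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.inaturalist.org/v1/observations"


def build_url(updated_since, taxon_id=None, place_id=None, page=1, per_page=200):
    """Build the query URL; taxon_id and place_id are optional filters."""
    params = {"updated_since": updated_since, "page": page, "per_page": per_page}
    if taxon_id is not None:
        params["taxon_id"] = taxon_id
    if place_id is not None:
        params["place_id"] = place_id
    return API_URL + "?" + urllib.parse.urlencode(params)


def fetch_page(url):
    """Download one page of results (network call, not executed in this sketch)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)["results"]


def matching_ids(observations, term):
    """Return ids of observations whose description or any body contains `term`."""
    term = term.lower()
    found = []
    for obs in observations:
        texts = [obs.get("description") or ""]
        texts += [i.get("body") or "" for i in obs.get("identifications", [])]
        texts += [c.get("body") or "" for c in obs.get("comments", [])]
        if any(term in text.lower() for text in texts):
            found.append(obs["id"])
    return found


# Example: the URL for everything updated since the last report run.
print(build_url("2022-04-16", taxon_id=987654))
```

A weekly job would call `fetch_page` on successive pages and pass the results to `matching_ids`; the substring match here is deliberately simple and could be swapped for a regular expression.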

if you’re going to be tracking responses, i guess you could do this in some sort of custom tracking system, but to me, it makes the most sense to do this in a project, maybe leveraging iNaturalist’s observation fields.

It’s not an API call and it only works with standalone comments, not ID comments, but iNat does provide a URL parameter that might help, e.g.

https://www.inaturalist.org/comments?utf8=✓&q=formosum&commit=Search

Those searches are slow, which makes me think iNat really is trawling through all the comments. And that makes me think that you probably shouldn’t try to build that into any automated process.

Thanks @rupertclayton
Yes, I knew that way to search comments. But as you say, it’s not a very good idea to use that in an automated process.

That’s the reason I am requesting an API option for doing it. Plus I could combine such option with other API filtering methods, which could be very convenient.

Thanks for your suggestions @pisum. This one could be very useful:

?updated_since=[last time you ran your report]

For the others, you are presuming things that are not true: no common taxa, no common place. Not even shared taxonomic expertise (are you presuming they are taxonomists? I didn’t say that).

As I said, we are just looking for words in comments about any taxa (in certain languages, which is their expertise; once we have the comments it’s very easy to filter for the languages of interest).
It’s just a matter of taking advantage of iNaturalist’s collateral data (comments) for something which is pretty unrelated to iNaturalist’s common use cases.

But as I said, such a feature could be very useful for other common use cases, like searching for users’ mentions as I showed in the example above (which applies to my use case, but also to many other possible scenarios).

i don’t think this actually would be a common use case. it’s already possible to get your own mentions from the existing notifications functionality. searching for mentions of other people is probably not something most people would do, and there may actually be reason to prevent that kind of functionality for privacy reasons.

you could search for certain words in observation tags and descriptions using ?q=.
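
For illustration, a small Python sketch that builds such a query URL. The `search_on` parameter used to narrow the search to descriptions or tags is my assumption about the v1 observations endpoint and should be checked against the API docs; `search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

API_URL = "https://api.inaturalist.org/v1/observations"


def search_url(term, search_on=None):
    """Build a ?q= search URL; `search_on` (assumed parameter) narrows the searched field."""
    params = {"q": term}
    if search_on:
        params["search_on"] = search_on
    return API_URL + "?" + urlencode(params)


url = search_url("whitish", search_on="description")
print(url)
```

Note that this still does not reach identification or comment bodies, which is the gap the feature request is about.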

I don’t understand what you mean.

What’s private in a public comment in a community tool like iNaturalist?
If comments were supposed to be private, then they would be private.

As for it being a common use case, of course it is not.
We are talking about using the API, which is uncommon by itself.

But in project scenarios where you want to compile observations using the API, doing the hard work of submitting them to your own team of experts might not be uncommon.

There may be many museum staff willing to collaborate in citizen science projects. But I bet the vast majority of them are not going to do it if that implies having to log into whatever, learning a new website interface, installing stuff into their mobile phones, and things like that.

On the other hand, if I email them a weekly worksheet report of already filtered and tabulated observations, they would certainly give me feedback by entering their identifications and mailing them back to me.

In that scenario, my daily work of tagging experts in observations’ comments would be a very productive way to keep things ready to be filtered and tabulated during the weekend.

in your museum staff scenario, you could establish an organizational account for the museum, and then use the existing notifications functionality via v1/observations/updates to get mentions of the museum account in identifications and comments. you can distribute that list of results to whomever you choose.

in your example, since many people don’t want to adopt new platforms anyway, you’ll need an organizational account to potentially respond to other users’ comments / identifications anyway.

Or we can use the API and let users keep their own opinions, which may be different between members of the same institution (i.e. one can be submitting family level identifications, others may be reviewing a particular genus … and so on).

A single API-based application can perfectly well be used to submit identifications from different accounts, as long as the user trusts the application (e.g., a Python script which processes a spreadsheet previously filled in by the user).

This way, different members of the same institution can submit lots of identifications to iNaturalist without losing their time logging into web/mobile interfaces, searching for observations, comments, mentions or whatever… if there is somebody else (human or robot) collecting them into a spreadsheet.

That’s the point.
IMHO, one of the many advantages of APIs is to permit advanced/personalized usages.