Search all fields (search within observation fields)

I’m a bit confused and frustrated as to how this (apparently?) isn’t possible, and moreover that the only request for it that I can find is Danny’s post from last month, relatably treating this as a bug report.

I understand this is technically an absence of a feature, not a bug (regardless of how surprising the feature’s absence is), so I’m posting this as a request here/now.

This really is a three-word feature. Search all fields.

I’m mainly looking at mushrooms. If I want to see if someone indicated their find smelled or tasted sweet, I have to make separate searches for every one of these fields:
Smell
Odor
Odor or Taste
Taste/Odor
Odor - Macrofungi
Scent description
Scent?
Any scent?
mushroom taste
Taste - Macrofungi
Taste

If I want to search for finds with “anise” or “sweet” or “pleasant”, now that’s 33 searches, just for observation fields.

Actually, not even 33 searches would do it, because there’s no “contains” search even on a specified obs field. It seems like I’d have to do infinite searches, searching separately for every combination of characters that includes “sweet”.
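To make the arithmetic concrete, here is a small sketch that enumerates the exact-match searches described above. The field names and terms are the ones listed in this post. The `field:NAME=VALUE` URL parameter is an assumption about how a single observation-field search is expressed on the website, and `search_urls` is a made-up helper name.

```python
from itertools import product
from urllib.parse import quote

FIELDS = [
    "Smell", "Odor", "Odor or Taste", "Taste/Odor", "Odor - Macrofungi",
    "Scent description", "Scent?", "Any scent?", "mushroom taste",
    "Taste - Macrofungi", "Taste",
]
TERMS = ["anise", "sweet", "pleasant"]

BASE = "https://www.inaturalist.org/observations"

def search_urls(fields, terms):
    """Build one exact-match search URL per (field, term) pair.

    Assumption: the site accepts field:NAME=VALUE as a query parameter.
    """
    return [
        f"{BASE}?field:{quote(field, safe='')}={quote(term, safe='')}"
        for field, term in product(fields, terms)
    ]

urls = search_urls(FIELDS, TERMS)
print(len(urls))  # 11 fields x 3 terms = 33
```

Each of these is still an exact-match search, so a value like “sweetish” would be missed by all 33.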

I also understand the programmers at iNat have little, if any, time to implement new features, but I’m hoping at least that someone can point to any existing request for this I missed, or explain why this feature is more difficult or less necessary than it looks to me, or a workaround, or sympathy, or something…

the kind of query you’re describing is relatively complex because:

  • there could be many fields attached to any given observation
  • even though your fields of interest contain strings only, observation fields generally can contain different datatypes.
  • you’re asking for a case-insensitive contains / like / find search for several possible values on any of the values in a specific set of observation fields.

even if iNat staff provided a way to do a case-insensitive contains / like / find search across all fields, you likely would still end up having to do 3 separate searches for 3 separate terms.
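as a rough sketch of what that kind of check involves for a single observation record, assuming each record carries an `ofvs` list whose entries have `name` and `value` keys (the `name` key and the sample record are assumptions for illustration, and `matching_ofvs` is a made-up helper name):

```python
def matching_ofvs(observation, terms):
    """Return (field name, value) pairs whose value contains any term, ignoring case."""
    lowered_terms = [t.lower() for t in terms]
    matches = []
    for ofv in observation.get("ofvs", []):
        # Values may be text, numbers, dates, etc.; coerce to str and treat None as empty.
        value = ofv.get("value")
        text = "" if value is None else str(value)
        if any(t in text.lower() for t in lowered_terms):
            matches.append((ofv.get("name"), text))
    return matches

# Hypothetical record, shaped loosely like an API observation result.
sample = {
    "id": 1,
    "ofvs": [
        {"name": "Odor", "value": "Sweet, like anise"},
        {"name": "Cap width (cm)", "value": 4},
        {"name": "Taste", "value": None},
    ],
}
print(matching_ofvs(sample, ["sweet", "pleasant"]))
```

this has to look at every field value on every observation in the set, which is why starting from a small set matters.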

my understanding of observation fields is that they are sort of a better-than-nothing way of capturing custom structured data. the tradeoff for their customizability / flexibility is that you can end up with similar data scattered across many different fields with unstandardized values. querying non-standardized values from non-standardized fields can be challenging, but again, that’s the tradeoff for flexibility on the input side.

i think the closest way to get what you’re looking for is to use the API to start with as small a set of observations as possible (ideally, fewer than 10,000 observations), and then parse through those observations to find your search terms within the observation fields.

so for example, suppose i define my starting set as these ~9000 observations: https://jumear.github.io/stirfry/iNatAPIv1_observations?ofv_datatype=text&taxon_id=47167&verifiable=true&place_id=1&d1=2024-10-01&d2=2024-12-31T23:59:59.999

then i can go to https://jumear.github.io/stirpy/lab?path=iNat_APIv1_get_observations.ipynb and use these as the parameters for my starting set:

req_params_string = 'ofv_datatype=text&taxon_id=47167&verifiable=true&place_id=1&d1=2024-10-01&d2=2024-12-31T23:59:59.999'

… and then parse that set and filter for “sweet”, “pleasant”, or “anise” in any of the observation field values:

# keep an observation if any of its observation field values contains one of the terms (case-insensitive)
terms = ('pleasant', 'sweet', 'anise')
obs = await get_obs(req_params, get_all_pages=True, use_authorization=False, pre_parse_filter_function=(lambda x: any(t in str(ofv.get('value') or '').lower() for ofv in x.get('ofvs', []) for t in terms)))
#obs = await get_obs(req_params, get_all_pages=True, use_authorization=False, post_parse_filter_function=(lambda x: x['observation_fields'].lower().find('anise') >= 0 or x['observation_fields'].lower().find('sweet') >= 0 or x['observation_fields'].lower().find('pleasant') >= 0))
obs_ids = [o.get('id') for o in obs]
obs_id_sets = items_to_batches(obs_ids, prefix='https://www.inaturalist.org/observations/identify?id=')

… i get 35 observations:

https://www.inaturalist.org/observations/identify?id=258985744,258985300,257188171,257027122,256957417,256215021,256144639,255824198,255731697,255698962,255692849,255657474,255637857,255460276,255267841,255150830,255150284,255150219,255027221,254181347,253912192,253653470,253476198,253421337,253419626,253417755,252803500,252672242,249941699,249827976,249542370,248625884,247170602,246990267,246370748
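for anyone not using the workbook, the batching step can be approximated in a few lines. this is a hypothetical stand-in, not the workbook’s actual `items_to_batches` implementation, and the default batch size of 30 is an arbitrary assumption:

```python
IDENTIFY_PREFIX = "https://www.inaturalist.org/observations/identify?id="

def ids_to_identify_urls(obs_ids, batch_size=30, prefix=IDENTIFY_PREFIX):
    """Split ids into batches and build one comma-separated Identify URL per batch."""
    return [
        prefix + ",".join(str(i) for i in obs_ids[start:start + batch_size])
        for start in range(0, len(obs_ids), batch_size)
    ]

# The first three ids from the result above, in batches of two.
urls = ids_to_identify_urls([258985744, 258985300, 257188171], batch_size=2)
for url in urls:
    print(url)
```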

if you don’t like my workbook above and want to use pyinaturalist to get data from the API instead, the Python code to get observation ids similar to the above would be something like:

# install first, from a shell: pip install pyinaturalist
from pyinaturalist import get_observations

response = get_observations(page='all', ofv_datatype='text', taxon_id=47167, verifiable=True, place_id=1, d1='2024-10-01', d2='2024-12-31T23:59:59.999')
obs = response['results']
#print(obs[0])

# keep observations where any observation field value contains one of the terms (case-insensitive)
terms = ('pleasant', 'sweet', 'anise')
filtered_obs = [o for o in obs if any(t in str(ofv.get('value') or '').lower() for ofv in o.get('ofvs', []) for t in terms)]
print(len(filtered_obs))
print([o['id'] for o in filtered_obs])
#print(filtered_obs[0])

i know some folks will complain that this is too hard to do or that it shouldn’t be this hard to do. but believe me that it could always be harder or impossible to do. so i think it’s best to be grateful to be able to get results for such a complex request at all.


I had noticed this too. Thank you for writing it up.
And I’m glad there is a way to address it @pisum so thank you for that too.

I have wondered whether there would ever be a viable path for curation of observation tags, something like how wikipedia editors can lump together redundant articles or lists and so on. It adds an extra pile of work, and adds opportunity for disagreement or for messing with someone else’s systems that they actually need to be distinct, but there are ways it could work.

For example, new observation fields could be open to curation by others by default, but the first time you make one you would be prompted with the option to make it “only I can edit”. This would flag it so curators and moderators know it is a personal system they shouldn’t mess with (unless it were against terms of service or something). But everything else would be fair game for someone with the time, energy and passion to deal with it.


Is there a particular place you’re most interested in? It’s a lot harder if you’re looking at the entire world… because different regions choose different observation fields.

I agree with pisum that it’s much harder than it would seem to do this, especially since the observation field data itself is dynamic. I also agree with you… when there seems to be consensus for a given set of observation fields for a given taxon, I wish they’d be locked, or pulled into the core product instead of remaining user defined observation field data. Once there are so many for one thing, it’s hard to achieve consensus and normalization.

I would like to see duplicate observation fields merged but I don’t think it would be easy

there’s been previous discussion in many other threads. of particular note: https://forum.inaturalist.org/t/standardize-and-clean-up-observation-fields/363/17.


That is interesting, thank you…
& it seems to cover the origin of the GitHub-hosted workaround/solution

I also just noticed this text while editing an observation field I made:

About Editing Observation Fields

You can edit this observation field because you created it or because you’re a site curator. If you’re a site curator, please follow these guidelines:

Don’t make arbitrary name changes (e.g. changing “eating” to “nectaring”)
Keep in mind that observation fields can be used for observations of any taxon, so try not to use taxon-specific terminology
Don’t delete fields that people are using
Don’t translate existing field names into different languages. English is the lingua franca of this site. If you really want a field that encapsulates the same concept but in a different language, please make a new observation field.

The annotations appear to be aware of iconic_taxa. It would be easier if the observation fields had the same freedom… it would allow them to map more easily to new annotations as they’re pulled into core.

I am really grateful for the thoughtful reply. I understand that what you were going for, and maybe achieved, is to provide both an explanation for the difficulty and a viable workaround. But I can’t help feeling like this is neither an explanation for the difficulty nor a viable workaround…

Your explanation of why it’s “relatively complex” kinda seems to validate the impression that this is a relatively straightforward ask. As you say - just look through all the fields on all the obses and see if each field contains the given string! Case-insensitive pattern matching, including OR keywords (I’d be looking for “anise OR sweet OR pleasant”), seems like an easy thing to write with regular expressions. Non-string fields are even easier to evaluate. Tags are dynamic and are already searchable like this - what’s the difference between tags and fields?
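Something like this is what I have in mind for the matching itself. It’s only a sketch using Python’s standard `re` module, with made-up sample values, and it says nothing about how iNat would have to run this across its whole database:

```python
import re

def build_pattern(terms):
    """Compile a case-insensitive pattern that matches any of the terms as a substring."""
    return re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)

pattern = build_pattern(["anise", "sweet", "pleasant"])

# Hypothetical field values, including one non-string value.
values = ["Sweetish, faint", "farinaceous", "Unpleasant, bleach-like", 12]
hits = [v for v in values if pattern.search(str(v))]
print(hits)  # ['Sweetish, faint', 'Unpleasant, bleach-like']
```

A plain “contains” match also catches “Unpleasant”, so the results would still need to be checked by eye.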

The feasibility of your workaround confirms that, doesn’t it? If this is something that can be done straightforwardly with the API, it’s something that could be relatively straightforwardly done behind the scenes too.

I guess I am the folks complaining that “this is too hard to do or that it shouldn’t be this hard to do”. I’ve written a program for the iNat API, a command line program to copy all of a user’s obses from Mushroom Observer. I feel like it was a massive hurdle to get the program authenticated and running in the first place, and it would be an additional massive hurdle to make a program integrated with the browser, to somehow integrate the field searches with normal searches, and make the results clickable links.

Again I’m grateful for your reply, and you’re absolutely right it could be harder or impossible to do, and I’m incredibly grateful for all the services iNat does provide for free. I guess I’m just still not convinced that, as far as usability improvements go, if resources were available to make any, this wouldn’t be relatively low-hanging fruit…

putting aside the fact that my workaround is exponentially slower than running a regular observation search, just because you can work around an issue outside of the normal workflow for some use cases doesn’t mean that solving the underlying issue is easy or worthwhile. the sort of thing you’re asking for is just inherently inefficient, and it’s something that the vast majority of people would never use in your particular configuration.

if you don’t believe me, or if you don’t like my answer, that’s fine. take it or leave it, i guess. let’s see if anyone can / is willing to provide an answer that will satisfy you more.