Add taxon_id filter to /v1/taxa/autocomplete

As most recently discussed in https://forum.inaturalist.org/t/prefix-matches-on-snow-better-match-than-the-aou-code-snow-for-snowy-owl/7061/14, the /v1/taxa/autocomplete interface appears to be more reliable than /v1/taxa at picking the “best” match for the terms typed.

If the /v1/taxa/autocomplete interface had a taxon_id filter, as /v1/taxa already does, users could search subtrees of the iNat taxonomy and get more accurate results. The filter would weed out irrelevant results, while autocomplete’s superior scoring would help them zero in on the expected match more quickly: ideally as the topmost result, or failing that, close to the top.

The same outcome can’t practically be achieved with either interface, for reasons I elaborated on in the discussion linked above. On the one hand, /v1/taxa/autocomplete returns at most 30 results, which doesn’t cover the case where 30 or more records match the terms but none of them are in the desired taxonomy subtree. On the other hand, /v1/taxa results don’t contain enough fields for downstream code to impose a better ranking that more closely matches users’ implicit expectations (i.e. that the results should resemble what they get from /v1/taxa/autocomplete, which I use for all other calls except those needing taxon_id, since autocomplete doesn’t support it).
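
To illustrate the 30-result problem, here is a minimal sketch of the client-side workaround (filtering autocomplete results to a subtree after the fact). It assumes each result carries an ancestor_ids list; that field, the function name, and the taxon IDs are illustrative assumptions, not taken from the thread.

```python
from typing import Any


def filter_to_subtree(results: list[dict[str, Any]], root_taxon_id: int) -> list[dict[str, Any]]:
    """Keep only results that are root_taxon_id itself or one of its descendants.

    Assumes (hypothetically) that each result has an "ancestor_ids" list.
    """
    return [
        r for r in results
        if r.get("id") == root_taxon_id or root_taxon_id in r.get("ancestor_ids", [])
    ]


# A full page of 30 results (the autocomplete maximum), none of which fall
# under the desired subtree root (taxon 3 here; all IDs are illustrative).
capped_page = [{"id": 1000 + i, "ancestor_ids": [48460, 47126]} for i in range(30)]
print(filter_to_subtree(capped_page, 3))  # prints []
```

The empty list is the failure mode: matches inside the subtree may exist, but they were never in the capped page, so filtering client-side cannot recover them. A server-side taxon_id filter avoids this.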

Wouldn’t it be better to ask for an option on the other endpoint to sort by best match, most observations, or something similar?

I’m only following up on @pleary’s suggestion that I ask for more filters on /v1/taxa/autocomplete, so maybe they could jump in here with their opinion on this alternative. I cannot comment on which one would be more work and/or more consistent with the goals & design of the two different endpoints.

If the end result of supporting sort_by or something similar on /v1/taxa is that its scoring could be made the same as autocomplete’s, so that matched_term from that call is consistent with the autocomplete endpoint, then yes, I’d be satisfied with that. Should I file a new feature request (possibly superseding this one)? Or could this one simply be retitled/reworded to request that feature instead?

Ben

I think having taxon_id as a parameter for the autocomplete endpoint is a fine addition, and I’ve just added it along with a few other parameters: rank, rank_level, and all_names, which returns all taxon names in the response. I don’t know the requirements of what you’re building, so I don’t have an opinion on whether this parameter is what you need.
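
For reference, a request using the new parameters might be built like this. This is a minimal sketch: the parameter names q, taxon_id, rank, rank_level, and all_names come from this thread, while per_page (capped at 30 per the discussion above), the helper name, and the example values are assumptions.

```python
from typing import Optional
from urllib.parse import urlencode

AUTOCOMPLETE_URL = "https://api.inaturalist.org/v1/taxa/autocomplete"


def autocomplete_url(
    q: str,
    taxon_id: Optional[int] = None,
    rank: Optional[str] = None,
    rank_level: Optional[int] = None,
    all_names: bool = False,
    per_page: int = 30,
) -> str:
    """Build an autocomplete URL, including only the filters actually given (hypothetical helper)."""
    params = {"q": q, "per_page": per_page}
    if taxon_id is not None:
        params["taxon_id"] = taxon_id  # restrict results to this subtree
    if rank is not None:
        params["rank"] = rank
    if rank_level is not None:
        params["rank_level"] = rank_level
    if all_names:
        params["all_names"] = "true"  # ask for every name of each taxon
    return f"{AUTOCOMPLETE_URL}?{urlencode(params)}"


print(autocomplete_url("snow", taxon_id=3, rank="species", all_names=True))
```

Leaving unset filters out of the query string keeps these requests identical to plain autocomplete calls, which makes it easier to compare results with and without the new filters.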

Please keep in mind that we still cannot guarantee for either the search or autocomplete endpoints that matched_term will be the term you are expecting. The logic for choosing the matched term is the same for each endpoint, but the queries we use are different, so it’s up to Elasticsearch what the best match is. But maybe with the addition of a way to get all names you can pick the match you feel is best.

If you’d like to see more things added to the API, it would be helpful to have separate posts for each request.

Thanks. I’m eager to try out the changes and see if Elasticsearch gives results that are consistent both with and without the new filters. I’m not looking for guarantees so much as consistency. If users want something special (like a lookaside at a table of names imported from AOU’s list) I could do that, too, but I’m hoping this will make that unnecessary. Oh my goodness! I didn’t notice all_names on my first read. If that’s what it sounds like, that would make it trivial to rescore matching AOU codes if necessary. Thanks so much!
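
A rescoring pass over all_names could look something like the sketch below. It assumes, without confirmation from the API docs, that all_names=true adds a "names" list of {"name": ...} objects to each result; the function name and sample data are hypothetical.

```python
from typing import Any


def promote_exact_name_matches(results: list[dict[str, Any]], query: str) -> list[dict[str, Any]]:
    """Move results with any name exactly equal to query (case-insensitive) to the front.

    Assumes (hypothetically) that each result has a "names" list of {"name": ...} dicts.
    """
    q = query.casefold()

    def has_exact(r: dict[str, Any]) -> bool:
        return any(n.get("name", "").casefold() == q for n in r.get("names", []))

    # sorted() is stable, so Elasticsearch's order is kept within each group;
    # False sorts before True, so exact matches come first.
    return sorted(results, key=lambda r: not has_exact(r))


# Illustrative data: an AOU-style code "SNOW" stored as one of Snowy Owl's names.
results = [
    {"id": 1, "name": "Plectrophenax nivalis", "names": [{"name": "Snow Bunting"}]},
    {"id": 2, "name": "Bubo scandiacus", "names": [{"name": "Snowy Owl"}, {"name": "SNOW"}]},
]
print([r["id"] for r in promote_exact_name_matches(results, "snow")])  # prints [2, 1]
```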

After coding the changes in my bot to use rank & taxon_id in the request, my problem is now solved. Thanks again.

3 posts were split to a new topic: Issues with all_names on /v1/taxa/autocomplete