Handling matches on name fields not returned in API v1 /taxa response

I’ve written a little taxon lookup command for the iNat unofficial Discord & I’m trying to figure out what to do with names that are matched in the API /v1/taxa?q= that aren’t actually present in the results. Here’s an issue on the code I’m writing concerning my problem:

https://github.com/synrg/quaggagriff/issues/16

To sum that up, I sometimes get back results that may contain a matched_term value that bears little relation to whatever was in the query. If you go to the web page and click the Taxonomy tab you can find the names that matched (but I don’t want the overhead of displaying the whole web page and the tedious job of scraping those names off the page, so I’m not going to do that).

  1. If I simply don’t tell the user what matched and show them that result anyway, it is going to look broken.

  2. I don’t want to just discard the result, either, as it might be what they actually know the taxon by (i.e. a name other than the “preferred common name”).

  3. Ideally, I’d like to tell the user the name that their query matched so that the response is intelligible to them.

Since matched_term isn’t reliable (see my “common teal” example in the linked bug report above), what am I to do? I’m tempted to use the taxon_names.json endpoint the web page calls, since that would be a heck of a lot easier than scraping the whole web page! Is this kosher? It’s not in a published API, so I understand there’s a risk it might change tomorrow and then my code would no longer work.

Thanks,
Ben

use the autocomplete endpoint instead?

Could you please provide the URL of an API call that’s an example of the matched_term being unreliable?

Any undocumented API could be changed or removed at any time, so I wouldn’t recommend using them.

https://api.inaturalist.org/v1/taxa?q=common+teal&rank=subspecies

There are two results. Neither result has a matched_term that contains the word “common”.
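A minimal sketch of the check described here: does a result's matched_term contain every word of the query? The helper names `words` and `term_covers_query` are made up for illustration and are not part of the iNaturalist API.

```python
import re


def words(text):
    """Return the lower-cased word tokens of text, ignoring punctuation."""
    return set(re.findall(r"\w+", text.casefold()))


def term_covers_query(query, matched_term):
    """True if every word of the query also appears in matched_term."""
    # matched_term may be missing from a result, so treat None as empty.
    return words(query) <= words(matched_term or "")


# Hypothetical values, only to show the two outcomes.
print(term_covers_query("common teal", "Common Teal"))  # True
print(term_covers_query("common teal", "Teal"))         # False
```

A result that fails this check is not necessarily wrong; it only means the name shown in matched_term will not look like what the user typed.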

I looked at the autocomplete endpoint. It suffers from the same problem and, besides, doesn’t support as many filters, which makes it less attractive for my purposes. My users are experienced and will find it helpful to filter on rank, descendants of a specified ancestor, etc.

if i’m interpreting your problem correctly, i don’t think it suffers the problem you’re talking about. see https://api.inaturalist.org/v1/taxa/autocomplete?q=common%20teal. however, if you do something like this, then there is a problem: https://api.inaturalist.org/v1/taxa/autocomplete?q="common%20teal".

that said, if you’re trying to filter by rank, then, yes, unless you apply your own filter to the results of the autocomplete endpoint, you’re stuck with the taxa endpoint, and it doesn’t appear to give you back a matched_term that matches the term you gave it.
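a rough sketch of "apply your own filter" on the client side, assuming the autocomplete results have already been parsed into a list of dicts that each carry a "rank" field like the taxon records in this thread. `filter_by_rank` is a hypothetical helper name, not an API call.

```python
def filter_by_rank(results, rank):
    """Keep only the taxon records whose "rank" equals the requested rank."""
    return [taxon for taxon in results if taxon.get("rank") == rank]


# Hypothetical, trimmed-down records; real results carry many more fields.
sample = [
    {"name": "Anas crecca", "rank": "species"},
    {"name": "Anas crecca crecca", "rank": "subspecies"},
]
for taxon in filter_by_rank(sample, "subspecies"):
    print(taxon["name"])
```

the catch is that autocomplete returns a limited number of results, so a client-side filter can only narrow what was returned; it cannot bring back records that were cut off.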

It suffers from the same problem in that the two records that exactly match my query within the autocomplete results (i.e. rank: subspecies) also have matched_term values that do not have the word “common” in them. That is all I meant.

maybe i’m still not understanding, but below is a snippet of the autocomplete results. note the “common teal” at the end that i bolded.

{
  "observations_count": 354,
  "taxon_schemes_count": 1,
  "ancestry": "48460/1/2/355675/3/6888/6912/6922/6937",
  "is_active": true,
  "flag_counts": {
    "unresolved": 0,
    "resolved": 1
  },
  "wikipedia_url": "http://en.wikipedia.org/wiki/Eurasian_teal",
  "current_synonymous_taxon_ids": null,
  "iconic_taxon_id": 3,
  "rank_level": 5,
  "taxon_changes_count": 0,
  "atlas_id": null,
  "complete_species_count": null,
  "parent_id": 6937,
  "name": "Anas crecca crecca",
  "rank": "subspecies",
  "extinct": false,
  "id": 132873,
  "default_photo": {
    "square_url": "https://static.inaturalist.org/photos/176803/square.jpg?1545397517",
    "attribution": "© Len Blumin, some rights reserved (CC BY-NC-ND)",
    "flags": [],
    "medium_url": "https://static.inaturalist.org/photos/176803/medium.jpg?1545397517",
    "id": 176803,
    "license_code": "cc-by-nc-nd",
    "original_dimensions": {
      "width": 800,
      "height": 640
    },
    "url": "https://static.inaturalist.org/photos/176803/square.jpg?1545397517"
  },
  "ancestor_ids": [
    48460,
    1,
    2,
    355675,
    3,
    6888,
    6912,
    6922,
    6937,
    132873
  ],
  "matched_term": "Common Teal",
  "iconic_taxon_name": "Aves",
  "preferred_common_name": "Eurasian Green-winged Teal"
}

EDIT: oh… but maybe you need the Aleutian Teal to also show as “Common Teal” in the matched_term?

The matched term we were using was a match, but not necessarily the best match. In the example you linked to the matched term contained Teal, but there was a better match that contained both Common and Teal. I made a change that will hopefully address that if you’d like to try again.

In the future, it’s always helpful to provide URLs with requests like this and to clearly articulate the problem. When you said the matched term “bears little relation” to the query and “isn’t reliable”, it sounded like it wasn’t a match at all, when the problem was that it wasn’t the best match.

Thanks for reporting.


My mistake. You are correct, it does work in autocomplete, but I was so overwhelmed with the volume of irrelevant results I guess it was easy to overlook.

I tried again with your fix. That did it! Thanks so much.

Regarding coming to the point and toning down the frustration in my reports, yeah, that was poor form of me & I will endeavour to work on that in future reports.

@pleary – should the matched_term = “Aleutian Common Teal” for the Aleutian Teal in the autocomplete results to match the new behavior in the taxa results?

see https://api.inaturalist.org/v1/taxa/autocomplete?q=common+teal, where matched_term = “Aleutian Teal”

Good point about “Aleutian Common Teal”. My users would still have been confused if the Aleutian Teal were one of the records shown, since I would have displayed Matched: “Aleutian Teal” from the matched_term.

Looks like I initially only applied the fix to the taxa search endpoint, not the autocomplete endpoint. I’ve since applied it there too. That said, I cannot guarantee that in all cases the matched_term will contain all words in the search term. Your mileage might vary based on the search term, and what the database considers the best match.
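Since matched_term is not guaranteed to contain every search word, a client can at least show it whenever it differs from the names already on display. Below is a minimal sketch of that idea, using the name, preferred_common_name and matched_term fields seen in the results in this thread; `describe_match` and the output format are hypothetical.

```python
def describe_match(taxon):
    """Build a one-line label, adding matched_term only when it adds information."""
    name = taxon["name"]
    common = taxon.get("preferred_common_name")
    matched = taxon.get("matched_term")

    label = f"{name} ({common})" if common else name

    # Names the user will already see in the label, compared case-insensitively.
    shown = {name.casefold()}
    if common:
        shown.add(common.casefold())

    if matched and matched.casefold() not in shown:
        label += f' [matched: "{matched}"]'
    return label


taxon = {
    "name": "Anas crecca crecca",
    "preferred_common_name": "Eurasian Green-winged Teal",
    "matched_term": "Common Teal",
}
print(describe_match(taxon))
# Anas crecca crecca (Eurasian Green-winged Teal) [matched: "Common Teal"]
```

This keeps the result visible rather than discarding it, and tells the user which name the search hit even when that name is not the best possible match.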


Thanks again. Also, now that I look at the results from autocomplete more carefully, it might be useful for queries that don’t contain any filter criteria, switching to the taxa endpoint only when the user has specified rank or parent taxon ids as filter criteria. Currently, I do some “tinkering” with ranking the results to make them look more like what was expected, but I’d like to avoid that as much as possible. (And in any case, if there’s an actual problem to investigate there, I’d need to start a new thread for that.) I’ll experiment some with that, and if I still have issues after making the switch, will get back to you with a new post.
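The switching idea could be sketched roughly as follows: use autocomplete for a bare query and fall back to the taxa endpoint once any filter is given. The function name `build_taxa_url` is hypothetical, and filters are passed through as-is, so the caller is responsible for using parameter names the API actually accepts (only `q` and `rank` appear in this thread).

```python
from urllib.parse import urlencode

API_BASE = "https://api.inaturalist.org/v1"


def build_taxa_url(query, **filters):
    """Pick the endpoint based on whether any filter criteria were supplied."""
    # Drop filters the caller left unset.
    active = {key: value for key, value in filters.items() if value is not None}
    path = "/taxa" if active else "/taxa/autocomplete"
    params = {"q": query, **active}
    return f"{API_BASE}{path}?{urlencode(params)}"


print(build_taxa_url("common teal"))
# https://api.inaturalist.org/v1/taxa/autocomplete?q=common+teal
print(build_taxa_url("common teal", rank="subspecies"))
# https://api.inaturalist.org/v1/taxa?q=common+teal&rank=subspecies
```

Keeping the choice in one function means any remaining result re-ranking only has to be handled in one place.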
