How can you get the full list of Establishment Means for a taxon?

@pisum @benarmstrong I’m hoping maybe you have some ideas?

The problem is this:

For any given taxon, only up to 100 listed taxa will display on the taxon page.

I was hoping there would be a relatively straightforward way to get all of the listed taxa, but I haven’t been able to figure one out.

It looks like the page is loading a static list found in (for example) https://api.inaturalist.org/v1/taxa/46017. It only goes up to 100.

So then I tried https://www.inaturalist.org/taxa/46017.json, which didn’t have anything useful.

https://www.inaturalist.org/places.json?taxon=46017 seemed promising, but you can’t search by admin_level, only by place_type, and there are so many place types, e.g. state, department, province, county, etc.

Then I thought maybe I could skim the data from the atlas, but https://www.inaturalist.org/atlases/7.json doesn’t have the full list either.

I’m hoping I’m missing something obvious – any thoughts?

@benarmstrong?

I don’t know. I gave up and worked around the lack of any apparent way to do it (i.e. I didn’t actually need the whole list, just a way to provide some text for the link to the establishment means).

i think the least bad way to do it is to build off of https://www.inaturalist.org/places.json?taxon=46017.

that endpoint provides a few filters, and the one that will limit things the best initially in this case is establishment_means. we don’t care about places where the establishment means is unknown. so then you should be able to get places with establishment means (PWEM) by iterating through each of the 3 relevant establishment means and accumulating the results:
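
a rough sketch of that loop in python, in case it helps. assumptions on my part (nothing above confirms them): the filter values are native, endemic, and introduced; the endpoint returns a JSON array of place objects that each have an id; and it pages with page / per_page. the function name and the fetch / delay parameters are made up for illustration:

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

PLACES_URL = "https://www.inaturalist.org/places.json"
# assumption: these are the three values the establishment_means filter accepts
ESTABLISHMENT_MEANS = ("native", "endemic", "introduced")


def fetch_json(url):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def places_with_establishment_means(taxon_id, fetch=fetch_json,
                                    per_page=100, max_pages=50, delay=1.0):
    """Accumulate places for a taxon, keyed by place id.

    Each place is tagged with the establishment means it was found under.
    """
    found = {}
    for means in ESTABLISHMENT_MEANS:
        for page in range(1, max_pages + 1):
            query = urlencode({
                "taxon": taxon_id,
                "establishment_means": means,
                "page": page,          # assumption: page/per_page paging
                "per_page": per_page,
            })
            batch = fetch(f"{PLACES_URL}?{query}")
            if not batch:
                break  # an empty page means this establishment means is done
            for place in batch:
                found[place["id"]] = {**place, "queried_establishment_means": means}
            time.sleep(delay)  # pause between requests to be polite
    return found
```

calling places_with_establishment_means(46017) would hit the live site at least once per establishment means, plus one extra request per means to detect the empty last page, so it pauses a second between requests by default.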

i believe the PWEM set will include places that don’t get captured in the taxon page’s establishment means section. i’m not sure what that page is doing to limit its set, but i assume it’s filtering based on some set of place_types or something like that. so then you can apply your own filter(s) on the PWEM set to exclude the places that you don’t need.

i like that you’re thinking about ways to address the problem of managing establishment means after the first 100 shown on that page. that’s been an issue for quite a while, and i’ve never really thought too deeply myself about how to attack that problem. so if you’re making a tool to help attack the problem, that will be quite useful. (if you need any additional help with such a tool, please don’t hesitate to ask.)

Something is fishy – taxon 46017 says it has 897 listings, but it has 0 endemic, 42 native, and 897 introduced.

The list on the taxon page isn’t limited to admin level 0, 1, or 2 places since those only sum to 318 for this taxon. So the numbers seem to suggest that in this case, either only introduced listings are being counted, or there is some other filter that’s subtracting out exactly 42 listings…

i wasn’t sure what the discrepancy there was, but maybe the count on the establishment means section of the taxon page is just wrong. so maybe no need to additionally filter – just pull back any place with establishment means.

This seems to be true – I tried for taxon 43128 and it’s also exactly the number of introduced listings, leaving out the native listings.

Ugh, why isn’t it even consistent? This taxon has 61 native listings and 65 introduced, all are included on the taxon page.

i guess i wouldn’t worry too much about what exactly it’s doing. we know that it’s unreliable at least in some cases. if i had to guess, i would just guess that when it has to return pages above 1 (more than 200), the count is wrong.

I see how that helps, and maybe that’s all jwidness needs, but if you wanted to look up the listed_taxa entries themselves, I’m not seeing that in the result. The checklist ids are given, but no listed_taxa id#.

It’s true you don’t get the listed_taxa number, but you can construct a URL that would get you one click away. For example, from the results for 46017, you can take a check_list_id and create https://www.inaturalist.org/check_lists/307?taxon=46017
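
As a small sketch of that URL construction in Python: it assumes each place record in the results is a dict with an id and a check_list_id (the checklist ids are mentioned above; the exact record shape and the function name are my assumptions).

```python
def checklist_links(places, taxon_id):
    """Map place id -> checklist page URL filtered to one taxon."""
    links = {}
    for place in places:
        check_list_id = place.get("check_list_id")
        if check_list_id is None:
            continue  # no checklist to link to for this place
        links[place["id"]] = (
            f"https://www.inaturalist.org/check_lists/{check_list_id}"
            f"?taxon={taxon_id}"
        )
    return links
```

For a record with id 9 and check_list_id 307, this yields the same URL as the example above.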

Aha! That hadn’t occurred to me. In a pinch, that’d do (but ugh, web scraping).

oho! what’s this?

https://www.inaturalist.org/check_lists/307.json?taxon=46017

$ curl -sl "https://www.inaturalist.org/check_lists/307.json?taxon=46017" | python -m json.tool
{
    "list": {
        "id": 307,
        "title": "New Mexico Check List",
        "description": null,
        "user_id": null,
        "created_at": "2009-07-01T07:36:47.000Z",
        "updated_at": "2020-08-14T18:34:42.507Z",
        "comprehensive": false,
        "taxon_id": null,
        "last_synced_at": "2020-08-14T18:34:42.506Z",
        "place_id": 9,
        "project_id": null,
        "source_id": null,
        "show_obs_photos": true
    },
    "listed_taxa": [
        {
            "id": 27100824,
            "taxon_id": 46017,
            "list_id": 307,
            "last_observation_id": 17974439,
            "created_at": "2018-11-18T14:31:57.089Z",
            "updated_at": "2019-11-27T22:12:58.169Z",
            "place_id": 9,
            "description": null,
            "comments_count": 0,
            "user_id": 425992,
            "occurrence_status_level": null,
            "establishment_means": "introduced",
            "first_observation_id": 8721014,
            "observations_count": 4,
            "primary_listing": true,
            "taxon": {
                "id": 46017,
                "name": "Sciurus carolinensis",
                "rank": "species",
                "source_id": 139,
                "created_at": "2008-03-19T01:25:55.000Z",
                "updated_at": "2019-09-02T07:16:53.704Z",
                "iconic_taxon_id": 40151,
                "is_iconic": false,
                "name_provider": "ColNameProvider",
                "observations_count": 69492,
                "listed_taxa_count": 34274,
                "rank_level": 10.0,
                "unique_name": "eastern gray squirrel",
...
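
A minimal Python sketch of pulling the listed_taxa ids out of that response, so no scraping is needed. The field names come from the output above; the function name and the fetch parameter are made up for illustration, and the https://www.inaturalist.org/listed_taxa/<id> link pattern is an assumption on my part.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def listed_taxa_for(check_list_id, taxon_id, fetch=fetch_json):
    """Return the listed_taxa entries for one taxon on one checklist."""
    url = (
        f"https://www.inaturalist.org/check_lists/{check_list_id}.json"
        f"?taxon={taxon_id}"
    )
    payload = fetch(url)
    entries = []
    for lt in payload.get("listed_taxa", []):
        if lt.get("taxon_id") != taxon_id:
            continue  # skip anything that is not the requested taxon
        entries.append({
            "listed_taxon_id": lt["id"],
            "place_id": lt.get("place_id"),
            "establishment_means": lt.get("establishment_means"),
            # assumption: listed taxon pages live at /listed_taxa/<id>
            "url": f"https://www.inaturalist.org/listed_taxa/{lt['id']}",
        })
    return entries
```

This costs one request per checklist, so it is probably best called only for the checklists you actually need.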

I’m a little hesitant to make a separate API call for every listed_taxa number. Maybe if I paginate with relatively few per page…

It’s so much weirder than that. From what I can tell, the total number on the taxon page is correct only if the number of listings of introduced/endemic combined is less than 100, and the number of native listings is less than 101.
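
Written out as a check in Python, that guess might look like the sketch below. It only restates the pattern observed in this thread, not how the site actually computes the count. The function name is made up, and the endemic count of 0 for the second taxon is an assumption, since only its native and introduced counts are given above.

```python
def displayed_total_looks_reliable(native, introduced, endemic):
    """Guess whether the taxon page total matches the real number of listings.

    Observed pattern only: the total seems right when introduced + endemic
    is under 100 and native is under 101.
    """
    return (introduced + endemic) < 100 and native < 101


# Taxon 46017: page says 897, but 42 native + 897 introduced = 939 listings.
print(displayed_total_looks_reliable(native=42, introduced=897, endemic=0))
# The taxon with 61 native and 65 introduced, where all listings are shown.
print(displayed_total_looks_reliable(native=61, introduced=65, endemic=0))
```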

for what it’s worth, i think maybe where @benarmstrong’s suggestion leads is that instead of linking directly to the checklist+taxon page, when the link is clicked, the page could execute a function that first fetches the json and then uses it to take you to the listed taxon page.

in other words, you’d make the extra API call only on demand, which seems like a good tradeoff considering that loading the checklist+taxon page is probably a little more expensive. the one complication is that you would probably need to code your page to explicitly open the listed taxon page in a new tab, rather than allowing the option to open in the current window.

I had thought about a call on demand, but I think if I’m going to call it at all, I’d rather call it earlier so I can at least make use of the other info returned (e.g. description, comments_count, etc.)

since it didn’t look like anyone had made a tool to get the full list of establishment means, i went ahead and made something. it’s very basic for now, but if anyone has suggestions for how to improve it, please let me know or feel free to improve upon it.

page: https://jumear.github.io/stirfry/iNat_taxon_est_means.html
code: https://github.com/jumear/stirfry/blob/gh-pages/iNat_taxon_est_means.html

Nice start, thanks for working on this! A couple of questions:

It looks like the sort order is currently Est_means (descending), then Place_ID (ascending). To get closer to the current system sort order, I wonder if the initial default could be Admin_level (ascending), then Parent_place_ID (ascending) (or Parent_place_name?), then Place_name (ascending).
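
To make the suggestion concrete, here is a minimal Python sketch of that three-level sort. It is not taken from the tool's code; the row keys admin_level, parent_id, and name are hypothetical, and rows with no admin level or parent are sorted last.

```python
def default_sort_key(row):
    """Sort by admin level, then parent place id, then place name (all ascending)."""
    admin = row.get("admin_level")
    parent = row.get("parent_id")
    return (
        admin is None, admin if admin is not None else 0,     # missing levels last
        parent is None, parent if parent is not None else 0,  # missing parents last
        (row.get("name") or "").casefold(),                   # case-insensitive name
    )


# Hypothetical rows, just to show the ordering.
rows = [
    {"name": "Texas", "admin_level": 1, "parent_id": 1},
    {"name": "Canada", "admin_level": 0, "parent_id": None},
    {"name": "Arizona", "admin_level": 1, "parent_id": 1},
    {"name": "Some Park", "admin_level": None, "parent_id": 5},
    {"name": "Ontario", "admin_level": 1, "parent_id": 2},
]
ordered = sorted(rows, key=default_sort_key)
print([r["name"] for r in ordered])
```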

And just curious what use(s) you had in mind for the lat/long information being displayed?

done, except that i left off parent place name for now. it’s possible to get the parent name, but it requires additional logic and additional calls to the API. so i’ll add it only if its absence really hinders usability.

i was thinking that coordinates might help someone visualize where in the world a particular place was, or that someone could use coordinates to map the data, though both are probably unlikely, i suppose…

in the latest version, i’ve added a tiny map to display a rough location, and the coordinates will only be displayed if the user adds a parameter disp_coords=true.
