Allow download of project curator ID fields from traditional projects via the main Export Observations page

I just spent a good while searching to see if someone had already asked/addressed this and didn’t find anything, but my apologies if I missed something… My request: To have the option to download through the main Export Observations page all the fields that are available to admins/curators under the “Export with Hidden Coordinates” Project Curator Tool on traditional projects AND have the option to filter by date, taxa, etc. as we can in the Export Observations page.

I manage several traditional projects that our agency uses to assess rare species across our state. Beyond access to true coordinates, one of the most important things we get from the traditional projects is the ability to see IDs from our project curators, which gives us even more quality control beyond the Research Grade designation (we don’t use an RG observation unless it’s been vetted by a curator). We’re at a point where a few of these projects are close to or beyond the 200,000-observation download limit, and the curator tool “Export with Hidden Coordinates” does not give users the ability to filter observations; it’s just a pre-set download of the entire project. Even with small traditional projects, I’m finding that the tool takes a very long time to respond (typically it takes hours of clicking through time-outs to get a download).

And while the Export Observations page allows project curators/admins to download private_latitude & private_longitude fields when a relevant project is specified, I have not found a way to include the following fields in the download:
curator_ident_taxon_id
curator_ident_taxon_name
curator_ident_user_id
curator_ident_user_login

Being able to access those fields in the download and filter by date to be able to pull smaller chunks of data would be extremely helpful, as it’s incredibly time-consuming to constantly be watching your export request spin and time out and then have to click again to start over via the traditional project export tool. Working through the Export Observations portal, users can queue a request once and just let it process. And one of the reasons we want to see exactly which curator has added what ID is so that we can go back over subsequent downloads and scan for any changes/updates to those fields. So just using a URL search for “pcid=true” doesn’t get us what we need.

Just thought I’d put this out there in case this is of interest to anyone else, and if anyone has suggestions for workarounds, I’d love to hear them!

i think these other conversations sort of cover similar ideas:

regarding workarounds, while it is possible to get individual identifications associated with an observation via the current v1 Node.js API or to get just curator IDs via the old Rails API, i don’t think the APIs are really designed to handle sets of observations that are as large as you’re talking about here (approaching 200,000 records). that said, i don’t know of any other reasonably efficient way for folks without direct access to the database to get this kind of data.
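just to sketch what pulling curator IDs from the v1 API might look like: here's a minimal Python example that builds a paged /v1/observations request with pcid=true. the project slug and paging values are placeholders for illustration, not anything specific to your projects.

```python
from urllib.parse import urlencode

API_BASE = "https://api.inaturalist.org/v1/observations"

def build_query(project_slug, page=1, per_page=200):
    """Build a v1 observations request URL.

    pcid=true restricts results to observations that have at least
    one identification from a curator of the specified project.
    """
    params = {
        "project_id": project_slug,
        "pcid": "true",        # curator-identified observations only
        "per_page": per_page,  # 200 is the API's maximum page size
        "page": page,
        "order_by": "id",      # stable ordering makes paging predictable
        "order": "asc",
    }
    return f"{API_BASE}?{urlencode(params)}"

# you'd fetch page 1, then page 2, etc., until a page
# comes back with fewer than per_page results
print(build_query("plants-of-texas", page=1))
```

you'd then actually fetch each URL (e.g. with urllib.request or the requests library), being mindful of the API's rate limits.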

suppose you were able to get the data in the format you’re looking for. how exactly would you use that data? (for example, if a given observation had an identification by a curator that conflicted with the community ID, i assume you would take the curator’s ID. but what if two curators provided conflicting IDs?) also, why do you need to export all that data? are you saving that off somewhere just for backup, or are you feeding it into some other system or something else?

i don’t usually have a use for getting the true coordinates of obscured observations – so i haven’t really dealt with exporting these much – but i believe that if you just use the standard export page, and include private coordinates in your selected fields for export, then you should be able to filter as much as that page will allow you to filter.

EDIT: i read your original post again with rested eyes, and i think i understand it a little better. clicking “Export with Hidden Coordinates” on a traditional project page produces a CSV file with not only hidden coordinates but also extra curator ID columns. so you want those extra curator ID columns to be selectable/available in the regular export page.

i guess the standard export page would have to check that you’re a curator of any selected project. that said, i have seen the page change the selectable observation fields based on the project selected, so it must be possible to add that kind of logic.

i’ve never used the curator ID functionality before, so i’m still having trouble conceptualizing what kind of workflow you would have with those kinds of results. you mentioned that you want to potentially go back and scan for changes/updates to the curator IDs. but why does that matter? is it just that a particular observation gets any curator to look at it, or is it more important that a curator is reidentifying a particular observation?


We use the data from our traditional projects to feed community science data into our Texas Natural Diversity Database, which is used for a variety of purposes, from research to conservation and development planning. That database is also our conduit to the NatureServe network. Our standard for observations entering this database is pretty high, which is why we don’t just pull the GBIF dataset and instead pull a subset of RG observations that have been vetted by our curators. So, yes, if we have a conflict between curator & community IDs, we’ll generally give more credence to the curator, and if we have a conflict within the curator IDs, or a subsequent curator ID change that we notice when we compare downloads in Excel, our database team will reach out for clarification.

Ideally, yes!

Somehow it must already be able to match up my curator status on projects, since when I filter by a traditional project I admin/curate, the export page recognizes that I am allowed to populate the private_latitude & private_longitude fields with true coordinates for my download within that filter:

So I think the logic is already there; I’m just not sure how much of a hassle it would be to add those additional fields, or how much interest there is from others in this feature.


I was searching to see if anyone had a solution to this problem, only to find you had already asked this question relating to our work. Still waiting for a workaround!

i’ve been thinking about this since it popped up again. earlier i said this:

i think i was originally thinking of a way to get you all both the private locations and curator IDs at the same time, but if you separate those tasks and then combine the data later, then i think it’s more reasonable to get the curator IDs only through the API.

so just for example, the Plants of Texas project has 3 million observations, which you probably wouldn’t want to attempt to get from the API, but the subset of observations identified by project curators is only 9,700 observations, and there should be no technical issue getting that many observations from the API via /v1/observations.

just for example, here’s a page that takes the API data for curator-identified observations in the Plants of Texas database and shows how the various curators identified each observation (relative to the observation taxon): https://jumear.github.io/stirfry/iNatAPIv1_observations?per_page=200&pcid=true&project_id=plants-of-texas&options=idextra&idextra_user_id=mikaelb,billdodd,marcopperman,cullen.

this probably doesn’t get you data that you could use directly for your purposes, but it just shows that the data is there in the API, and you would just need to write a script or program to get the data into a format that you can use.
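for what it’s worth, here’s one way that flattening step could look in Python. this assumes the v1 observation JSON shape (an identifications array with nested user and taxon objects); the sample record and the curator logins here are made up for illustration.

```python
def curator_rows(observation, curator_logins):
    """Flatten one v1 API observation dict into one row per
    identification made by a known project curator."""
    rows = []
    for ident in observation.get("identifications", []):
        user = ident.get("user") or {}
        if user.get("login") not in curator_logins:
            continue  # skip identifications from non-curators
        taxon = ident.get("taxon") or {}
        rows.append({
            "observation_id": observation["id"],
            "curator_ident_taxon_id": taxon.get("id"),
            "curator_ident_taxon_name": taxon.get("name"),
            "curator_ident_user_id": user.get("id"),
            "curator_ident_user_login": user.get("login"),
        })
    return rows

# hypothetical record shaped like a v1 API observation
sample = {
    "id": 12345,
    "identifications": [
        {"user": {"id": 1, "login": "mikaelb"},
         "taxon": {"id": 47126, "name": "Plantae"}},
        {"user": {"id": 2, "login": "someone_else"},
         "taxon": {"id": 1, "name": "Animalia"}},
    ],
}
rows = curator_rows(sample, {"mikaelb", "billdodd"})
print(rows)
```

writing those dicts out with the csv module would then give you columns matching the curator_ident_* fields from the original post.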

then you would join that set of records (by observation ID) with the full set of Plants of Texas records (which includes true locations of obscured observations), and then you would have basically the set of observation data that i think is proposed / described in the original post.
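and the join itself is simple enough once both pieces are CSVs. here’s a rough Python sketch using a dictionary keyed on observation ID; the tiny in-memory CSVs just stand in for the real export files, and the curator-side column names are the ones from the original post.

```python
import csv
import io

# hypothetical minimal files standing in for the two exports
full_export = io.StringIO(
    "id,private_latitude,private_longitude\n"
    "12345,30.1,-97.7\n"
    "67890,29.5,-98.4\n"
)
curator_export = io.StringIO(
    "observation_id,curator_ident_taxon_name,curator_ident_user_login\n"
    "12345,Plantae,mikaelb\n"
)

# index the curator rows by observation ID for O(1) lookup
curators = {r["observation_id"]: r for r in csv.DictReader(curator_export)}

# left join: keep every row of the full export, adding curator
# columns where a match exists and blanks where it doesn't
joined = []
for row in csv.DictReader(full_export):
    extra = curators.get(row["id"], {})
    row["curator_ident_taxon_name"] = extra.get("curator_ident_taxon_name", "")
    row["curator_ident_user_login"] = extra.get("curator_ident_user_login", "")
    joined.append(row)

print(joined)
```

(with larger files, the same join is a one-liner in pandas via merge, but the plain csv module keeps the dependencies minimal.)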


i wrote something the other day that extracts observation data from the API and writes out a CSV: https://jumear.github.io/stirpy/lab?path=iNat_APIv1_get_observations.ipynb. it could be adapted to get the kind of data that i was describing in my most recent post above. if you’re interested in using it and can’t figure out how to adapt it for your purposes, just let me know, and i can provide some suggestions.