What would be the best way to put together a spreadsheet that contains all species of Diptera in the iNat taxonomy? I’d also like to have columns in this spreadsheet indicating all the parent taxa of each species between order and species.
Like an Excel? If so, I actually adore Excel (it’s kind of my jam, yes I know this is weird) and would be happy to build this for you.
PD: can provide references.
This is already built into iNaturalist in some data tables, and my question is how to download this information most efficiently for everyone. I don’t want to do it manually for 17,000 species.
Right, the data from iNaturalist should be importable to Excel I mean. (You referenced columns, which suggested to me you would be adding data of some sort.)
edit to add: @pisum is usually very knowledgeable re: this stuff.
What does this mean?
(references for Excel expertise, as I use it professionally, unrelated field though, no worries. I think pisum is going to have the best info.)
This tool looks like it would actually be perfect if it didn’t require 100 separate downloads to be combined for each page:
https://jumear.github.io/stirfry/iNatAPIv1_taxa.html?is_active=true&taxon_id=47822&per_page=1000
Between the “parent ID” and “rank” columns, the data produced by the above tool is enough to reconstruct the whole parent taxon hierarchy using something like xlookup.
the page has an export function that will allow you to export 10000 records (the max number of records that the API will allow you to get per set of parameters). you can work around the max by speciffying order_by=id
and using id_above
or id_below
to construct n sets of up to 10000.
i would start with something like this as the base: https://jumear.github.io/stirfry/iNatAPIv1_taxa?taxon_id=47822&rank_level=10&per_page=500&order_by=id&options=ancestry
export, go to page 20, get the next set based on id_above the id of the last record on page 20, and repeat as many times as needed
…
an alternative would be to use the DWCA taxon export, but if i remember correctly, that has more taxon records than standard Excel worksheets can handle. so you either need to know how to use Power Queries in Excel to filter, or else you could use SQL, Python, R, or some other languange that can handle large datasets.
I had some trouble loading the file in SQL server desktop, but rowzero.io can handle the file just fine!
This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.