I am tracking the species I have observed against a checklist of species found on iNat for a particular place. The checklist is a CSV file downloaded from iNat, and I want to periodically check my own observations, all stored in a project, and update the checklist with any new species I have observed. Specifically, I’d highlight them, or add them if they are not already there, allowing me to find and focus on the unobserved ones.
I would like to write the code in Python, for example in a Jupyter notebook.
Can someone show me how to download this data from a Python script, or by some other means, as long as it can be done at the press of a button and then loaded into or opened with the script?
Unfortunately I won’t be able to help you with downloading that data. It may be feasible via the API, but I’m not aware of a way to get only a species list via that approach.
Are you aware, however, of the “life list” feature on iNaturalist? I have the impression that it might already give you the comparison you are looking for (but I might be wrong).
I am aware of my life list, but it is not sufficient:
It is not enough that the species have been observed by me; they must also come from a specific place, or from a specific project (which is limited to that place).
The data should not be downloaded manually (this is a pain; since I can automate the updating of the CSV, I want to automate the extraction of the data needed to do that too), but via a programming interface. It does not strictly need to be programming, but I want to do it without finding the list on iNat and downloading it every time, and I think that means programming.
I also don’t know how to use this or any other API, so any help on that front is appreciated. @pisum offered to make some tutorials a while back (in January), but that thread is closed now; I would in fact have posted this there, were it still open.
If I understood correctly and you want to fetch a list of all species you’ve observed in a project, you can do that with the following Python 3 code (it requires the requests library to be installed):
import time

import requests


def fetch_all_results(api_url, delay=1.0, ttl=(60 * 60 * 24)):
    """
    Fetch all results from an iNaturalist API endpoint, respecting the 1s rate limit,
    and with a configurable TTL for cacheable results. Requires the requests library
    to be installed.
    """
    total_results = None
    results = []
    page = 1
    while (total_results is None) or len(results) < total_results:
        curr_url = f"{api_url}&page={page}&ttl={ttl}"
        jresp = requests.get(curr_url).json()
        if total_results is None:
            total_results = jresp['total_results']
        results.extend(jresp['results'])
        page += 1
        time.sleep(delay)
    return results


if __name__ == '__main__':
    results = fetch_all_results('https://api.inaturalist.org/v1/observations/species_counts?project_id=your-project-id&user_id=your-user-id&hrank=species')
results will then be a list of dict objects containing information about each species.
This is the interactive documentation for the relevant API endpoint, where you can configure it and try it out until you get the results you want, then use the final URL in your code:
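For example, once the call in the __main__ block above has run (with the placeholder project and user IDs in the URL replaced by your own), you can pull the species names and IDs out of those dicts. This is only a sketch of the kind of post-processing you might do, and it assumes the species_counts response structure shown in that endpoint's documentation, where each result holds a 'count' and a nested 'taxon' record:

    # Sketch: turn the species_counts results fetched above into a name lookup.
    # Each result is expected to hold a 'count' and a 'taxon' dict with 'id' and 'name'.
    observed_species = {r['taxon']['id']: r['taxon']['name'] for r in results}
    for taxon_id, name in sorted(observed_species.items()):
        print(taxon_id, name)

You can then compare that mapping against whatever taxon IDs or names are in your checklist.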
This sounds easy enough, but I think we’ll need some more details on this checklist. What format is it in, and how are you currently updating it? A link or example would be helpful.
Besides those missing pieces, here’s a quick example using pyinaturalist, and assuming a CSV format of <taxon name>,<taxon id>:
"""Example of comparing a species checklist against user-observed species"""
import csv
from pyinaturalist import get_observation_species_counts, get_taxa
# Replace with your own info
CHECKLIST_FILE = 'checklist.csv'
PLACE_ID = 24 # Iowa, US
PROJECT_ID = 71609 # Biodiversity of Iowa project
USER_ID = 'jkcook'
# Get taxon IDs from checklist (this example assumes <name>,<id> format)
with open(CHECKLIST_FILE) as f:
reader = csv.reader(f)
checklist_taxon_ids = [int(row[1]) for row in reader]
# Get all taxa observed by a given user + place (or project)
response = get_observation_species_counts(
user_id=USER_ID,
place_id=PLACE_ID,
# project_id=PROJECT_ID, # Or search by project instead of place
page='all',
)
user_taxa = {result['taxon']['id']: result['taxon']['name'] for result in response['results']}
# Get user-observed taxon IDs that aren't in the checklist
new_taxon_ids = set(user_taxa.keys()) - set(checklist_taxon_ids)
print(new_taxon_ids)
# Update checklist with new taxa
with open(CHECKLIST_FILE, 'a') as f:
writer = csv.writer(f)
for taxon_id in new_taxon_ids:
taxon_name = user_taxa[taxon_id]
writer.writerow([taxon_name, taxon_id])
# TODO: Upload checklist file?
I’ll answer you better once I try this, but basically the checklist is this Excel table, where there are multiple headers for the various ranks in the taxonomy. I have not updated it yet, but I would open it in Jupyter, then for each new species look at the taxon, find the spot where it should go, and insert a line under it using the same headers (kingdom, phylum, … species). I could try either updating the checklist as a text file by writing lines, or as a data table by inserting rows and setting all of the values appropriate for the new species. Thanks for your response.
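Something along these lines is what I have in mind for the data-table approach. It is an untested sketch with several assumptions: that the checklist columns are literally named kingdom, phylum, class, order, family, genus, species; that new_taxon_ids is the set computed in the comparison example above; and that pyinaturalist’s get_taxa_by_id returns the taxon record with its ancestors, which pandas can then turn into one row per new species:

    # Untested sketch of the "data table" approach: load the Excel checklist with
    # pandas, look up the full taxonomy of each newly observed taxon with
    # pyinaturalist, and append one row per new species. Requires pandas, openpyxl
    # and pyinaturalist; the file name and column names here are assumptions.
    import pandas as pd
    from pyinaturalist import get_taxa_by_id

    CHECKLIST_FILE = 'checklist.xlsx'  # placeholder path to the downloaded checklist
    RANK_COLUMNS = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']

    # Replace with the set computed in the comparison step above
    new_taxon_ids = set()

    checklist = pd.read_excel(CHECKLIST_FILE)


    def taxonomy_row(taxon_id):
        """Build a {rank: name} dict for one taxon from its iNaturalist record."""
        taxon = get_taxa_by_id(taxon_id)['results'][0]
        # The taxon record plus its ancestors should cover kingdom through species
        names_by_rank = {t['rank']: t['name'] for t in taxon['ancestors'] + [taxon]}
        return {rank: names_by_rank.get(rank, '') for rank in RANK_COLUMNS}


    # Append one row per new species, then sort so each lands near its relatives
    new_rows = pd.DataFrame([taxonomy_row(taxon_id) for taxon_id in new_taxon_ids])
    checklist = pd.concat([checklist, new_rows], ignore_index=True)
    checklist = checklist.sort_values(RANK_COLUMNS).reset_index(drop=True)
    checklist.to_excel('checklist_updated.xlsx', index=False)

That would not preserve any highlighting or the exact original ordering of the Excel file, but sorting by the rank columns should at least slot each new species in next to its relatives.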