iNaturalist API with Python : get family name from observations?

Hi everyone,
I’m not very good at Python development, but I have to admit that with the help of today’s conversational engines, it’s pretty easy to get things done.
For a specific need, I have to export a list of my observations matching certain criteria, in XLSX format.
The code below works perfectly, but I’d like to add a ‘Family’ column (to sort the butterflies by family), and I haven’t managed to do so (or rather, ‘the conversational engine hasn’t managed to do so’).
Do you think this is possible?

import requests
import pandas as pd

# Function to get taxon information by its ID
def get_taxon_info(taxon_id):
    url = f"https://api.inaturalist.org/v1/taxa/{taxon_id}"
    response = requests.get(url)
    if response.status_code != 200:
        raise Exception(f"Error fetching taxon data: {response.status_code}")
    return response.json()

# Define parameters
observer = "sylvainm_53"
taxon_id = 47157  # Taxon ID for Lepidoptera
start_date = "2024-07-19"
end_date = "2024-07-20"

# URL of iNaturalist API
url = "https://api.inaturalist.org/v1/observations"

# Query parameters
params = {
    "user_id": observer,
    "taxon_id": taxon_id,
    "d1": start_date,
    "d2": end_date,
    "per_page": 200,  # Maximum number of observations per page
    "page": 1
}

# Make the request and get the data
response = requests.get(url, params=params)
data = response.json()

# Check if the request was successful
if response.status_code != 200:
    raise Exception(f"Error in request: {response.status_code}")

# Initialize a list to store observations
observations = []

# Iterate through the pages of results
while True:
    for result in data["results"]:
        # Extract taxon ID
        taxon_id = result.get("taxon", {}).get("id")
        family_name = None
        
        # Get taxon information
        if taxon_id:
            taxon_info = get_taxon_info(taxon_id)
            # Search for family among ancestors
            ancestors = taxon_info.get("taxon", {}).get("ancestors", [])
            for ancestor in ancestors:
                if ancestor.get("rank") == "family":
                    family_name = ancestor.get("name")
                    break

        # Prepare the observation record
        observation = {
            "id": result["id"],
            "species_guess": result["species_guess"],
            "observed_on": result["observed_on"],
            "place_guess": result["place_guess"],
            "latitude": result["geojson"]["coordinates"][1] if result.get("geojson") else None,
            "longitude": result["geojson"]["coordinates"][0] if result.get("geojson") else None,
            "user_login": result["user"]["login"],
            "taxon_name": result["taxon"]["name"] if result.get("taxon") else None,
            "family_name": family_name
        }
        observations.append(observation)
    
    # Check if there is another page
    if data["total_results"] > params["page"] * params["per_page"]:
        params["page"] += 1
        response = requests.get(url, params=params)
        data = response.json()
    else:
        break

# Convert the observations to a pandas DataFrame
df = pd.DataFrame(observations)

# Export the observations to an Excel file
excel_file = "observations_inaturalist.xlsx"
df.to_excel(excel_file, index=False, encoding="utf-8")

print(f"Observations exported to {excel_file}")

Thanks for your help! :grinning:

@pisum @Quercitron I think you’re comfortable with the API and Python :wink:

It worked for me when I swapped

ancestors = taxon_info.get("taxon", {}).get("ancestors", [])

for

ancestors = taxon_info.get("results", {})[0].get("ancestors", [])

(and also added a time.sleep() into get_taxon_info, and removed the encoding argument from to_excel)
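
Put together, the changes look roughly like this (just a sketch against the code above, not re-tested; the next() call is simply one compact way to pick out the family):

import time
import requests

# Function to get taxon information by its ID, pausing between calls
def get_taxon_info(taxon_id):
    url = f"https://api.inaturalist.org/v1/taxa/{taxon_id}"
    response = requests.get(url)
    if response.status_code != 200:
        raise Exception(f"Error fetching taxon data: {response.status_code}")
    time.sleep(1)  # stay around 1 request per second
    return response.json()

# inside the observation loop, once taxon_id has been read from the observation:
taxon_info = get_taxon_info(taxon_id)
ancestors = taxon_info.get("results", {})[0].get("ancestors", [])
family_name = next((a.get("name") for a in ancestors if a.get("rank") == "family"), None)

# and when exporting, drop the encoding argument (recent pandas versions no longer accept it):
# df.to_excel("observations_inaturalist.xlsx", index=False)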

2 Likes

this is my relevant code from https://github.com/jumear/stirpy/blob/main/content/iNat_APIv1_get_observations.ipynb (which can also be viewed as a web-based Jupyter notebook from https://jumear.github.io/stirpy/lab/index.html?path=iNat_APIv1_get_observations.ipynb):

def add_obs_taxon_ancestors(r):
    """Intended to be used as pre-parse function in parse_results when parsing observations.
    The observation taxon itself has an ancestor list but no detailed ancestor information; however, the taxon fields in the identifications do have ancestor details.
    So this adds ancestor details to the observation taxon, based on the ancestor details in the identifications (since the observation taxon should always be included in the identification taxa or their ancestors).
    """
    ancestors = []
    rank_level_kingdom = 70 # this is the highest-level taxon stored in identification[i].ancestors
    if (rt := r.get('taxon')) and (taxon_id := rt.get('id')) is not None and (rank_level := rt.get('rank_level')) < rank_level_kingdom:
        for id in r.get('identifications',[]):
            if (idt := id.get('taxon')):
                if idt['id'] == taxon_id:
                    ancestors = list(idt['ancestors'])
                    break
                if (idta := idt['ancestors']):
                    for i, atid in enumerate([a['id'] for a in idta]):
                        if atid == taxon_id:
                            ancestors = idta[0:i] # add everything above this taxon (will add this taxon later below)
                            break
                if ancestors:
                    break
    if rt and rank_level <= rank_level_kingdom:
        ancestors.append(rt.copy())
        rt['ancestors'] = ancestors

when you run this for each observation in your result set, this function effectively adds detailed ancestors information from identifications[i].taxon.ancestors to the existing taxon object.

this eliminates the need to make all the extra API requests to /v1/taxa/{id} (which you’re making from get_taxon_info).

so then after that, i didn’t check the code below, but getting the ancestors should be something like:

ancestors = result.get("taxon", {}).get("ancestors", [])

… and getting the family name should be something like:

family_name = next((a.get("name") for a in result.get("taxon", {}).get("ancestors", []) if a.get("rank") == "family"), None)
2 Likes

Thank you @jwidness and @pisum :grinning:
Bravo: you’re stronger than the robots! (and than me too, but that’s no badge of honour).

I’ve managed to get your adaptation up and running, @jwidness

What is the rule here? How long does time.sleep() last?

Thank you very much @pisum
I had read your message about this, but I have to admit that I don’t really understand how these Jupyter notebooks work (that’s down to my low level of development skill, and also to my level of English; I’m a French speaker).

I wasn’t able to run the code.
But even without running it, there are many very instructive examples that I should try to understand :+1:

2 Likes

time.sleep() takes seconds, so time.sleep(1) sleeps for 1 second. Since the API docs recommend no more than 60 calls per minute, I put in a 1 second sleep.
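
If you make other kinds of requests too, one way to keep that pause in a single place is a small wrapper (call it inat_get; just a sketch, same idea as the sleep inside get_taxon_info):

import time
import requests

def inat_get(url, **params):
    # GET an iNaturalist API URL, then pause so we stay near 1 request per second
    response = requests.get(url, params=params)
    if response.status_code != 200:
        raise Exception(f"Error in request: {response.status_code}")
    time.sleep(1)
    return response.json()

# e.g. data = inat_get("https://api.inaturalist.org/v1/observations", user_id="sylvainm_53", per_page=200)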

If you intend to do this again in the future or want to run it on a larger dataset, pisum’s code is definitely more efficient and would be worth implementing.

2 Likes
  1. replace your get_taxon_info function with my add_obs_taxon_ancestors function
  2. replace your “Extract taxon ID” and “Get taxon information” sections with add_obs_taxon_ancestors(result)
  3. in your observation data dictionary definition, set "family_name": next((a.get("name") for a in result.get("taxon", {}).get("ancestors", []) if a.get("rank") == "family"), None) (but check my code here, since i haven’t actually run it; see the sketch below)
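
a sketch of how those steps fit into the original loop (again, i haven’t run this; it assumes add_obs_taxon_ancestors from my earlier post is defined):

for result in data["results"]:
    add_obs_taxon_ancestors(result)  # fills result["taxon"]["ancestors"] from the identifications
    ancestors = result.get("taxon", {}).get("ancestors", [])
    family_name = next((a.get("name") for a in ancestors if a.get("rank") == "family"), None)

    observation = {
        "id": result["id"],
        "taxon_name": result["taxon"]["name"] if result.get("taxon") else None,
        "family_name": family_name,
        # ... the other fields from the original script stay the same
    }
    observations.append(observation)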
2 Likes

@sylvainm_53, if you would like to write Python code to perform other tasks with observations, you could consider incorporating Python classes to perform some of these tasks. For example, create an Observation class to represent each observation, and a SetOfObservations class to represent a set of observations that you have downloaded. Those classes could include methods that perform common tasks with the data, such as writing HTML code to display the data, doing some custom calculations, or performing other tasks that need to be done on a regular basis.

The example below creates an HTML file that can display downloaded observations in a browser. The Observation ID column in the table contains clickable links to the observation pages.

import requests
import time

class Observation:
    # class to represent an observation
    def __init__(self, id, species_guess, observed_on, place_guess, latitude, longitude, user_login, taxon_name, family_name):
        # This method initializes an instance of an Observation object
        self.id = id
        self.species_guess = species_guess
        self.observed_on = observed_on
        self.place_guess = place_guess
        self.latitude = latitude
        self.longitude = longitude
        self.user_login = user_login
        self.taxon_name = taxon_name
        self.family_name = family_name
    def to_html_tr(self):
        # This method returns HTML for a table row for an Observation object
        return f"<tr><td><a href=\"https://www.inaturalist.org/observations/{self.id}\">{self.id}</a></td><td>{self.species_guess}</td><td>{self.observed_on}</td><td>{self.place_guess}</td><td>{self.latitude}</td><td>{self.longitude}</td><td>{self.user_login}</td><td>{self.taxon_name}</td><td>{self.family_name}</td></tr>"

class SetOfObservations:
    # class to represent a set of observations
    def __init__(self, observations):
        # This method initializes an instance of a SetOfObservations object
        self.observations = observations
    def make_page(self):
        # This method returns HTML for a page of Observation objects
        html_lines = "<html>\n"
        html_lines += "<head>\n"
        html_lines += "<style>\n"
        html_lines += "html, body {\n"
        html_lines += "  font-family: \"Verdana\";\n"
        html_lines += "}\n"
        html_lines += "td, th {\n"
        html_lines += "  border:1px solid black;\n"
        html_lines += "}\n"
        html_lines += "</style>\n"
        html_lines += "</head>\n"
        html_lines += "<body>\n"
        html_lines += "<table>\n"
        html_lines += """<tr>
    <th>Observation ID</th>
    <th>Species Guess</th>
    <th>Observed On</th>
    <th>Place Guess</th>
    <th>Latitude</th>
    <th>Longitude</th>
    <th>User Login</th>
    <th>Taxon Name</th>
    <th>Family Name</th>
    </tr>\n"""
        for record in self.observations:
            obs = Observation(
                record["id"],
                record["species_guess"],
                record["observed_on"],
                record["place_guess"],
                record["latitude"],
                record["longitude"],
                record["user_login"],
                record["taxon_name"],
                record["family_name"])
            html_lines += obs.to_html_tr() + "\n"
        html_lines += "</table>\n"
        html_lines += "</body>\n"
        html_lines += "</html>\n"
        return html_lines

# Function to get taxon information by its ID
def get_taxon_info(taxon_id):
    url = f"https://api.inaturalist.org/v1/taxa/{taxon_id}"
    response = requests.get(url)
    if response.status_code != 200:
        raise Exception(f"Error fetching taxon data: {response.status_code}")
    time.sleep(1)
    return response.json()

# Define parameters
observer = "sylvainm_53"
taxon_id = 47157  # Taxon ID for Lepidoptera
start_date = "2024-06-01"
end_date = "2024-06-30"

# URL of iNaturalist API
url = "https://api.inaturalist.org/v1/observations"

# Query parameters
params = {
    "user_id": observer,
    "taxon_id": taxon_id,
    "d1": start_date,
    "d2": end_date,
    "per_page": 200,  # Maximum number of observations per page
    "page": 1
}

# Make the request and get the data
response = requests.get(url, params=params)
data = response.json()

# Check if the request was successful
if response.status_code != 200:
    raise Exception(f"Error in request: {response.status_code}")

# Initialize a list to store observations
observations = []

# Iterate through the pages of results
while True:
    for result in data["results"]:
        # Extract taxon ID
        taxon_id = result.get("taxon", {}).get("id")
        family_name = None
        
        # Get taxon information
        if taxon_id:
            taxon_info = get_taxon_info(taxon_id)
            # Search for family among ancestors
            # ancestors = taxon_info.get("taxon", {}).get("ancestors", [])
            ancestors = taxon_info.get("results", {})[0].get("ancestors", [])
            for ancestor in ancestors:
                if ancestor.get("rank") == "family":
                    family_name = ancestor.get("name")
                    break

        # Prepare the observation record
        observation = {
            "id": result["id"],
            "species_guess": result["species_guess"],
            "observed_on": result["observed_on"],
            "place_guess": result["place_guess"],
            "latitude": result["geojson"]["coordinates"][1] if result.get("geojson") else None,
            "longitude": result["geojson"]["coordinates"][0] if result.get("geojson") else None,
            "user_login": result["user"]["login"],
            "taxon_name": result["taxon"]["name"] if result.get("taxon") else None,
            "family_name": family_name
        }
        observations.append(observation)
    
    # Check if there is another page
    if data["total_results"] > params["page"] * params["per_page"]:
        params["page"] += 1
        response = requests.get(url, params=params)
        data = response.json()
    else:
        break

obs_collection = SetOfObservations(observations)
with open("observations.html", "w", encoding="utf-8") as output_file:
    output_file.write(obs_collection.make_page())
print("file observations.html created")
1 Like

For what it’s worth, pyinaturalist has model objects built in, and handles pagination, caching, rate-limiting, common type conversions, etc. And pyinaturalist-convert can help with exporting to other formats, including xlsx.

Here’s a slightly simpler (but still not ideal) example of getting family names:

from pyinaturalist import iNatClient

client = iNatClient()
observations = client.observations.search(
    user_id="username",
    taxon_name="Danaus plexippus",
    d1='2020-01-01',
    d2='2024-01-01',
).all()

for o in observations:
    client.taxa.populate(o.taxon)  # Add missing taxonomy info (uses cache when possible)
    family_name = next((t.name for t in o.taxon.ancestors if t.rank == "family"), None)
    print(f'{o.id}: {family_name}')
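
To get back to the original goal (an XLSX file with a family column), the same loop can feed a pandas DataFrame; this is just a sketch using plain pandas rather than pyinaturalist-convert, and the row fields are only examples:

import pandas as pd

rows = []
for o in observations:
    client.taxa.populate(o.taxon)  # add missing taxonomy info (uses cache when possible)
    family_name = next((t.name for t in o.taxon.ancestors if t.rank == "family"), None)
    rows.append({
        "id": o.id,
        "observed_on": str(o.observed_on),  # as text, to avoid timezone issues in Excel
        "taxon_name": o.taxon.name,
        "family_name": family_name,
    })

pd.DataFrame(rows).to_excel("observations_inaturalist.xlsx", index=False)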

pisum’s example of adding ancestors from identification data will do it in fewer API requests, though. I’d be open to adding similar behavior to pyinaturalist if there’s any interest.

1 Like

i think that would be useful. it seems to be a thing that’s been requested more than once (although not necessarily for Python specifically).

1 Like
