iNaturalist API: retrieve new or modified observations since a date

Hello everyone,

I have several personal and community projects for which I am doing some small-scale development work using the iNaturalist API.

I would like to take this opportunity to thank @Pisum, who helped me write a Python script to import observations into a PostgreSQL database (based on these scripts).

However, there is one aspect of my current workflow that I find less than optimal. It does not seem possible to retrieve only those observations that have been modified since a certain date.

As a result, if I want to keep a database up to date according to certain criteria (for example, to keep it simple, ‘my observations’), I am forced to re-download all the observations that meet the criteria, even though the vast majority have not changed since the last export.

This means that we are putting a lot more strain on the iNaturalist servers for no reason (in my case, I currently re-download 7,000 observations each time).

Can you confirm that this is not currently possible? And if so, wouldn’t this be an important API development to request?


Interesting topic, I’ll follow it with particular interest. I’ve already discovered a new YouTube channel to follow and a GitHub page.
Thanks :)

i think you asked me about this before, and i said that there wasn’t a great way to get observations based on an updated date range, like you can with observation date range (d1 and d2) and submit date range (created_d1 and created_d2). i looked through the API documentation again, and while there isn’t updated_d1 and updated_d2, there is updated_since. updated_since effectively is just updated_d1 but named using a different convention (making it harder to find).

so getting records updated since a particular date is possible. suppose you want to find items updated since the beginning of the year. that would be GET /v1/observations?updated_since=2025-01-01. i believe this would filter based on UTC.
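That example query can be sketched in Python without sending anything; this just builds the URL to show where the parameter goes (using only the standard library):

```python
from urllib.parse import urlencode

# Build the query string for observations updated since the beginning of 2025
# (nothing is sent; this just shows the shape of the URL).
base = "https://api.inaturalist.org/v1/observations"
query = urlencode({"updated_since": "2025-01-01"})
print(f"{base}?{query}")  # https://api.inaturalist.org/v1/observations?updated_since=2025-01-01
```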

(Sorry, I wrote my message before reading Pisum’s reply.)

I think I said something wrong.
I just got some help from an AI, and it seems that it's already possible; see this Python code.

import requests
import csv

# Request parameters
params = {
    "user_id": "sylvainm_53",
    "updated_since": "2025-01-01T00:00:00+00:00",
    "verifiable": "any",
    "per_page": 200,  # Maximum allowed by the API
}

# iNaturalist v1 API URL for observations
url = "https://api.inaturalist.org/v1/observations"
headers = {"Accept": "application/json"}

# List holding the observation data
observations_data = []

# Retrieve all observations (with pagination)
page = 1
while True:
    params["page"] = page
    response = requests.get(url, params=params, headers=headers)
    if response.status_code == 200:
        data = response.json()
        observations = data["results"]
        if not observations:
            break  # No more results
        for obs in observations:
            observations_data.append({
                "observation_id": obs["id"],
                "created_at": obs["created_at"],
                "observed_on": obs.get("observed_on"),  # may be missing
                "updated_at": obs["updated_at"]
            })
        print(f"Page {page}: {len(observations)} observations retrieved")
        page += 1
    else:
        print(f"Request error (page {page}): {response.status_code}")
        break

# CSV file name
date_str = "20250101"  # Modification date in YYYYMMDD format
csv_filename = f"inat_sylvainm_53_modified_since_{date_str}.csv"

# Export the data to a CSV file
with open(csv_filename, mode="w", newline="", encoding="utf-8") as file:
    writer = csv.DictWriter(file, fieldnames=["observation_id", "created_at", "observed_on", "updated_at"])
    writer.writeheader()  # Write the header row
    writer.writerows(observations_data)

print(f"\nTotal number of observations retrieved: {len(observations_data)}")
print(f"CSV file generated: {csv_filename}")

humans forget things that they once knew and don’t always process large sets of information effectively. so here’s proof that AI is better than even knowledgeable humans sometimes, or at least that it never hurts to ask for a second opinion, whether from humans or AI.

although the parameter exists, i think there is still a possible change request here. in both the v1 and v2 API documentation, updated_since is listed later than most of the other date filter parameters, making it easy to miss. so putting it in closer proximity to the other date parameters in the documentation would make it harder to miss.

i think it would be relatively easy for someone to go through all the parameters and group them a little more logically. this applies to the date parameters, but also to the license parameters, project-related parameters, etc.


I’m taking advantage of your presence (human!) to update my Python script with this modification date setting.

In your opinion, what would be the syntax to add updated_since to this section of the query settings:

# define the parameters needed for your request
req_params_string = 'verifiable=true&spam=false'
req_params = params_to_dict(req_params_string)
req_headers_base = {'Content-Type': 'application/json', 'Accept': 'application/json'}


although you can modify the req_params directly, i would change the req_params_string. (req_params_string is closer in format to the URL parameters you would pass to the API or use with the Explore / Identify pages in the website.) if filtering for observations updated since 2025-01-01, i would add &updated_since=2025-01-01 to the string value (req_params_string = 'verifiable=true&spam=false&updated_since=2025-01-01').
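For context, a minimal stand-in for the params_to_dict helper (hypothetical here; the real one lives in the linked scripts) together with the modified string might look like this:

```python
def params_to_dict(params_string):
    # Turn 'a=1&b=2' into {'a': '1', 'b': '2'}.
    return dict(pair.split("=", 1) for pair in params_string.split("&"))

# Add updated_since to the existing filter string.
req_params_string = 'verifiable=true&spam=false&updated_since=2025-01-01'
req_params = params_to_dict(req_params_string)
print(req_params)  # {'verifiable': 'true', 'spam': 'false', 'updated_since': '2025-01-01'}
```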


Just tested it: it works perfectly! :star_struck:

OK, one last question for today: do you think it’s possible to specify the time in this format?
Because I tried with &updated_since=2025-10-01T00:00:00+00:00 and it doesn’t work.

This would ensure that nothing is missed during update requests, based on the date and time of the last execution, which is recorded each time.

[edit]
Otherwise, the simplest solution will be to request the data modified since the day before execution (date minus 1 day), and to handle the rare duplicates in the downstream scripts.

if you use &updated_since=2025-10-01, that should be the same as &updated_since=2025-10-01T00:00:00.000, since the system should default to 12AM when no time is specified. it looks like the system won’t do time zone conversions for your inputs. so when you specify the date/time, you just need to specify them at UTC, and leave out the offset.
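One way to follow that advice in Python: convert the recorded last-run time to UTC and format updated_since without any offset. A small sketch (the +02:00 timestamp is just an example):

```python
from datetime import datetime, timezone

# A local timestamp with a +02:00 offset, converted to UTC and
# formatted without the offset, as the parameter expects.
local = datetime.fromisoformat("2025-10-01T02:00:00+02:00")
updated_since = local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
print(updated_since)  # 2025-10-01T00:00:00
```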


If you’d rather use local time, I’ve always found the offset works in the format 2024-11-09T00:00-08:00 (-08:00 for Pacific Standard Time), so maybe you just need to drop the seconds from the time. (Sorry, I didn’t have time to test this right now – maybe something in the API has changed recently.)

actually, the problem here seems to be related to offsets using a plus sign. minus sign offsets seem to work regardless of whether seconds or milliseconds are included. so i guess there’s probably a bug here.

update: it looks like using %2B in place of + works. i guess that makes sense, since plus signs are often interpreted as spaces in URLs.
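If you build the URL by hand, Python's urllib.parse.quote does that percent-encoding for you (and when you pass a params dict to requests, the library should encode the + itself, which would explain why the script earlier in the thread worked):

```python
from urllib.parse import quote

# Percent-encode the '+' in the offset so it is not read as a space.
ts = "2025-10-01T00:00:00+00:00"
encoded = quote(ts, safe=":")  # keep the colons, encode the '+'
print(encoded)  # 2025-10-01T00:00:00%2B00:00
```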

I should have thought of that as the more likely issue – sorry for my misleading response.

your response wasn’t misleading. if anything, it led to the actual answer by pointing out that some configurations of offsets do get handled properly by the system. that’s why it’s nice to discuss things like this in the forum. we all contribute a little to get to the final answer.


This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.