Filtering out (multiple) annotations when downloading iNaturalist datasets

Dear iNaturalist team and community,

I (together with a fellow student) am currently working on my Biology Bachelor’s thesis at the Vrije Universiteit Amsterdam. My research focuses on potential shifts in species nocturnality in urban environments compared to non-urban areas across various cities in the United States.

Since my analysis relies on the timestamp data of observations to determine activity patterns, only photos of living animals represent usable data points for my study. I am currently trying to filter my dataset via the Identify tool, but I have encountered an issue with the annotation filters that appears to be a bug in the filtering logic.

It appears that the filtering logic for annotations does not allow for precise exclusions. Specifically, when I apply a filter such as “Without annotation: Dead or Alive – Dead,” the results also exclude observations annotated as “Alive,” rather than removing only the “Dead” records. The same occurs within the “Evidence of Presence” category; applying a “without” filter for one value, such as tracks, also removes other relevant records that I need to keep.

Furthermore, it seems impossible to apply multiple “without” filters simultaneously, such as excluding “Dead,” “Scat,” and “Tracks” all in one go.

This makes it very difficult to clean a large dataset efficiently. While the alternative is downloading separate datasets and merging them manually, this is highly impractical given the scale of the data and the multi-city scope of my research.

I would appreciate any suggestions on how to work around these filtering limitations. Specifically, I am looking for a way to exclude multiple annotations simultaneously. I use RStudio for the analysis, so perhaps someone knows whether the “rinat” package would be useful here?

Thank you in advance!

Kind regards, Jaime Meijer - Bachelor Student, Vrije Universiteit Amsterdam

could you provide some examples of urls you are trying to use, as well as specific examples of observations you say are returned in error?

i don’t have time to look into whether the filter parameters are working as expected, but this may be useful in the meantime: https://forum.inaturalist.org/t/searching-for-annotations-basic-to-advanced/65375.

Only a small subset of observations are annotated.

Using “without Dead” will exclude observations of living organisms that have not been annotated as alive. If you want alive observations, it is better to use with annotation ‘Alive’, because someone has actually marked those as alive.

If you are downloading animal observations from various cities across the United States, you will probably exceed the 200,000-observation limit for iNaturalist downloads. You will need to do multiple downloads. iNaturalist recommends using GBIF for large downloads. However, I don’t think GBIF has iNaturalist annotations.
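One way to split the work into multiple downloads is by observation date. Below is a minimal Python sketch, assuming the `d1` and `d2` (observed on or after / on or before) search parameters, which as far as I know the export tool accepts like any other search parameter. The helper names and the years are made up for illustration, and each window still has to stay under the download limit, so a busy city may need narrower windows than one year.

```python
from datetime import date
from urllib.parse import urlencode


def yearly_windows(first_year, last_year):
    """Yield one (start, end) date pair per calendar year, inclusive."""
    for year in range(first_year, last_year + 1):
        yield date(year, 1, 1), date(year, 12, 31)


def window_query(start, end, **extra):
    """Build a query string limited to one date window (hypothetical helper)."""
    params = {"d1": start.isoformat(), "d2": end.isoformat(), **extra}
    return urlencode(params)


# Example years are arbitrary; extra filters are passed through unchanged.
queries = [
    window_query(start, end, term_id_or_unknown=17, without_term_value_id=19)
    for start, end in yearly_windows(2022, 2024)
]
for query in queries:
    print(query)
```

The resulting files can then be stacked in R or Python, since every download has the same columns.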

So this search excludes only those marked Dead: https://www.inaturalist.org/observations?term_id_or_unknown=17&without_term_value_id=19
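If you end up building these URLs in a script, here is a minimal Python sketch that assembles the same query string. The helper name `build_search_url` is hypothetical; the parameter names and IDs (term 17 = Alive or Dead, value 19 = Dead) are the ones used in the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.inaturalist.org/observations"


def build_search_url(term_id, excluded_value_id, base_url=BASE_URL):
    """Hypothetical helper: keep observations that either lack the given
    annotation term or carry it with a value other than the excluded one."""
    params = {
        "term_id_or_unknown": term_id,
        "without_term_value_id": excluded_value_id,
    }
    return f"{base_url}?{urlencode(params)}"


url = build_search_url(17, 19)
print(url)
```

Further search parameters (place, taxon, dates) can be added to the `params` dict in the same way.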

The problem was indeed in the search url, thanks for the reply!!

That query will also return unknowns, i.e. observations with no Alive or Dead annotation. These will include observations of dead organisms that were never annotated as dead. The OP needs to decide whether keeping such observations is ok.

Yes, that’s right, but filtering on the alive tag drops around 80% of the observations, and we need all the data points. GBIF drops some observations due to their license, so for species-specific data iNat looks more promising (we are only looking at 8 cities, so we need every available observation). Thanks for the reply!

GBIF does get annotations, e.g.

https://www.gbif.org/occurrence/891058469
https://www.gbif.org/occurrence/1291166168

Did you consult with professors in your department to see whether using observations that do not have a free-to-reuse license is ok for your thesis project?

I misspoke. I meant that the default GBIF website simple download page does not allow filtering by annotations, and the default download csv does not include annotations. The GBIF website “all filters” option allows filtering for some annotations such as sex, but it does not include all available iNaturalist annotations. If you filter by one of the available annotations, the annotations are still not included in the downloaded csv.

Is the OP aware that iNaturalist downloads do not include annotations? They will need to use custom methods if they want to include the annotation values in the iNaturalist downloads.
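As a rough idea of what a custom method could look like, here is a minimal Python sketch that drops records locally once the annotation values are in hand, which also allows several exclusions at once. I am assuming each record is a dict shaped like the JSON the API returns, with an `annotations` list whose entries carry `controlled_attribute_id` and `controlled_value_id`; please check that against a real response. The function names and the sample records are made up. Only the pair 17/19 (Alive or Dead: Dead) comes from this thread; IDs for Scat, Tracks and so on would have to be looked up and added.

```python
# (term_id, value_id) pairs to exclude. 17/19 = "Alive or Dead: Dead" is taken
# from this thread; add further pairs after looking up their IDs.
EXCLUDED = {(17, 19)}


def has_excluded_annotation(observation, excluded=EXCLUDED):
    """Return True if any annotation on the record matches an excluded pair."""
    for annotation in observation.get("annotations") or []:
        pair = (
            annotation.get("controlled_attribute_id"),  # assumed field name
            annotation.get("controlled_value_id"),      # assumed field name
        )
        if pair in excluded:
            return True
    return False


def drop_excluded(observations, excluded=EXCLUDED):
    """Keep every record that has none of the excluded annotations."""
    return [o for o in observations if not has_excluded_annotation(o, excluded)]


# Made-up sample records: one Alive, one Dead, one without annotations.
sample = [
    {"id": 1, "annotations": [{"controlled_attribute_id": 17, "controlled_value_id": 18}]},
    {"id": 2, "annotations": [{"controlled_attribute_id": 17, "controlled_value_id": 19}]},
    {"id": 3, "annotations": []},
]
kept_ids = [o["id"] for o in drop_excluded(sample)]
print(kept_ids)
```

Unannotated records are kept here, which matches the "Alive or unknown" behaviour discussed earlier in the thread.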

I would also note that it is not uncommon that timestamps on iNat observations are incorrect. This can happen for a variety of reasons (device clock set incorrectly, timezone issue), but it is one of the data quality issues I see more frequently. I did some informal filtering of observations looking for those taken at nighttime once and was surprised by how many observations with a night timestamp had photos clearly taken during the day.

The without annotation form on the identify page is a little misleading. It looks like people can search for without ‘Alive or Dead’, and then pick ‘Alive’, ‘Dead’, or ‘Cannot be determined’. In reality, when you use without_term_id, the without_term_value_id value does not affect the result.

Gilroy, CA has
2,708 observations
180 alive term_id=17&term_value_id=18
21 dead term_id=17&term_value_id=19
2 cannot be determined term_id=17&term_value_id=20

2505 without Alive or Dead without_term_id=17
2505 without Alive or Dead, Alive without_term_id=17&without_term_value_id=18
2505 without Alive or Dead, Dead without_term_id=17&without_term_value_id=19
2505 without Alive or Dead, Cannot be determined without_term_id=17&without_term_value_id=20

If you look at the api, same results.
2505 without Alive or Dead
2505 without Alive or Dead, Alive
2505 without Alive or Dead, Dead
2505 without Alive or Dead, Cannot be determined

without_term_id excludes observations that have annotations of the selected type. When the OP used without ‘Alive or Dead’, that excluded all observations that have any ‘Alive or Dead’ annotation.
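The Gilroy counts listed earlier in this post are consistent with that reading. A quick arithmetic check in Python, assuming each observation carries at most one ‘Alive or Dead’ annotation:

```python
total = 2708  # all Gilroy, CA observations
annotated = {"Alive": 180, "Dead": 21, "Cannot be determined": 2}

with_term = sum(annotated.values())  # observations with any Alive or Dead annotation
without_term = total - with_term     # what without_term_id=17 should return

print(with_term, without_term)  # 203 2505
```

So 2505 is simply everything with no ‘Alive or Dead’ annotation at all, whichever value is picked in the form.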

The value of term_value_id changes the number of observations returned for term_id. However, the value of without_term_value_id does not change the number of observations returned for without_term_id.

If filtering with ‘Alive or Dead’ + ‘Alive’ does not return enough observations for a given city, the OP could add annotations to the observations that are missing them.