Search help: species found in project X but not in project Y

For observations in Rheingau county, there is project
and for observations in Mainz county (which is on the opposite side of Rhine river), it is

Now, I want to find out which species were found in one of the projects, but not in the other project. For species found in Rheingau but not in Mainz, I tried:
but the result is effectively the list of species found in Rheingau county - an observation is either in Rheingau county or Mainz county, but not both, and it looks like the filtering is done at the level of the individual observation, not at the level of the species in the observation.
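To make the distinction concrete, here is a minimal sketch with made-up toy data. The placeholder species names and the `project` column are assumptions for illustration only, not real project contents. It contrasts filtering individual observations with comparing the species sets of the two projects.

```python
import pandas as pd

# Toy data: each row is one observation that belongs to exactly one project.
obs = pd.DataFrame({
    "project": ["rheingau", "rheingau", "rheingau", "mainz", "mainz"],
    "species": ["Species A", "Species B", "Species C", "Species B", "Species D"],
})

# Observation-level filter: Rheingau observations that are not Mainz observations.
# No observation is in both projects, so nothing is removed.
obs_level = obs[(obs["project"] == "rheingau") & (obs["project"] != "mainz")]
species_obs_level = set(obs_level["species"])

# Species-level comparison: species seen in Rheingau minus species seen in Mainz.
rheingau_species = set(obs.loc[obs["project"] == "rheingau", "species"])
mainz_species = set(obs.loc[obs["project"] == "mainz", "species"])
species_level = rheingau_species - mainz_species

print(sorted(species_obs_level))
print(sorted(species_level))
```

The first result still contains every Rheingau species; only the second one drops the species that also occur in Mainz.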

What is the correct incantation?

There is a collection project
which contains both the projects mentioned above plus a few more.
Now I tried to get all species which were found in the project but not in Rheingau county with
and … well, it lists the observations per species for the whole project minus the Rheingau project (starting with 1,485 honey bee observations, in the complete project it’s 1,535 obs). I.e. the collection project trick does not help here.

The closest thing I could find is . You can enter the groups’ IDs and compare the species found in both. Unfortunately, the “not in common” tab isn’t working for me, but at least it tells you which species are present in both projects (and I think that in the first tab, you can see which species don’t have observations in one of them).

I also tried Stirfry, but I don’t have much experience with it so I don’t know if it worked:

I think it is possible to add this feature to, but I’m afraid not in July, because I’m on vacation.


The compare tool does not work correctly. It lists several species with an observation count in one project and a question mark in the other project. I took a look at 3 of those species and opened the map: just wrong.

The Stirfry site does not run in my browser.

Are you aware that the species count in the compare tool is limited to 500 max? Beyond that it becomes unreliable because it omits taxa, so you would need to make several requests for different taxa, each below that threshold.

If no solution is found, you could export the projects’ observations and search for the missing species yourself.

For example (I only used this month’s observations to make it quicker), in the export tool:

  • 1st project: quality_grade=research&identifications=any&projects%5B%5D=biodiversitaet-im-rheingau-taunus-kreis&d1=2023-07-01
  • 2nd project: quality_grade=research&identifications=any&projects%5B%5D=biodiversitaet-in-mainz-lk-mz-bingen&d1=2023-07-01

This will give you CSV files that you can process however you want. Here is an example in Python.

import pandas as pd
import numpy as np

# Load only the columns needed for the comparison
cols = ['id', 'scientific_name', 'taxon_id']
project1 = pd.read_csv('observations-proj1.csv', usecols=cols)
project2 = pd.read_csv('observations-proj2.csv', usecols=cols)

print(project1)
print(project2)

            id         scientific_name  taxon_id
0    170426124        Acer platanoides     54763
1    170586330       Oxythyrea funesta     68328
2    170586938     Melanargia galathea    130398
3    170587073      Clytra laeviuscula    123661
4    170643456         Nezara viridula    141725
..         ...                     ...       ...
103  171243666     Calvia decemguttata    326207
104  171250709  Oncotylus viridiflavus    555345
105  171281327      Lotus corniculatus     47435
106  171405518           Rubus caesius     64546
107  171424959       Evernia prunastri    123175

[108 rows x 3 columns]
            id      scientific_name  taxon_id
0    170457678      Serinus serinus      9236
1    170457680     Curruca communis   1289470
2    170457681  Carduelis carduelis      9398
3    170457685    Falco tinnunculus    472766
4    170458510     Sciurus vulgaris     46001
..         ...                  ...       ...
184  171449703       Rumex scutatus    333937
185  171449706       Cota tinctoria     76456
186  171449707   Melampyrum arvense    245680
187  171524854     Polyphylla fullo    360353
188  171525011     Polyphylla fullo    360353

[189 rows x 3 columns]

Getting the taxa found in one project but not in the other:

project1['taxon_id'] = project1['taxon_id'].astype(str)
project2['taxon_id'] = project2['taxon_id'].astype(str)

found_in_project1 = project1['taxon_id'].unique()
found_in_project2 = project2['taxon_id'].unique()

# in project 1 but not in 2
only_in_project1 = np.setdiff1d(found_in_project1, found_in_project2)

# in project 2 but not in 1
only_in_project2 = np.setdiff1d(found_in_project2, found_in_project1)

Checking it on iNat:

print('' +
      ','.join(only_in_project1) + '&view=species')
print('' +
      ','.join(only_in_project2) + '&view=species')

This returns two URLs ending in ...,109650,...,82685,85492&view=species and ...,124403,...,9398,94043&view=species, so you can use these links.

So, in this example: in project 1 and not in 2 and in project 2 and not in 1.
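If you also want the species names next to the taxon IDs, one possible variation is a pandas merge with `indicator=True`. This is only a sketch: the helper name `species_only_in_first` is hypothetical, and the two small frames below are toy stand-ins for the exported CSVs, not the real project data.

```python
import pandas as pd

def species_only_in_first(first: pd.DataFrame, second: pd.DataFrame) -> pd.DataFrame:
    """Return one row per taxon that occurs in `first` but not in `second`."""
    cols = ["taxon_id", "scientific_name"]
    # One row per taxon in the first export
    first_taxa = first[cols].drop_duplicates("taxon_id")
    # Only the IDs are needed from the second export
    second_ids = second[["taxon_id"]].drop_duplicates()
    # The _merge column says whether a taxon ID was found on the left side only
    merged = first_taxa.merge(second_ids, on="taxon_id", how="left", indicator=True)
    only_first = merged[merged["_merge"] == "left_only"]
    return only_first[cols].sort_values("scientific_name").reset_index(drop=True)

# Toy frames with the same columns as the exports (contents are illustrative)
toy1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "scientific_name": ["Acer platanoides", "Rubus caesius",
                        "Polyphylla fullo", "Rubus caesius"],
    "taxon_id": [54763, 64546, 360353, 64546],
})
toy2 = pd.DataFrame({
    "id": [5, 6],
    "scientific_name": ["Polyphylla fullo", "Sciurus vulgaris"],
    "taxon_id": [360353, 46001],
})

result = species_only_in_first(toy1, toy2)
print(result)
```

Calling the function with the arguments swapped gives the species found only in the second project.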

Thanks, @balln, for this idea. Downloading the data, or scraping the web page, and then doing further processing with custom scripts is of course a possibility. But it won’t allow me to provide a URL to other people who could then take a look at the data.

Let me add a few examples of species found in X but not Y:

Do these species really not exist there, or have they not yet been observed, or have they been observed but not correctly identified? We might want to take a look at observations of similar species. Or we might take a closer look next time we are out in nature looking for interesting critters.

A different situation can be seen with Corizus hyoscyami:
7 observations in Rheingau: 2 in 2022, and 5 in 2023
25 observations in Mainz-Bingen: 1 in 2020, 2 in 2021, 16 in 2022, 6 in 2023
Did the bug make it over the river in 2022 and establish a population here? Or just some common fluctuation?
This kind of query would additionally require a parameter for the year.
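With exported data, the per-year breakdown could be sketched like this. It assumes the export includes an `observed_on` date column; the helper name `yearly_counts` is hypothetical, and the toy frame only reproduces the Rheingau counts quoted above, with placeholder dates.

```python
import pandas as pd

def yearly_counts(observations: pd.DataFrame, scientific_name: str) -> pd.Series:
    """Count observations of one species per year, based on `observed_on`."""
    subset = observations[observations["scientific_name"] == scientific_name]
    years = pd.to_datetime(subset["observed_on"]).dt.year
    return years.value_counts().sort_index()

# Toy frame: 2 observations in 2022 and 5 in 2023 (the dates are placeholders)
rheingau = pd.DataFrame({
    "scientific_name": ["Corizus hyoscyami"] * 7,
    "observed_on": ["2022-06-01"] * 2 + ["2023-06-01"] * 5,
})

counts = yearly_counts(rheingau, "Corizus hyoscyami")
print(counts.to_dict())
```

Running the same function on both projects' exports would put the yearly counts side by side.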

And of course, we must be aware of the observers and their changing interests, which may cause a severe bias in the data.
