i’m not a SQL-explaining mood today. so maybe some other day i’ll explain the SQL process for this specific case a little more. i wrote something previously (https://forum.inaturalist.org/t/getting-the-inaturalist-aws-open-data-metadata-files-and-working-with-them-in-a-database/22135#work-with-the-data-in-a-sqlite-database-6) that explains how to load data to a SQLite database and run some queries. so for those who are interested in databases, you could look at that for now.
(tip: if you don’t actually need to save the database, you can create a new in-memory database rather than creating and saving a new database.)
you can’t do a distinct count in SQL queries in Excel’s version of SQL (or at least not without some convolutions, i think). so i’m not going to explain how to do that.
it’s possible to do something similar in basic Excel + Power Pivot, but it’s more manual work than just importing data and running a query. i’ll explain the key steps below:
- export data from iNaturalist. here are the settings i’m using in my example:
- Query: quality_grade=any&identifications=any&place_id=33&taxon_id=47157&hrank=species&verifiable=true
- Columns: id, user_id, user_login, created_at, scientific_name, common_name, taxon_id, taxon_species_name
- open up the data in Excel:
- add a column to identify when a row has user_login=becksnyc. (below, I added a “becksnyc” header in cell I1, added a formula
=IF(C2=I$1,1,0)
in cell I2, and then copied the formula down the column for all rows in the dataset.)
- save your data as an .xlsx file at this point. (you need it saved to properly initiate a Power Pivot.)
- now select your data, and go to Insert > Pivot Table
- when prompted, make sure you check the “Add this data to the Data Model” box at the bottom, and click OK. (checking the box creates adds the data as a table in Power Query and creates a Power Pivot against that data. otherwise, you would get just a plain Pivot Table, which doesn’t handle Distinct Count aggregations.)
- in your Pivot Table field thing, drop taxon_species_name into Rows, and drop becksnyc, id, and user_id into Values. then change the Value Field settings for id to summarize by Count instead of Sum. for user_id, summarize by Distinct Count instead of Sum
- at this point, the data that shows up in the Pivot Table is basically what you’ll need to get the 2 things you were looking for in your original post. if sum(becksnyc)=count(id), then that means the species was observed only by you. distinct count(user_id) tells you how many users (including you) observed the species. so just perform additional manipulation of the data to get your desired result.
EDIT: actually, now that i think about it again, you don’t need count(id) in the Pivot Table. if sum(becksnyc)>0 and distinct count(user_id)=1, then that means you’re the only one to have observed the species.