Many IDs have ID category: null

I found this while trying to reconcile my various ID stats, which didn't match up when I looked at my Year in Review.

| ID Category | My 2020 IDs via inaturalist.org/identifications | My 2020 IDs via API GET /identifications |
|---|---|---|
| Improving | 2,505 | 2,505 |
| Leading | 2,650 | 2,650 |
| Supporting | 4,965 | 4,965 |
| Maverick | 2 | 2 |
| Total | 10,122 | 10,489 |

Totalling the four ID categories gives 10,122 IDs, i.e. the sum of the IDs in the pie charts.

But if you search IDs via the API without a category specified, there are 10,489 IDs, meaning there are 367 IDs in that time period that do not have a category assigned.
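
For anyone who wants to repeat the comparison, here is a rough sketch of the arithmetic. It assumes the `category` parameter accepts the four category names in lowercase and that the count is read from the `total_results` field of each response; the helper names `count_url` and `uncategorized` are made up for illustration, and the counts are simply the ones from the table above.

```python
from urllib.parse import urlencode

API = "https://api.inaturalist.org/v1/identifications"
CATEGORIES = ("improving", "leading", "supporting", "maverick")


def count_url(category=None, **params):
    """Build a query URL; the count would be read from total_results.

    per_page=1 keeps the response small (an assumption, only the
    total is of interest here).
    """
    query = dict(params, per_page=1)
    if category is not None:
        query["category"] = category
    return f"{API}?{urlencode(query)}"


def uncategorized(total, by_category):
    """IDs left over once every categorized ID has been subtracted."""
    return total - sum(by_category.values())


# Counts copied from the table above, not fetched live.
counts = {"improving": 2505, "leading": 2650, "supporting": 4965, "maverick": 2}
print(uncategorized(10489, counts))  # 367

# URLs one could open to re-check each category for a given user and year.
filters = dict(own_observation="false", user_id="bouteloua",
               current="true", d1="2020-01-01", d2="2020-12-31")
for cat in CATEGORIES + (None,):
    print(cat or "all", count_url(cat, **filters))
```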

https://api.inaturalist.org/v1/identifications?id=84340111

2020-01-17 to 2020-02-05:
https://api.inaturalist.org/v1/identifications?own_observation=false&user_id=bouteloua&current=true&d1=2020-01-17&d2=2020-02-05

Maybe withdrawn IDs have something to do with it.

no, in the examples / analysis referenced above, we’re dealing with current IDs. so whatever happened is unlikely to be related to withdrawing IDs.

not sure how useful this might be, but i went ahead and made a page to summarize identifications by category (including an “Other”, or uncategorized, category). it could provide a way for folks without direct access to the database to quickly get these stats for different sets of parameters.

page: https://jumear.github.io/stirfry/iNat_id_summary_by_category.html
code: https://github.com/jumear/stirfry/blob/gh-pages/iNat_id_summary_by_category.html

example usage: https://jumear.github.io/stirfry/iNat_id_summary_by_category.html?own_observation=false&user_id=bouteloua&d1=2020-01-01&d2=2020-12-31

i might add a pie chart or something like that later…

So I tried that URL and got 77 category Other identifications (0.4%). I tried it again with a much earlier start date and got the same number, suggesting that they were all in 2020. Is anyone seeing any in this category from previous years?

yes, there definitely are a few cases from previous years (https://jumear.github.io/stirfry/iNat_id_summary_by_category.html?own_observation=false&d2=2019-12-31), but the bulk of the cases are definitely from 2020 (https://jumear.github.io/stirfry/iNat_id_summary_by_category.html?own_observation=false&d1=2020-01-01), and Q1 2020 in particular (https://jumear.github.io/stirfry/iNat_id_summary_by_category.html?own_observation=false&d1=2020-01-01&d2=2020-03-31).
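
to step through a year one quarter at a time, the summary page urls can be generated rather than typed by hand. this is only a sketch: `quarters` and `summary_url` are hypothetical helper names, and it assumes the page takes `d1` / `d2` as inclusive ISO dates, as in the links above.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

PAGE = "https://jumear.github.io/stirfry/iNat_id_summary_by_category.html"


def quarters(year):
    """Return (first_day, last_day) for each calendar quarter of a year."""
    starts = [date(year, m, 1) for m in (1, 4, 7, 10)] + [date(year + 1, 1, 1)]
    # each quarter ends the day before the next one starts
    return [(s, e - timedelta(days=1)) for s, e in zip(starts, starts[1:])]


def summary_url(d1, d2, **params):
    """Build a summary page url for one date window plus any extra filters."""
    query = dict(params, d1=d1.isoformat(), d2=d2.isoformat())
    return f"{PAGE}?{urlencode(query)}"


for d1, d2 in quarters(2020):
    print(summary_url(d1, d2, own_observation="false"))
```

the first url printed is the same Q1 2020 link given above; adding `user_id="..."` to the call narrows it to one identifier.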

are you doing some additional investigation, or are you just trying to understand the issue better?

I’ve got a full 2.2% Other. I wonder what the difference is between us that I’ve got proportionately so many more…
and 65 maverick! time to go fix those…

overall, uncategorized identifications make up 6.4% of identifications for others in 2020, so i don’t think your 2.2% is above average in that respect. however, during Q1 2020, 54.5% of your identifications for others were uncategorized (https://jumear.github.io/stirfry/iNat_id_summary_by_category.html?own_observation=false&d1=2020-01-01&d2=2020-03-31&user_id=trh_blue), which is quite a bit more than average.


for what it’s worth, it looks like there was some experimentation with how identifications trigger observation reindexing early in 2020 (see https://github.com/inaturalist/inaturalist/commits/main/app/models/identification.rb), which may coincide with the appearance of most of these uncategorized identifications. probably what needed to be fixed got fixed, but maybe doing a lookback to see why these changes may have triggered the problem might be warranted (just to understand what not to do in the future). i suppose the data should be cleaned up at some point, too (which i guess probably is the ultimate fix for this bug report at this point).

Is cleaning up that data something we can do on our own? And if so, how?

There are around 700 from just this past week, so it would probably be something staff should look into rather than an ongoing user clean-up effort.

just looking at the code history, it looks like changing anything on the observation – annotations, identifications, quality flags, etc. should trigger recategorization of all the identifications. i tested this by adding a flowering annotation on an observation where my id was previously uncategorized, and the identification was categorized properly after the fact.
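
a before / after check like this can be scripted against saved api responses. the sketch below assumes each identification in `results` carries a `category` field that is `null` when uncategorized (as in the thread title); the two json strings are made-up, trimmed-down stand-ins, not real responses, and `uncategorized_ids` is a hypothetical helper.

```python
import json


def uncategorized_ids(response_text):
    """Return the ids of identifications whose category is null."""
    payload = json.loads(response_text)
    return [r["id"] for r in payload.get("results", [])
            if r.get("category") is None]


# made-up responses for a placeholder identification id of 1
before = '{"total_results": 1, "results": [{"id": 1, "category": null}]}'
after = '{"total_results": 1, "results": [{"id": 1, "category": "supporting"}]}'

print(uncategorized_ids(before))  # [1]
print(uncategorized_ids(after))   # []
```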

observation before and after:

api response before and after:

that said, doing this for the millions of affected observations seems inefficient at best. so i agree that: