Export counts do not match up with Stats count

deedesie · January 17, 2020, 9:03pm

The Bimby Project shows as of today 3077 observations, yet when I export all, it only has 3076 observations.
I can only think the header row is included in the overall count? OR… I have a bad filter that is filtering out a record? Thoughts?
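
One way to rule the header row in or out is to count the data rows in the downloaded file directly. The snippet below is a minimal sketch, not anything iNaturalist provides: `count_data_rows` and the sample column names are made up for illustration. It uses a CSV parser rather than counting lines, so a quoted field containing a line break is still counted as one record.

```python
import csv
import io


def count_data_rows(csv_text: str) -> int:
    """Count the records in CSV text, excluding the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row, if there is one
    return sum(1 for row in reader if row)  # ignore completely empty lines


# Hypothetical sample: a header plus two observations, one with a multi-line field.
sample = 'id,description\n1,"line one\nline two"\n2,plain\n'
print(count_data_rows(sample))  # 2
```

If this count still comes out one lower than the project total, the header row is not the explanation.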

bouteloua · January 17, 2020, 10:55pm

Could an observation have been added in the time since you initiated the export? The project currently reads 3078 observations.

pisum · January 18, 2020, 12:11am

this observation seems to be excluded from the export: https://www.inaturalist.org/observations/32061329. figuring out why it’s excluded will take further research.

with 3078 observations that should be exported, this one should have shown up as #2601, or line 2602, in the csv. interestingly, if you export a smaller subset of observations that includes the missing observation here, it will be exported in that smaller subset. so i think that means that it’s not the particular observation that has a problem.

deedesie · January 18, 2020, 12:31am

Thank you for your response, but no - every time a new observation is added, the number of exported rows is still out by one. I think @pisum found the answer…or part of it.

deedesie · January 18, 2020, 12:34am

Thank you. yes - this is most likely the answer. - as i was typing this I see you have provided more details. let me take a look.

deedesie · January 18, 2020, 12:37am

hmmm… I wonder what is causing it. I will continue to investigate, but I appreciate your responses.

pisum · January 18, 2020, 12:42am

yeah, it’s strange. if i filter for just verifiable observations in the project, i should get 3021 back, but i’m getting 3019 in the download. that’s 2 missing. so there’s something deeper going on that i suspect may only be effectively understood by someone debugging the system code. that said, i won’t dig any deeper into this unless i think of some sort of magical answer in my head later.

kiwifergus · January 18, 2020, 12:54am

could it be an indexing thing? Try changing a small detail on the observation, like setting captive/cultivated and then changing it back. The “jostle” might be enough to cause a re-index that brings it back into the result set.

pisum · January 18, 2020, 12:58am

i don’t think so. if it was missing from the index, i think the particular observation wouldn’t show up in any results. i suspect the problem is not related to any specific observation.

thebeachcomber · July 17, 2023, 1:55am

I’ve just encountered the same problem. My project has 2836 observations. When I click download, the export preview shows all 2836 observations as expected. However, the actual downloaded file only contains 2821 observations; 15 are missing.

An example of a missing one is https://www.inaturalist.org/observations/58814504

thebeachcomber · July 17, 2023, 2:08am

I just tried downloading only a subset of that project data, the first 1294 observations, which contains that missing one. The resulting csv file only has 1288 observations (6 missing), and one of those missing is once again that same linked observation.

thebeachcomber · July 17, 2023, 2:10am

Third test; I tried downloading only observations from the single day which included that missing ob (n = 51). All observations downloaded perfectly.

pisum · July 17, 2023, 3:00am

below are all the missing observations and their positions within the 2836 records, if sorted ascending by observation id. it looks like the export processes records in chunks of 100, and for whatever reason, it occasionally skips over a record.

Skipped records

position	obs id	id of prev record
101	58814504	58814503
202	59428933	59428932
403	60506935	60506934
504	60690436	60690435
805	62015017	62015016
1206	66177965	66177964
1407	69444722	69444721
1608	74606911	74606910
1709	76498630	76498629
1810	81004436	81004435
1911	84278860	84278859
2212	98464656	98464655
2513	114121848	114121847
2614	138799859	138799858
2715	140488506	140488505
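
a table like the one above can be rebuilt by comparing the ids that should be in the export with the ids that actually appear in the csv. this is only a rough sketch under assumptions: `find_skipped` is a made-up helper, and the two id lists are toy values, not real project data.

```python
def find_skipped(expected_ids, exported_ids):
    """Return (position, obs_id, prev_id) for each expected id missing from the export.

    Position is 1-based within the expected ids sorted ascending.
    """
    exported = set(exported_ids)
    ordered = sorted(expected_ids)
    skipped = []
    for index, obs_id in enumerate(ordered):
        if obs_id not in exported:
            prev_id = ordered[index - 1] if index > 0 else None
            skipped.append((index + 1, obs_id, prev_id))
    return skipped


# Toy data: five expected ids, one of which is absent from the export.
expected = [10, 11, 15, 16, 20]
exported = [10, 11, 15, 20]
print(find_skipped(expected, exported))  # [(4, 16, 15)]
```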

…

UPDATE:
looking at the records that weren’t skipped (assuming processing in chunks of 100) and comparing these to the records that were skipped, it looks like a record will get skipped if the first record in the new batch of 100 has an id that is exactly 1 greater than the last record in the previous batch.

Not Skipped records

position	obs id	id of prev record
303	60166910	60166908
605	60803656	60802851
705	61847301	61847299
906	63627198	63626633
1006	64585580	64585578
1106	65390511	65390509
1307	68313317	68313315
1508	71289749	71289226
2012	88908724	88908326
2112	95373498	95373496
2313	99069152	99069148
2413	102155625	102152633
2816	140490640	140490638
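
the pattern can be checked mechanically against the two tables. the sketch below just copies the (obs id, id of prev record) pairs from the tables and looks at the gaps; the variable names are mine.

```python
# (obs id, id of prev record) pairs copied from the "Skipped records" table.
skipped = [
    (58814504, 58814503), (59428933, 59428932), (60506935, 60506934),
    (60690436, 60690435), (62015017, 62015016), (66177965, 66177964),
    (69444722, 69444721), (74606911, 74606910), (76498630, 76498629),
    (81004436, 81004435), (84278860, 84278859), (98464656, 98464655),
    (114121848, 114121847), (138799859, 138799858), (140488506, 140488505),
]

# (obs id, id of prev record) pairs copied from the "Not Skipped records" table.
not_skipped = [
    (60166910, 60166908), (60803656, 60802851), (61847301, 61847299),
    (63627198, 63626633), (64585580, 64585578), (65390511, 65390509),
    (68313317, 68313315), (71289749, 71289226), (88908724, 88908326),
    (95373498, 95373496), (99069152, 99069148), (102155625, 102152633),
    (140490640, 140490638),
]

skipped_gaps = {obs_id - prev_id for obs_id, prev_id in skipped}
not_skipped_gaps = {obs_id - prev_id for obs_id, prev_id in not_skipped}

print(skipped_gaps)           # {1}
print(min(not_skipped_gaps))  # 2
```

every skipped record has a gap of exactly 1, and no record that made it into the export has a gap smaller than 2.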

… so without looking at the code, i’m thinking the problem must arise like so:

the export job gets processed in a series of batches that get 100 observations at a time, in order of observation id (ascending).
each batch after the first will add a filter condition that should get observations with ids > [id of the last record in the previous batch] – except the filter is getting set incorrectly and is actually getting ids > [id of the last record in the previous batch]+1.
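
here is a small simulation of that guess. to be clear, this is not the iNaturalist code, which i haven't looked at: `export_in_batches` and its `buggy` flag are invented here, and a batch size of 2 is used only to keep the example short.

```python
def export_in_batches(all_ids, batch_size=100, buggy=False):
    """Simulate an export that fetches ids in ascending batches.

    With buggy=True, each later batch filters on id > last_id + 1
    instead of id > last_id, which is the suspected off-by-one.
    """
    ordered = sorted(all_ids)
    exported = []
    threshold = None  # exclusive lower bound for the next batch
    while True:
        if threshold is None:
            candidates = ordered
        else:
            candidates = [obs_id for obs_id in ordered if obs_id > threshold]
        batch = candidates[:batch_size]
        if not batch:
            break
        exported.extend(batch)
        last_id = batch[-1]
        threshold = last_id + 1 if buggy else last_id
    return exported


ids = [1, 2, 3, 4, 5, 8, 9, 10]
print(export_in_batches(ids, batch_size=2))              # [1, 2, 3, 4, 5, 8, 9, 10]
print(export_in_batches(ids, batch_size=2, buggy=True))  # [1, 2, 4, 5, 8, 9]
```

in the buggy run, 3 and 10 are dropped because each directly follows the last id of the previous batch (2 and 9), while 8 survives because its predecessor at the batch boundary is 5. that matches the pattern in the two tables.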

pisum · July 19, 2023, 1:15pm

looks like @jwidness looked at the code and described a solution here: https://github.com/inaturalist/inaturalist/issues/3815.

interestingly, assuming the analysis is correct, it seems like the skipping records issue would probably affect the GBIF export and possibly a few other processes, too. so that might raise the priority in the eyes of @tiwane and team.

marias4 · September 9, 2020, 7:55am

Please fill out the following sections to the best of your ability, it will help us investigate bugs if we have this information at the outset. Screenshots are especially helpful, so please provide those if you can.

Platform (Android, iOS, Website): Website

App version number, if a mobile app issue (shown under Settings or About):

Browser, if a website issue (Firefox, Chrome, etc) : Chrome

URLs (aka web addresses) of any relevant observations or pages: https://www.inaturalist.org/projects/stadin-kimalaiskisa

Screenshots of what you are seeing (instructions for taking a screenshot on computers and mobile devices: https://www.take-a-screenshot.org/):

Description of problem (please provide a set of steps we can use to replicate the issue, and make as many as you need.):

Step 1: Download observations data

Step 2: Using the default columns to select

Step 3: Exporting the data to Excel as a csv file

Step 4: The exported data excludes some of the observations for no obvious reason, so it cannot be handled reliably or easily. The project was a challenge with awards for the winners. We had to count the observations manually, as the exported data couldn’t be trusted.

jwidness · September 9, 2020, 11:07am

Welcome to the forum!
Can you give an example of an observation that should have been in the export but wasn’t?

marias4 · September 14, 2020, 7:22am

Hi, thanks!
There were several missing, but for example this one wasn’t exported: https://inaturalist.laji.fi/observations/52000274

We were looking for four different bumblebees (Bombus hypnorum, Bombus pascuorum, Bombus pratorum and Bombus veteranus), and the four participants who had the most research-grade observations of those received awards. During the calculations, we noticed that the exported data didn’t include all the observations (we tried to export several times). Username lauralallallaa had 47 observations, of which 44 were exported; kplamberg had 27, of which 25 were exported; piaruoho had 20, of which 17 were exported. We couldn’t identify the reason for this.

pisum · September 14, 2020, 11:43am

this problem was previously described but never resolved i think. see https://forum.inaturalist.org/t/export-counts-do-not-match-up-with-stats-count/9515.

marias4 · September 14, 2020, 12:08pm

Thank you, pisum, for pointing that out. Yes, this is most probably the same problem.

jwidness · September 14, 2020, 1:18pm

I just exported observations for Stadin Kimalaiskisa and got all 269 currently listed on the website, including https://inaturalist.laji.fi/observations/52000274.

I’m a little confused by your numbers – you say lauralallallaa had 47 observations, but iNat shows 75.

Presumably you mean 47 observations of the target species, but the project collects all Bombus, including ones you weren’t targeting. Were you filtering for these species before export or after?

Just out of curiosity, is there a reason the project collected all Bombus and not just the 4 targeted species?

