Export counts do not match up with Stats count

The Bimby Project currently shows 3077 observations, yet when I export all of them, the file only has 3076.
Is the header row somehow included in the overall count? Or do I have a bad filter that is excluding a record? Thoughts?
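One way to test the header-row theory is to count data rows with a CSV parser instead of counting lines, because a quoted field (such as a multi-line description) can span several lines and inflate a naive line count. A minimal Python sketch; `count_data_rows` is a hypothetical helper, not part of iNaturalist:

```python
import csv
import io


def count_data_rows(csv_text: str) -> int:
    """Count data rows in a CSV export, excluding the header row.

    csv.reader keeps a quoted field that contains line breaks together
    as one row, which a plain line count would over-count.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row, if there is one
    return sum(1 for row in reader if row)  # ignore blank lines


sample = 'id,description\n1,"first line\nsecond line"\n2,plain\n'
print(count_data_rows(sample))  # 2
print(len(sample.splitlines()) - 1)  # 3: the naive line count is off
```

For a downloaded file, read it with `open(path, newline="", encoding="utf-8")` (assuming a UTF-8 export) and pass the contents in.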


Could an observation have been added in the time since you initiated the export? The project currently reads 3078 observations.

this observation seems to be excluded from the export: https://www.inaturalist.org/observations/32061329. figuring out why it’s excluded will take further research.

with 3078 observations that should be exported, this one should have shown up as #2601, or line 2602, in the csv. interestingly, if you export a smaller subset of observations that includes the missing observation here, it will be exported in that smaller subset. so i think that means that it’s not the particular observation that has a problem.

Thank you for your response, but no: every time a new observation is added, the number of exported rows is still off by one. I think @pisum found the answer… or part of it.

Thank you. Yes, this is most likely the answer. As I was typing this, I see you have provided more details; let me take a look.

Hmm… I wonder what is causing it. I will continue to investigate, but I appreciate your responses.


yeah, it’s strange. if i filter for just verifiable observations in the project, i should get 3021 back, but i’m getting 3019 in the download. that’s 2 missing. so there’s something deeper going on that i suspect may only be effectively understood by someone debugging the system code. that said, i won’t dig any deeper into this unless i think of some sort of magical answer in my head later.


could it be an indexing thing? Try changing a small detail on the observation, like setting captive/cultivated and then changing it back. The “jostle” might be enough to cause a re-index that brings it back into the result set.

i don’t think so. if it was missing from the index, i think the particular observation wouldn’t show up in any results. i suspect the problem is not related to any specific observation.

I’ve just encountered the same problem. My project has 2836 observations. When I click download, the export preview shows all 2836 observations as expected. However, the actual downloaded file only contains 2821 observations; 15 are missing.

An example of a missing one is https://www.inaturalist.org/observations/58814504


I just tried downloading only a subset of that project's data, the first 1294 observations, which includes that missing one. The resulting CSV file has only 1288 observations (6 missing), and one of the missing ones is once again that same linked observation.

Third test: I tried downloading only the observations from the single day that included the missing observation (n = 51). All of them downloaded correctly.

below are all the missing observations and their positions within the 2836 records, if sorted ascending by observation id. it looks like the export processes records in chunks of 100, and for whatever reason, it occasionally skips over a record.

Skipped records

| position | obs id | id of previous record |
|---|---|---|
| 101 | 58814504 | 58814503 |
| 202 | 59428933 | 59428932 |
| 403 | 60506935 | 60506934 |
| 504 | 60690436 | 60690435 |
| 805 | 62015017 | 62015016 |
| 1206 | 66177965 | 66177964 |
| 1407 | 69444722 | 69444721 |
| 1608 | 74606911 | 74606910 |
| 1709 | 76498630 | 76498629 |
| 1810 | 81004436 | 81004435 |
| 1911 | 84278860 | 84278859 |
| 2212 | 98464656 | 98464655 |
| 2513 | 114121848 | 114121847 |
| 2614 | 138799859 | 138799858 |
| 2715 | 140488506 | 140488505 |
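A list like the one above can be built by comparing the ids that should be in the export with the ids that actually are. A minimal Python sketch, assuming you already have the full list of expected ids from another source (for example, several smaller exports); `find_missing` is a hypothetical helper:

```python
from typing import Iterable, List, Optional, Tuple


def find_missing(
    expected_ids: Iterable[int], exported_ids: Iterable[int]
) -> List[Tuple[int, int, Optional[int]]]:
    """Return (position, obs_id, prev_id) for each expected id absent from the export.

    position is 1-based within the expected ids sorted ascending; the
    matching CSV line number is position + 1 because of the header row.
    prev_id is None when the missing record is the very first one.
    """
    exported = set(exported_ids)
    ordered = sorted(expected_ids)
    missing = []
    for pos, obs_id in enumerate(ordered, start=1):
        if obs_id not in exported:
            prev_id = ordered[pos - 2] if pos > 1 else None
            missing.append((pos, obs_id, prev_id))
    return missing


# Toy data: id 11 is absent from the export.
print(find_missing([10, 11, 12, 14, 15], [10, 12, 14, 15]))  # [(2, 11, 10)]
```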

UPDATE:
looking at the records that weren’t skipped (assuming processing in chunks of 100) and comparing these to the records that were skipped, it looks like a record will get skipped if the first record in the new batch of 100 has an id that is exactly 1 greater than the last record in the previous batch.

Not Skipped records

| position | obs id | id of previous record |
|---|---|---|
| 303 | 60166910 | 60166908 |
| 605 | 60803656 | 60802851 |
| 705 | 61847301 | 61847299 |
| 906 | 63627198 | 63626633 |
| 1006 | 64585580 | 64585578 |
| 1106 | 65390511 | 65390509 |
| 1307 | 68313317 | 68313315 |
| 1508 | 71289749 | 71289226 |
| 2012 | 88908724 | 88908326 |
| 2112 | 95373498 | 95373496 |
| 2313 | 99069152 | 99069148 |
| 2413 | 102155625 | 102152633 |
| 2816 | 140490640 | 140490638 |
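The rule can be checked mechanically against both tables. This Python sketch copies the rows from the tables and assumes a batch size of 100, as in the post. It tests that a boundary record is skipped exactly when its id is one more than the previous id, and that the boundary positions line up with 100-record batches once each skipped record is accounted for. The function name is hypothetical:

```python
# Rows copied from the two tables: (position, obs id, id of previous record).
SKIPPED = [
    (101, 58814504, 58814503), (202, 59428933, 59428932),
    (403, 60506935, 60506934), (504, 60690436, 60690435),
    (805, 62015017, 62015016), (1206, 66177965, 66177964),
    (1407, 69444722, 69444721), (1608, 74606911, 74606910),
    (1709, 76498630, 76498629), (1810, 81004436, 81004435),
    (1911, 84278860, 84278859), (2212, 98464656, 98464655),
    (2513, 114121848, 114121847), (2614, 138799859, 138799858),
    (2715, 140488506, 140488505),
]
NOT_SKIPPED = [
    (303, 60166910, 60166908), (605, 60803656, 60802851),
    (705, 61847301, 61847299), (906, 63627198, 63626633),
    (1006, 64585580, 64585578), (1106, 65390511, 65390509),
    (1307, 68313317, 68313315), (1508, 71289749, 71289226),
    (2012, 88908724, 88908326), (2112, 95373498, 95373496),
    (2313, 99069152, 99069148), (2413, 102155625, 102152633),
    (2816, 140490640, 140490638),
]


def boundaries_match_hypothesis(skipped, not_skipped, batch_size=100):
    """Check both parts of the batching hypothesis against the boundary rows.

    1. A boundary record is skipped exactly when its id is prev id + 1.
    2. Boundaries fall every batch_size records, except that each skipped
       record pushes the next boundary back by one position, since it
       never lands in any batch.
    """
    rows = sorted(
        [(pos, oid, prev, True) for pos, oid, prev in skipped]
        + [(pos, oid, prev, False) for pos, oid, prev in not_skipped]
    )
    expected_pos = batch_size + 1  # first record of the second batch
    for pos, obs_id, prev_id, was_skipped in rows:
        if was_skipped != (obs_id - prev_id == 1):
            return False
        if pos != expected_pos:
            return False
        expected_pos = pos + batch_size + (1 if was_skipped else 0)
    return True


print(boundaries_match_hypothesis(SKIPPED, NOT_SKIPPED))  # True
```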

… so without looking at the code, i’m thinking the problem must arise like so:

  • the export job gets processed in a series of batches that get 100 observations at a time, in order of observation id (ascending).
  • each batch after the first will add a filter condition that should get observations with ids > [id of the last record in the previous batch], except the filter is getting set incorrectly and is actually getting ids > [id of the last record in the previous batch]+1, so a record whose id is exactly 1 greater than the last id of the previous batch is never fetched.

looks like @jwidness looked at the code and described a solution here: https://github.com/inaturalist/inaturalist/issues/3815.

interestingly, assuming the analysis is correct, it seems like the skipping records issue would probably affect the GBIF export and possibly a few other processes, too. so that might raise the priority in the eyes of @tiwane and team.


Please fill out the following sections to the best of your ability; it will help us investigate bugs if we have this information at the outset. Screenshots are especially helpful, so please provide those if you can.

Platform (Android, iOS, Website): Website

App version number, if a mobile app issue (shown under Settings or About):

Browser, if a website issue (Firefox, Chrome, etc.): Chrome

URLs (aka web addresses) of any relevant observations or pages: https://www.inaturalist.org/projects/stadin-kimalaiskisa

Screenshots of what you are seeing (instructions for taking a screenshot on computers and mobile devices: https://www.take-a-screenshot.org/):

Description of problem (please provide a set of steps we can use to replicate the issue, and make as many as you need.):

Step 1: Download the observation data

Step 2: Use the default column selection

Step 3: Export the data to Excel as a CSV file

Step 4: The data excludes some of the observations for no obvious reason, so it cannot be handled reliably or easily. The project was a challenge with prizes for the winners, and we had to count the observations manually because the exported data couldn't be trusted.


Welcome to the forum!
Can you give an example of an observation that should have been in the export but wasn’t?


Hi, thanks!
Several were missing; for example, this one wasn't exported: https://inaturalist.laji.fi/observations/52000274

We were looking for four different bumblebees (Bombus hypnorum, Bombus pascuorum, Bombus pratorum and Bombus veteranus), and the four participants with the most research-grade observations of those species received awards. During the calculations, we noticed that the exported data didn't include all the observations (we tried exporting several times). User lauralallallaa had 47 observations, of which 44 were exported; kplamberg had 27, of which 25 were exported; and piaruoho had 20, of which 17 were exported. We couldn't identify the reason for this.

this problem was previously described but never resolved i think. see https://forum.inaturalist.org/t/export-counts-do-not-match-up-with-stats-count/9515.


Thank you, Pisum, for pointing that out. Yes, this is most probably the same problem.

I just exported observations for Stadin Kimalaiskisa and got all 269 currently listed on the website, including https://inaturalist.laji.fi/observations/52000274.

I’m a little confused by your numbers: you say lauralallallaa had 47 observations, but iNat shows 75.

Presumably you mean 47 observations of the target species, but the project collects all Bombus, including ones you weren’t targeting. Were you filtering for these species before export or after?
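Filtering before counting matters here, since the project collects all Bombus. A minimal Python sketch of tallying research-grade observations of the four target species per user from an export. It assumes the CSV has `user_login`, `scientific_name` and `quality_grade` columns with research-grade rows marked `research` (check your own export's headers); the usernames in the sample are made up:

```python
import csv
import io
from collections import Counter

# The four target species named in the contest.
TARGETS = {"Bombus hypnorum", "Bombus pascuorum", "Bombus pratorum", "Bombus veteranus"}


def tally_targets(csv_text: str) -> Counter:
    """Count research-grade observations of the target species per user.

    Assumes user_login, scientific_name and quality_grade columns, with
    research-grade rows marked "research".
    """
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["scientific_name"] in TARGETS and row["quality_grade"] == "research":
            counts[row["user_login"]] += 1
    return counts


sample = (
    "id,user_login,scientific_name,quality_grade\n"
    "1,user_a,Bombus pascuorum,research\n"
    "2,user_a,Bombus lapidarius,research\n"  # not a target species
    "3,user_a,Bombus pratorum,needs_id\n"    # not research grade
    "4,user_b,Bombus hypnorum,research\n"
)
print(tally_targets(sample))  # Counter({'user_a': 1, 'user_b': 1})
```

Exact name matching means observations identified only to genus, or to a subspecies, are not counted; adjust `TARGETS` if that matters for the contest rules.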

Just out of curiosity, is there a reason the project collected all Bombus and not just the 4 targeted species?
