The Bimby Project shows 3077 observations as of today, yet when I export all of them, the file has only 3076 observations.
The only explanations I can think of: is the header row being included in the overall count? Or do I have a bad filter that is filtering out a record? Thoughts?
Could an observation have been added in the time since you initiated the export? The project currently reads 3078 observations.
this observation seems to be excluded from the export: https://www.inaturalist.org/observations/32061329. figuring out why it’s excluded will take further research.
with 3078 observations that should be exported, this one should have shown up as #2601, or line 2602, in the csv. interestingly, if you export a smaller subset of observations that includes the missing observation here, it will be exported in that smaller subset. so i think that means that it’s not the particular observation that has a problem.
Thank you for your response, but no: every time a new observation is added, the number of exported rows is always out by one. I think @pisum found the answer… or part of it.
Thank you. Yes, this is most likely the answer. As I was typing this, I saw you had provided more details; let me take a look.
Hmmm… I wonder what is causing it. I will continue to investigate, but I appreciate your responses.
yeah, it’s strange. if i filter for just verifiable observations in the project, i should get 3021 back, but i’m getting 3019 in the download. that’s 2 missing. so there’s something deeper going on that i suspect may only be effectively understood by someone debugging the system code. that said, i won’t dig any deeper into this unless i think of some sort of magical answer in my head later.
could it be an indexing thing? Try changing a small detail on the observation, like setting captive/cultivated and then changing it back. The “jostle” might be enough to cause a re-index that brings it back into the result set.
i don’t think so. if it was missing from the index, i think the particular observation wouldn’t show up in any results. i suspect the problem is not related to any specific observation.
I’ve just encountered the same problem. My project has 2836 observations. When I click download, the export preview shows all 2836 observations as expected. However, the actual downloaded file only contains 2821 observations; 15 are missing.
An example of a missing one is https://www.inaturalist.org/observations/58814504
I just tried downloading only a subset of that project’s data, the first 1294 observations, which includes that missing one. The resulting csv file has only 1288 observations (6 missing), and one of the missing ones is once again that same linked observation.
Third test: I tried downloading only the observations from the single day that included that missing observation (n = 51). All observations downloaded perfectly.
below are all the missing observations and their positions within the 2836 records, if sorted ascending by observation id. it looks like the export processes records in chunks of 100, and for whatever reason, it occasionally skips over a record.
Skipped records
position | obs id | id of prev record |
---|---|---|
101 | 58814504 | 58814503 |
202 | 59428933 | 59428932 |
403 | 60506935 | 60506934 |
504 | 60690436 | 60690435 |
805 | 62015017 | 62015016 |
1206 | 66177965 | 66177964 |
1407 | 69444722 | 69444721 |
1608 | 74606911 | 74606910 |
1709 | 76498630 | 76498629 |
1810 | 81004436 | 81004435 |
1911 | 84278860 | 84278859 |
2212 | 98464656 | 98464655 |
2513 | 114121848 | 114121847 |
2614 | 138799859 | 138799858 |
2715 | 140488506 | 140488505 |
…
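A table like the one above can be produced by comparing the sorted list of observation IDs the project should contain with the IDs actually present in the downloaded csv. The sketch below assumes you already have both ID lists (how you collect them is up to you); `find_skipped` is a hypothetical helper, not an iNaturalist function.

```python
from typing import Iterable, List, Optional, Tuple


def find_skipped(
    expected_ids: Iterable[int], exported_ids: Iterable[int]
) -> List[Tuple[int, int, Optional[int]]]:
    """Return (position, obs_id, prev_id) for each expected ID missing from the export.

    position is 1-based within the expected IDs sorted ascending; prev_id is the
    ID just before it in that order (None for the first record).
    """
    ordered = sorted(expected_ids)
    exported = set(exported_ids)
    skipped: List[Tuple[int, int, Optional[int]]] = []
    for index, obs_id in enumerate(ordered):
        if obs_id not in exported:
            prev_id = ordered[index - 1] if index > 0 else None
            skipped.append((index + 1, obs_id, prev_id))
    return skipped


if __name__ == "__main__":
    expected = [10, 11, 12, 20, 21]
    exported = [10, 12, 20]
    print(find_skipped(expected, exported))  # [(2, 11, 10), (5, 21, 20)]
```

A record’s line number in the csv is its position plus one, because of the header row (so position 2601 corresponds to line 2602).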
UPDATE:
looking at the records that weren’t skipped (assuming processing in chunks of 100) and comparing these to the records that were skipped, it looks like a record will get skipped if the first record in the new batch of 100 has an id that is exactly 1 greater than the last record in the previous batch.
Not Skipped records
position | obs id | id of prev record |
---|---|---|
303 | 60166910 | 60166908 |
605 | 60803656 | 60802851 |
705 | 61847301 | 61847299 |
906 | 63627198 | 63626633 |
1006 | 64585580 | 64585578 |
1106 | 65390511 | 65390509 |
1307 | 68313317 | 68313315 |
1508 | 71289749 | 71289226 |
2012 | 88908724 | 88908326 |
2112 | 95373498 | 95373496 |
2313 | 99069152 | 99069148 |
2413 | 102155625 | 102152633 |
2816 | 140490640 | 140490638 |
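The rule from the update can be checked mechanically against both tables. The sketch below copies the (position, obs id, id of prev record) rows from the two tables, confirms that every skipped boundary record has ID = previous ID + 1 while no kept one does, and also checks the batch spacing: assuming 100-record batches, each boundary should fall 100 positions after the previous one, plus one more whenever the previous boundary record was skipped. `predicts_skip` and `boundaries_consistent` are hypothetical helpers written only for this check.

```python
from typing import List, Tuple

# (position, obs_id, prev_id), copied from the "Skipped records" table.
SKIPPED: List[Tuple[int, int, int]] = [
    (101, 58814504, 58814503), (202, 59428933, 59428932),
    (403, 60506935, 60506934), (504, 60690436, 60690435),
    (805, 62015017, 62015016), (1206, 66177965, 66177964),
    (1407, 69444722, 69444721), (1608, 74606911, 74606910),
    (1709, 76498630, 76498629), (1810, 81004436, 81004435),
    (1911, 84278860, 84278859), (2212, 98464656, 98464655),
    (2513, 114121848, 114121847), (2614, 138799859, 138799858),
    (2715, 140488506, 140488505),
]

# (position, obs_id, prev_id), copied from the "Not Skipped records" table.
NOT_SKIPPED: List[Tuple[int, int, int]] = [
    (303, 60166910, 60166908), (605, 60803656, 60802851),
    (705, 61847301, 61847299), (906, 63627198, 63626633),
    (1006, 64585580, 64585578), (1106, 65390511, 65390509),
    (1307, 68313317, 68313315), (1508, 71289749, 71289226),
    (2012, 88908724, 88908326), (2112, 95373498, 95373496),
    (2313, 99069152, 99069148), (2413, 102155625, 102152633),
    (2816, 140490640, 140490638),
]


def predicts_skip(obs_id: int, prev_id: int) -> bool:
    """Hypothesis: a batch's first record is skipped iff its ID is exactly prev_id + 1."""
    return obs_id - prev_id == 1


def boundaries_consistent(batch_size: int = 100) -> bool:
    """Check each boundary sits batch_size positions after the previous one,
    plus one extra position whenever the previous boundary record was skipped."""
    boundaries = sorted(
        [(pos, True) for pos, _, _ in SKIPPED] + [(pos, False) for pos, _, _ in NOT_SKIPPED]
    )
    expected_position = batch_size + 1
    for position, skipped in boundaries:
        if position != expected_position:
            return False
        expected_position = position + batch_size + (1 if skipped else 0)
    return True


if __name__ == "__main__":
    assert all(predicts_skip(obs, prev) for _, obs, prev in SKIPPED)
    assert not any(predicts_skip(obs, prev) for _, obs, prev in NOT_SKIPPED)
    assert boundaries_consistent()
    print("rule and batch spacing hold for", len(SKIPPED) + len(NOT_SKIPPED), "boundaries")
```

Both checks pass on the 28 listed boundaries, which is consistent with the batching explanation but does not by itself prove it.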
… so without looking at the code, i’m thinking the problem must arise like so:
- the export job gets processed in a series of batches that get 100 observations at a time, in order of observation id (ascending).
- each batch after the first will add a filter condition that should get observations with ids > [id of the last record in the previous batch] – except the filter is getting set incorrectly and is actually getting ids > [id of the last record in the previous batch]+1.
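The hypothesis above is a form of keyset (cursor) pagination with an off-by-one cursor. The simulation below makes it concrete. It is not iNaturalist’s export code, which isn’t shown in this thread; `paginate` and its `off_by_one` flag are hypothetical names, and the SQL in the comment is only illustrative.

```python
from typing import List, Optional


def paginate(ids: List[int], batch_size: int = 100, off_by_one: bool = False) -> List[int]:
    """Simulate an export that reads records in ascending-ID batches (keyset pagination).

    Every batch after the first requests IDs strictly greater than a cursor.
    off_by_one=True sets the cursor to last_id + 1 (the hypothesised bug), so a
    record whose ID is exactly last_id + 1 is never fetched. off_by_one=False
    uses the intended filter, id > last_id.
    """
    source = sorted(ids)
    exported: List[int] = []
    cursor: Optional[int] = None
    while True:
        # Stand-in for a query like "WHERE id > cursor ORDER BY id LIMIT batch_size".
        candidates = source if cursor is None else [i for i in source if i > cursor]
        batch = candidates[:batch_size]
        if not batch:
            break
        exported.extend(batch)
        last_id = batch[-1]
        cursor = last_id + 1 if off_by_one else last_id
    return exported


if __name__ == "__main__":
    ids = [1, 2, 3, 4, 5, 6, 7, 20]
    print(paginate(ids, batch_size=3, off_by_one=True))   # [1, 2, 3, 5, 6, 7, 20]
    print(paginate(ids, batch_size=3, off_by_one=False))  # [1, 2, 3, 4, 5, 6, 7, 20]
```

In the buggy run, ID 4 is lost because it is exactly one greater than 3, the last ID of the first batch; ID 20 survives because the gap after 7 is larger than one. That is the same pattern as in the skipped and not-skipped tables. It also explains why small exports (like the 51-observation test) can come out complete: if a batch boundary never lands on a consecutive ID, nothing is skipped.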
looks like @jwidness looked at the code and described a solution here: https://github.com/inaturalist/inaturalist/issues/3815.
interestingly, assuming the analysis is correct, it seems like the record-skipping issue would probably affect the GBIF export and possibly a few other processes, too. so that might raise the priority in the eyes of @tiwane and team.
Please fill out the following sections to the best of your ability, it will help us investigate bugs if we have this information at the outset. Screenshots are especially helpful, so please provide those if you can.
Platform (Android, iOS, Website): Website
App version number, if a mobile app issue (shown under Settings or About):
Browser, if a website issue (Firefox, Chrome, etc) : Chrome
URLs (aka web addresses) of any relevant observations or pages: https://www.inaturalist.org/projects/stadin-kimalaiskisa
Screenshots of what you are seeing (instructions for taking a screenshot on computers and mobile devices: https://www.take-a-screenshot.org/):
Description of problem (please provide a set of steps we can use to replicate the issue, and make as many as you need.):
Step 1: Download observations data
Step 2: Use the default column selection
Step 3: Export the data as a csv file and open it in Excel
Step 4: The data omits some observations for no obvious reason, so it cannot be handled reliably or easily. The project was a challenge with awards for the winners, and we had to count the observations manually because the exported data couldn’t be trusted.
Welcome to the forum!
Can you give an example of an observation that should have been in the export but wasn’t?
Hi, thanks!
There were several missing, but for example this one wasn’t exported: https://inaturalist.laji.fi/observations/52000274
We were looking for four different bumblebees (Bombus hypnorum, Bombus pascuorum, Bombus pratorum and Bombus veteranus), and the four participants with the most (research grade) observations of those species were awarded. During the calculations, we noticed that the exported data didn’t include all the observations (we tried exporting several times). User lauralallallaa had 47 observations, of which 44 were exported; kplamberg had 27, of which 25 were exported; and piaruoho had 20, of which 17 were exported. We couldn’t identify the reason for this.
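To see how many of each participant’s observations a download dropped, one option is to compare a map of observation ID to username (what the project should contain) against the IDs in the csv. This is a minimal sketch: the `id` column name is an assumption about the export header, and `missing_per_user` is a hypothetical helper.

```python
import csv
import io
from collections import Counter
from typing import Dict, TextIO


def missing_per_user(expected: Dict[int, str], export_csv: TextIO, id_column: str = "id") -> Counter:
    """Count, per username, expected observations whose IDs are absent from the export.

    expected maps observation ID -> username. id_column is ASSUMED to be the
    header of the observation-ID column in the exported csv.
    """
    exported_ids = {int(row[id_column]) for row in csv.DictReader(export_csv)}
    return Counter(user for obs_id, user in expected.items() if obs_id not in exported_ids)


if __name__ == "__main__":
    # Toy data: observations 2, 4 and 5 are expected but absent from the csv.
    export = io.StringIO("id,user_login\n1,user_a\n3,user_b\n")
    expected = {1: "user_a", 2: "user_a", 3: "user_b", 4: "user_b", 5: "user_b"}
    print(missing_per_user(expected, export))  # Counter({'user_b': 2, 'user_a': 1})
```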
this problem was previously described but never resolved i think. see https://forum.inaturalist.org/t/export-counts-do-not-match-up-with-stats-count/9515.
Thank you, Pisum, for pointing this out. Yes, this is most probably the same problem.
I just exported observations for Stadin Kimalaiskisa and got all 269 currently listed on the website, including https://inaturalist.laji.fi/observations/52000274.
I’m a little confused by your numbers: you say lauralallallaa had 47 observations, but iNat shows 75.
Presumably you mean 47 observations of the target species, but the project collects all Bombus, including ones you weren’t targeting. Were you filtering for these species before export or after?
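Filtering after export could look like the sketch below, assuming the score is the number of research-grade observations of the four target species per user. The column names `user_login`, `scientific_name` and `quality_grade`, and the value `research`, are assumptions about the csv export; `score_challenge` is a hypothetical helper.

```python
import csv
import io
from collections import Counter
from typing import List, TextIO, Tuple

# The four target species named earlier in the thread.
TARGET_SPECIES = {
    "Bombus hypnorum",
    "Bombus pascuorum",
    "Bombus pratorum",
    "Bombus veteranus",
}


def score_challenge(export_csv: TextIO, top_n: int = 4) -> List[Tuple[str, int]]:
    """Rank users by research-grade observations of the target species.

    ASSUMED export columns: user_login, scientific_name, quality_grade, with
    quality_grade == "research" marking research-grade records.
    """
    counts = Counter(
        row["user_login"]
        for row in csv.DictReader(export_csv)
        if row["scientific_name"] in TARGET_SPECIES and row["quality_grade"] == "research"
    )
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = io.StringIO(
        "user_login,scientific_name,quality_grade\n"
        "user_a,Bombus hypnorum,research\n"
        "user_a,Bombus pratorum,research\n"
        "user_b,Bombus terrestris,research\n"  # not a target species
        "user_b,Bombus pascuorum,needs_id\n"   # not research grade
        "user_c,Bombus veteranus,research\n"
    )
    print(score_challenge(sample))  # [('user_a', 2), ('user_c', 1)]
```

Matching on exact species names means observations identified only to genus, or to a subspecies, are not counted. Any record the export silently dropped is also missing from the tally, which is why the export bug affected the results.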
Just out of curiosity, is there a reason the project collected all Bombus and not just the 4 targeted species?