I raised this issue a while back and a GitHub issue was opened, but sadly it has gone unaddressed in the 4 months since.
Dianne prompted me to have a look at that big CSV, so I will have some notes to post here shortly. Sorry it’s taken me so long! Other things have gotten in the way, not least that my old laptop would definitely not have been able to handle it.
Okay so I have finally looked through the iNat-ALA transfer zip file, and I’ve looked at the ALA too to see what’s still going wrong. It seems like they have actually fixed some of our issues but there are still problems.
First off I’ve just got some comments on the CSV files in the zip file and some questions that hopefully @carrieseltzer can answer! (Sorry that it is a lot of questions, but I’m trying to get to the bottom of the problems - no rush in answering them)
So there are two CSV files inside, one titled ‘media’ for images and sounds, and one titled ‘observations’ for the observations themselves.
Firstly, are these CSV files generated fresh from the observations each time a transfer is made, or are they a previous version modified with new updates?
Which sightings are included in them? The first sightings at least are all extremely old and don’t seem to have been modified recently, so are all observations sent every time, or only those that have been modified in some way?
Specifically within the ‘media’ CSV, there is a row for every media file sent to ALA including images and sounds.
What licences are required for media to be transferred, and which licences do not permit transfer?
Excel truncates the data because there are so many rows, so I can’t see every record, but what I can see indicates that both sounds and still images are transferred. Are GIFs also transferred, or is this not supported? If a sighting contains a GIF, are the other images also transferred?
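For anyone who wants to check the whole file rather than the part Excel will open (a sheet is capped at 1,048,576 rows), a minimal Python sketch like this reads the CSV one row at a time and tallies the media types. The `type` column name and the sample rows are assumptions for illustration only; swap in whatever header the real ‘media’ CSV uses.

```python
import csv
import io
from collections import Counter

def count_by_column(csv_file, column):
    """Stream a CSV one row at a time and tally the values in one column."""
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        counts[row[column]] += 1
    return counts

# Tiny stand-in for the real 'media' CSV; the "type" header is hypothetical.
sample = io.StringIO(
    "id,type\n"
    "1,StillImage\n"
    "1,StillImage\n"
    "2,Sound\n"
)
media_type_counts = count_by_column(sample, "type")
print(media_type_counts)

# For the real file, something like:
# with open("media.csv", newline="", encoding="utf-8") as f:
#     print(count_by_column(f, "type"))
```

Because nothing is loaded into memory all at once, this should cope with a file far larger than Excel can display.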
Perhaps most odd is that in sightings that have more than one photograph associated with them, the number of photographs given in the CSV is one less than the true number, with the first photograph excluded. E.g. if a sighting has 5 photographs, the CSV only has entries for photographs 2, 3, 4, and 5. Is there a reason for this??
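To check how widespread this is, one option is to count the media rows per observation and compare them with the number of photographs each sighting really has. This is only a sketch: the `observation_id` and `identifier` headers, the sample rows, and the hand-entered `true_counts` are all hypothetical stand-ins, not the real file layout.

```python
import csv
import io
from collections import Counter

def media_rows_per_observation(csv_file, id_column):
    """Count how many media rows the CSV holds for each observation."""
    reader = csv.DictReader(csv_file)
    return Counter(row[id_column] for row in reader)

def find_shortfalls(csv_counts, true_counts):
    """Return {observation_id: (rows_in_csv, true_count)} wherever they differ."""
    return {
        obs_id: (csv_counts.get(obs_id, 0), expected)
        for obs_id, expected in true_counts.items()
        if csv_counts.get(obs_id, 0) != expected
    }

# Stand-in for the 'media' CSV: observation 101 really has 3 photos,
# but only photos 2 and 3 appear, as described above.
sample = io.StringIO(
    "observation_id,identifier\n"
    "101,photo2.jpg\n"
    "101,photo3.jpg\n"
    "102,photo1.jpg\n"
)
csv_counts = media_rows_per_observation(sample, "observation_id")
true_counts = {"101": 3, "102": 1}  # photo counts checked by hand on iNat
shortfalls = find_shortfalls(csv_counts, true_counts)
print(shortfalls)  # {'101': (2, 3)}
```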
Within the ‘observations’ CSV there are many more columns and a great deal more data, which makes sense. It includes things like date, time, and identification, but also more detailed information including annotations, coordinate uncertainty, and observation and identification remarks. It does seem that ALA now includes all of this data but more on that later.
Similarly to the previous CSV, what licence is required for an observation to be transferred, and what licences will not permit transfer? If an observation has a licence that is less restrictive than the licences on the media it contains, will the observation be transferred without the media?
Both Research Grade and Needs ID observations are sent, including observations without a Community Taxon (i.e. with only one ID). Is the taxon name sent to ALA the Leading Taxon, the Display Taxon (i.e. the name at the top of the sighting), or the Community Taxon (if it has one)?
Okay, so looking at the ALA sightings now there seems to be some resolution of our problems.
Issue 1 seems to be partially resolved at least, so they have made some headway into mapping taxa correctly. Abantiades no longer maps to Diphyrrhynchus (Neoabantis), which is good, but Trichonephila edulis still maps to Nephila without comment, and many other taxa seem to have similar problems. I think this is inevitable given how quickly iNat taxonomy can change to fit the latest research, but we really do need a way to log the problems. It’s been almost 3 years since Trichonephila was re-elevated, so it seems unlikely that these problems are being dealt with in a timely manner by the ALA team (not that it’s their fault - it’s a big task).
Issue 2 does seem to have been resolved, which is great. Certainly all of my own sightings have been updated when I push them back to a higher taxon, but it would be great if other people could confirm this.
Issue 3 has not been resolved at all. This would seem like an easy enough fix from the iNat end by simply adding the extra places to the transfer CSVs, but we may need to notify ALA as well. Notably there is currently no iNat place for the Australian Antarctic Territory or its surrounding waters. In my opinion this is a very beneficial place to have, as it completes the total area owned by Australia and the total area of data that could be utilised by ALA, but if people disagree then I’m more than happy to discuss. It is a huge area (almost 6 million square kilometres) and is well above the current limit for place creation, but I doubt that its addition would slow down iNat at all given that there are only around 200 observations in the whole area. Do other people agree that we should add this, and is there a way to add it? It would also be useful to include in the BowerBird project.
Issue 4 has also not been resolved at all. At least all of the images in a sighting seem to get transferred, but the order in which they show up in the record page seems totally random, and the one chosen for display in the Gallery tab seems random as well, just as before.
It does seem that ALA is getting a better hold on transferring all the relevant sightings at least, although I would appreciate some confirmation that other people are getting the same results. Within my own sightings, I had a look at animals observed in Australia (excluding coastal waters, so all sightings should have transferred). Excluding 7 very recently added observations, I have 4910 sightings that should have transferred. ALA shows that 4909 of them have transferred. That’s great! But which one didn’t transfer and why?? I’m kind of more confused than before now. All of my sightings have the same licence and none of them are Casual, so why did one not transfer??? Very confusing.
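One way to track down the missing sighting would be to export the observation IDs from both sites and take the difference. A minimal sketch, assuming the iNat export has an `id` column and the ALA download keeps the iNat ID in a column such as `catalogNumber` (both header names and the sample IDs are assumptions, so check them against the real downloads):

```python
import csv
import io

def ids_from_csv(csv_file, id_column):
    """Collect the set of IDs found in one column of a CSV."""
    return {row[id_column].strip() for row in csv.DictReader(csv_file)}

# Stand-ins for the two downloads; headers and IDs are hypothetical.
inat_export = io.StringIO("id\n1001\n1002\n1003\n")
ala_download = io.StringIO("catalogNumber\n1001\n1003\n")

inat_ids = ids_from_csv(inat_export, "id")
ala_ids = ids_from_csv(ala_download, "catalogNumber")

# IDs present on iNat but absent from ALA.
not_transferred = sorted(inat_ids - ala_ids)
print(not_transferred)  # ['1002']
```

With the real files this should leave just the one ID that did not go across, which can then be inspected on iNat for anything unusual.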
Within the ‘Occurrence Records’ section, ALA now automatically excludes a number of records for various reasons, some confusing or just plain incorrect (note that these are all still present on the main ALA pages though). It has excluded some based on having user flags, which is fair enough, and the automated system has detected that a couple are outliers based on precipitation and climate as well. It’s also excluding all of the obscured sightings, because the coordinate uncertainty is too large for it. Not something I would do, but it makes sense I guess.
The other reasons make less sense though. It excludes some based on “scientific name quality” - looking at the sightings in question, they seem to be an odd mix. They are all things where the ALA algorithm couldn’t find an exact match for the name given and assigned it to a higher taxon. That’s a decent reason for exclusion, but it still tells us that ALA has trouble with iNat taxa. Worse is that some of the taxa that it excludes don’t actually differ at all from their ALA counterparts. ALA has Tetragnatha, but all of my Tetragnatha sightings have been matched instead to Tetragnathidae. It also seems to have trouble parsing some subgenera - e.g. it cannot handle Conocephalus (Anisoptera) or Leptotarsus (Macromastix), and yet it deals with all of my Xylocopa (Koptortosoma) sightings perfectly well. It also fails to exclude a bunch of mismatched taxa. Trichonephila edulis, the same example I went with originally, is a perfect example of this. So the algorithm finds some of the errors, but it also misses many, and it finds some errors where there are none.
More odd is that it excludes more than a thousand observations because they are apparently duplicates. I have no idea what this means. I can find no real reason as to why some are apparent duplicates and some aren’t. I was thinking maybe if another iNat sighting had been made on the same day nearby it would be too similar for ALA and they assumed it was the same, but in some cases there are no other observations anywhere nearby. So I’m not sure on that one. It seems maybe something went wrong with the transfer from iNat. Yet another problem to look into.
So in summary, it does seem that a number of our problems have been fixed. There are still several issues though and I will actually try to contact ALA support this time to see if we can get them fixed. Keen to hear everyone’s comments on what I’ve written here, and especially keen to get some answers to the questions regarding the transfer CSVs.
"Observations will come across to the ALA if they are:
- Shareable under a Creative Commons license
- In Australia
- Verifiable observations - those which are marked Needs ID or Research Grade"
I’m unsure exactly which of the licenses this translates to.
Another issue I cannot wrap my head around; this page states that:
Data is harvested from iNaturalist Australia to ALA daily
Yet this observation of mine from four days ago is still not in there:
(and it doesn’t seem to have mapped to the genus either)
Hmmmmmmm I have a couple of observations from a few days ago that also haven’t been transferred yet. It would be good to monitor for when your sighting does get transferred. I had assumed it wasn’t that frequent but if it says it is then clearly there is a problem.
Thanks. I have changed my license so will check to see if sightings transfer in a week or so.
As of today, this record is now in there. Unfortunately the photo selection process is still being applied in a stupid way, and my third photo of 3 (i.e. the worst one) has been chosen as the overview photo.
I don’t know if they’re no longer transferring, but Diphyrrhynchus (Neoabantis) is still full of pictures of Abantiades moths. The source of the problem is that the beetle genus Abantiades Fairmaire, 1894 is a junior synonym of Diphyrrhynchus (Neoabantis), as well as a junior homonym of the moth Abantiades Herrich-Schäffer, 1855.
None of those records are from iNat any more though; they are mostly from QuestaGame.
Why can’t they just include all photos in the gallery?
Another example of a missing observation: this one has CC-BY-NC licensing and a lot of photos but isn’t listed on the ALA at all! https://bie.ala.org.au/species/https://id.biodiversity.org.au/node/apni/2891873#overview
If you click on the individual record, all the photos get shown; it’s just the main gallery view that only picks one per observation. I think this approach does make sense, otherwise for some of the more common species you’d have tens of thousands of photos in the gallery. But the issue is that they’re not picking the first (‘best’) photo.
This one was only posted 3 days ago, so combined with my example from above, it seems to take 5-6 days for them to go across