iNaturalist Licensed Observation Images in the Amazon Open Data Sponsorship Program

Please add comments or discussions related to the blog post “iNaturalist Licensed Observation Images in the Amazon Open Data Sponsorship Program” in this forum thread. To reduce confusion, please read the blog post before commenting here.

@loarie The link text above has “/edit” appended, so it just goes to the iNat homepage (since we can’t edit): https://www.inaturalist.org/blog/49564-inaturalist-licensed-observation-images-in-the-amazon-open-data-sponsorship-program/edit. Can you edit to fix/remove? Feel free to delete this post…thanks!

I edited it, sorry about that!

good news.

a few questions:

  1. it looks like 3 of 4 of the metadata files in this first set are a week or so old. was that intentional?

  2. since metadata files will be generated on a monthly schedule, is the 15th going to be the date each month when they will be generated, or will it be some other date?

  3. have you all given any thought to either adding observation id to the observation metadata file or adding observation UUID to the iNaturalist CSV export?

Agree with point 3. Having the observation ID would be useful.

Yes, this was intentional. We last generated the export on the 6th, but during a final review I noticed that when we added user names, I had failed to trim whitespace from them. As a result, some newlines made it into the file, which caused import commands to fail. So I just removed the newlines from the observers file; otherwise, the data in all files was generated on the 6th.
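The kind of fix described above can be sketched in a few lines. This is only an illustration, not the actual export code: the tab delimiter, the two-column layout, and the names `clean_login` and `rows` are all assumptions made for the example.

```python
import csv
import io


def clean_login(raw: str) -> str:
    """Trim the ends and collapse internal whitespace runs (including newlines)."""
    return " ".join(raw.split())


# Hypothetical (id, login) pairs; the second field carries stray whitespace.
rows = [("1", "alice\n"), ("2", "  bob  ")]

buf = io.StringIO()
# Assumption: a tab-delimited file with one record per line.
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
for user_id, login in rows:
    writer.writerow([user_id, clean_login(login)])

print(buf.getvalue())
```

Cleaning the value before it is written keeps each record on a single line, so line-oriented import commands see exactly one row per user.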

since metadata files will be generated on a monthly schedule, is the 15th going to be the date each month when they will be generated, or will it be some other date?

We haven’t set an exact date yet. I’m still debating whether it’s better to generate the files on a fixed date (say the 1st or 15th of each month) or on a fixed day (say the first Saturday of each month). The former is easier to explain and maybe better for user expectations; the latter is easier for us to plan other background jobs around. Does anyone have a preference?

have you all given any thought to either adding observation id to the observation metadata file or adding observation UUID to the iNaturalist CSV export?

Ideally for us, we’d start using observation UUIDs everywhere right now and avoid auto-incrementing integer IDs entirely. Since this isn’t practical or possible to do all at once (we will be making this switch over time), of the two options I’d prefer to include observation UUIDs in the CSV exports. UUIDs are currently available via the API. Auto-increment integer IDs are a problem in that they have an implied meaning (sequence) that UUIDs do not. For various reasons we only want identifiers to identify resources and not have any additional implied meaning, which is why we have chosen to leave them out of this export and do not plan on adding them.

As an aside: we did leave photo IDs in the export, since we currently store photos by ID and the ID is therefore needed to construct their URLs. Again, in an ideal world we’d switch to UUIDs for photos as well, but a lot more work is needed to do the same with photos. This is something we may do in the future.

ok. thanks. i figured it was something like this. i was originally thinking that folks who licensed their photos by the 15th would have their stuff included in the metadata files, too, but i was probably reading things with the wrong expectations. no big deal either way.

i think whatever works best from a technical perspective is the best path, but maybe the second Saturday of the month might be better than the first? (the first Saturday seems like it is more likely to run into things like New Year, CNC, and Independence Day in the USA.)
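For anyone who wants to see where a "fixed day" schedule would land in a given month, here is a minimal sketch using only the Python standard library. The helper name `nth_weekday` is made up for this example, and nothing here reflects how iNaturalist actually schedules its jobs.

```python
import calendar
from datetime import date


def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """Return the n-th occurrence of a weekday (Mon=0 .. Sun=6) in a month."""
    first = date(year, month, 1)
    # Days from the 1st of the month to the first occurrence of the weekday.
    offset = (weekday - first.weekday()) % 7
    day = 1 + offset + 7 * (n - 1)
    if day > calendar.monthrange(year, month)[1]:
        raise ValueError("the month has no such occurrence of that weekday")
    return date(year, month, day)


SATURDAY = calendar.SATURDAY  # equals 5

print(nth_weekday(2022, 1, SATURDAY, 1))  # 2022-01-01, which is New Year's Day
print(nth_weekday(2022, 1, SATURDAY, 2))  # 2022-01-08
```

The January 2022 case shows the concern raised above: the first Saturday can coincide with a holiday, while the second Saturday always falls between the 8th and the 14th.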

that would be my preference, too. do i need to make a feature request to discuss with the community, or is this something that can be implemented without need for much additional discussion?

I would find this convenient as well.

In the photos.csv export, is photo_uuid expected to be non-unique? I’m seeing some duplicates in there, for example, 92c3057e-9791-4385-b187-90bd0e120cd7 is associated with two different observations. I didn’t realize that was possible! So is photos <---> observations intended to be a many-to-many relationship?

the unique key would be photo_uuid + observation_uuid. photos:observations is a M:M relationship. a single photo asset can be assigned to multiple observations when you duplicate an observation.
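A quick way to check this against your own copy of the photos file is to count composite keys. The sketch below works on already-parsed rows (for example from `csv.DictReader`); the column names `photo_uuid` and `observation_uuid` come from this thread, while the function name and the sample values are placeholders invented for the example.

```python
from collections import Counter, defaultdict


def summarize_photo_links(rows):
    """Return (duplicate composite keys, photos linked to more than one observation)."""
    pair_counts = Counter()
    obs_by_photo = defaultdict(set)
    for row in rows:
        pair = (row["photo_uuid"], row["observation_uuid"])
        pair_counts[pair] += 1
        obs_by_photo[row["photo_uuid"]].add(row["observation_uuid"])
    duplicate_pairs = [pair for pair, count in pair_counts.items() if count > 1]
    shared_photos = {
        photo: sorted(obs) for photo, obs in obs_by_photo.items() if len(obs) > 1
    }
    return duplicate_pairs, shared_photos


# Placeholder values, not real UUIDs from the export.
rows = [
    {"photo_uuid": "photo-a", "observation_uuid": "obs-1"},
    {"photo_uuid": "photo-a", "observation_uuid": "obs-2"},
    {"photo_uuid": "photo-b", "observation_uuid": "obs-3"},
]

duplicate_pairs, shared_photos = summarize_photo_links(rows)
print(duplicate_pairs)  # []
print(shared_photos)    # {'photo-a': ['obs-1', 'obs-2']}
```

If `duplicate_pairs` is empty, the composite key is unique even though `photo_uuid` alone repeats, which matches the many-to-many relationship described above.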

That’s what I was missing. Thanks.