I have been wanting to document my adventures in nature more effectively, so I am writing software to do that.
The main feature is to organize photos by lifers, taxa, and locations, using personally selected photos. The photos would be copied to new folders and renamed (with taxon names, dates, etc. in the filenames). Personal observation data (from iNaturalist and other sources) would be used to help find the photos to select. It would also allow comments on individual photos, taxa, lifers, locations, etc., to allow the creation of something like a personal blog (covering topics like interesting stories, planning how to get specific species, how to get the right pictures for species-level identification, etc.). I find that iNaturalist only tells part of the story of my adventures, and I am creating something that can tell more of the story.
The current design involves downloading all personal observations in CSV format and loading them into the software. Additional CSVs from iNaturalist or iNaturalist utilities are required for information not included in the standard download. For example, if I wish to have a folder for my lichens, I would need to create a bare-bones CSV using the Lichens of Ontario project to get the list of relevant lichen species.
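To make that concrete, here is roughly the kind of CSV loading step I have in mind (just a sketch in Python to show the idea, not my actual code; column names like scientific_name and observed_on come from the standard export, so check your own file's header):

```python
import csv
from collections import defaultdict

def load_observations(csv_path):
    """Read an iNaturalist personal-observations export and group rows by species."""
    by_species = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # column names follow the standard export; adjust them to match your header
            species = row.get("scientific_name", "").strip()
            if species:
                by_species[species].append({
                    "observation_id": row.get("id"),
                    "observed_on": row.get("observed_on"),
                    "place": row.get("place_guess"),
                })
    return by_species

# observations = load_observations("observations-123456.csv")  # hypothetical export filename
```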
If I were to use the API, I would have the potential to get additional information on observations, like the projects, annotations, and fields assigned to an observation. Also, it might be possible (I'm not sure) to get the original filename of each picture associated with an observation. With the original filename, the user could be shown directly which of their original files are associated with an observation.
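For illustration, a minimal query like the one below would pull those extras along with each observation (a Python sketch only; the annotations and ofvs keys are what I have seen in v1 responses, but verify against your own output):

```python
import requests

def fetch_page(user_login, page=1):
    """Fetch one page of a user's observations from the iNaturalist v1 API."""
    resp = requests.get(
        "https://api.inaturalist.org/v1/observations",
        params={"user_login": user_login, "per_page": 200, "page": page},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# "my_login" is a placeholder for the actual account name
for obs in fetch_page("my_login")["results"]:
    # annotations and observation field values ride along with each observation
    print(obs["id"], obs.get("annotations"), obs.get("ofvs"))
```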
Going with the API makes things more convenient for the end user in many ways. But aside from having more to figure out on my end, I am concerned about several things. Is this use case of the iNaturalist API considered data scraping? If not, how much more load does using the API put on the iNaturalist servers compared to downloading the CSV?
Very interesting use case. I have been trying to do something similar although mostly focusing on how to organise my collection of old and new observations and field notes in order to upload them to iNaturalist more efficiently.
I have shared some examples of using Python and R code for querying, downloading, and organising my iNat records in this blog (still in development and with content in English and Spanish): https://jrfep.quarto.pub/natural-code/
Regarding your questions, you can check the API recommended practices. In my experience, the load for personal use is really low and you are unlikely to hit the rate limits, especially if you are using filters in your requests.
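For example, something along these lines keeps well within the recommended rate (a Python sketch; the one-second pause is just a conservative choice, the actual limits are in the recommended practices page):

```python
import time
import requests

def fetch_all(params, delay=1.0):
    """Page through /v1/observations politely, pausing between requests."""
    results, page = [], 1
    while True:
        resp = requests.get(
            "https://api.inaturalist.org/v1/observations",
            params={**params, "per_page": 200, "page": page},
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json()["results"]
        if not batch:
            break
        results.extend(batch)
        page += 1
        time.sleep(delay)  # stay well below the recommended request rate
    return results

# filters keep the result set (and the server load) small; "my_login" is a placeholder
research_grade = fetch_all({"user_login": "my_login", "quality_grade": "research"})
```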
although being able to get this information would make it relatively easy – or at least easier – to sync up your personal photo collection with your iNat observations (there are third-party tools like naturtag that try to do stuff like this), the only place the original filenames are available to regular users is on the photo pages (which you would then have to scrape to get that information).
there was a request to make the information available in the API via a new photos detail endpoint, but that hasn’t gone anywhere.
however, it looks like someone has actually done some coding to make the original filename available in the API via observation details, but i don’t think it’s been completed yet (although it might be done sooner rather than later).
I’ve been working on a somewhat similar project myself. I store my photos in DigiKam and wanted to import information from iNat into the metadata of the pictures I uploaded as well as their full-quality originals, including the RAWs. DigiKam supports hierarchical tagging, which is perfect for iNat observations, so I get a lot of browsing and organization power just in my local photo browser. Since I want to tag more than the file I uploaded, I opted for manually tagging the photos in DigiKam with the iNat observation ID to let importing start. Also, just because of how DigiKam works, I have to click “read metadata from file” to get the new info into DigiKam after importing. But those are the only manual parts; the program knows what to import automatically by using the updated_since parameter and keeping track of the last time I ran it.
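The incremental part is simpler than it sounds; the logic is roughly this (sketched in Python here, not the language my program is written in):

```python
import json
import pathlib
from datetime import datetime, timezone

import requests

STATE = pathlib.Path("last_run.json")  # where the previous run's timestamp is kept

def updated_observations(user_login):
    """Fetch only observations changed since the previous run (single page for brevity)."""
    since = json.loads(STATE.read_text())["last_run"] if STATE.exists() else None
    params = {"user_login": user_login, "per_page": 200}
    if since:
        params["updated_since"] = since  # ISO 8601 timestamp accepted by the API
    resp = requests.get("https://api.inaturalist.org/v1/observations",
                        params=params, timeout=30)
    resp.raise_for_status()
    STATE.write_text(json.dumps(
        {"last_run": datetime.now(timezone.utc).isoformat()}))
    return resp.json()["results"]
```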
I use the API to get my data, and it is pretty good. The Swagger they provide doesn’t include annotations, but I was able to get an AI to add them to the Swagger without issue by showing it the full response body of an annotated observation. To reduce the traffic I create, I made sure all my API calls are as efficient as possible, always requesting as many observations and taxa as I am allowed in a single request and avoiding requesting the same thing more than once.
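As an example of what I mean by efficient, taxon lookups can be deduplicated and batched into comma-separated requests instead of one call per taxon (again a Python sketch; the chunk size of 30 is just a cautious guess, check the documentation for the actual maximum):

```python
import requests

def fetch_taxa(taxon_ids, chunk=30):
    """Look up many taxa in as few requests as possible."""
    taxa = {}
    ids = sorted(set(taxon_ids))          # never ask for the same taxon twice
    for i in range(0, len(ids), chunk):   # comma-separated IDs in one request
        batch = ids[i:i + chunk]
        resp = requests.get(
            "https://api.inaturalist.org/v1/taxa/" + ",".join(map(str, batch)),
            timeout=30)
        resp.raise_for_status()
        for t in resp.json()["results"]:
            taxa[t["id"]] = t
    return taxa
```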
After I finish my importing program, I’ve been thinking about using the information I put into the photos to populate a website, much like what you want to do. But I don’t have any concrete plans for that yet.
I use digiKam as well! I don’t do any automatic importing of data, though; I manage the digiKam data manually. I don’t add the observation ID to the metadata, mostly because it seems like it would be a hassle to do. I should still be able to semi-automatically match photos in my digiKam library to iNat observations, using date/time, location, and taxon. If there was an easier way to add the iNaturalist observation ID to an item in digiKam (without modifying the image on disk), I would likely be doing that. Maybe I could modify an XMP sidecar rather than the image file?
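The matching itself should be straightforward, something like the sketch below (Python, assuming I already have the capture time of each photo and the time of each observation; the data here is hypothetical):

```python
from datetime import datetime, timedelta

def match_photo(photo_time, observations, tolerance=timedelta(minutes=10)):
    """Return the observation whose time is closest to the photo's capture time,
    provided it falls within the tolerance window."""
    best, best_gap = None, tolerance
    for obs in observations:
        gap = abs(obs["time_observed"] - photo_time)
        if gap <= best_gap:
            best, best_gap = obs, gap
    return best

# hypothetical data: observation times parsed from the CSV or API beforehand
obs = [{"id": 1234, "time_observed": datetime(2024, 6, 1, 9, 15)}]
print(match_photo(datetime(2024, 6, 1, 9, 18), obs))
```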
I thought about ways to automatically associate my files with an observation, but there were just too many cases where it wouldn’t work, such as situations where I am going back and forth between several individuals on a single plant as they move around. In DigiKam, the images I would upload for each observation end up all interwoven with one another by the order I took them. That, and the fact that I usually upload cropped photos, either to make them look nicer or to make them easier to identify, and I want the crops, the originals, and the ones I did not upload all to be tagged. And since the main point of this program was to automate tagging the taxa information, I wouldn’t be able to use that to try to match either.
My method for adding the observation ID to the image in DigiKam was to just add “obs:1234”, with 1234 being the ID, to the caption. Then I coded my program to find that bit of text anywhere in the caption, even if I also use the caption to write a description. I also made it so I can use commas, like obs:1234,5678, to add information from multiple observations, since some images have multiple species in them.
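In case it helps anyone, the parsing really is just one regular expression; here is the same idea sketched in Python (not the language my program uses):

```python
import re

def observation_ids(caption):
    """Pull every obs:1234 or obs:1234,5678 marker out of a caption."""
    ids = []
    for group in re.findall(r"obs:((?:\d+,)*\d+)", caption):
        ids.extend(int(i) for i in group.split(","))
    return ids

print(observation_ids("Nice light this morning. obs:1234,5678 and later obs:91011"))
# -> [1234, 5678, 91011]
```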
DigiKam doesn’t modify the image metadata by default, so you can set it up to use sidecars in the settings and then manually write to disk, or set it to write automatically and it will write sidecars instead of modifying the image itself. I went with modifying the image since it is just easier, and I’d have to read the metadata anyway to get the created date and time. Speaking of that, I finished it; here is a link if anyone would find it useful (I wrote it in my favorite obscure language, so probably not): https://codeberg.org/SkeletonEntity/iNaturalistTaxonomyImporter
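Reading the capture time is a small job in most languages; for example, in Python with Pillow it is roughly this (a sketch for JPEGs, since RAW files would need something like exiftool; 0x8769 and 0x9003 are the standard EXIF tags for the Exif sub-IFD and DateTimeOriginal):

```python
from datetime import datetime
from PIL import Image

def capture_time(path):
    """Read DateTimeOriginal (falling back to DateTime) from a JPEG's EXIF data."""
    exif = Image.open(path).getexif()
    raw = exif.get_ifd(0x8769).get(0x9003) or exif.get(306)
    return datetime.strptime(raw, "%Y:%m:%d %H:%M:%S") if raw else None

print(capture_time("IMG_1234.JPG"))  # hypothetical filename
```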
Thanks for your help. After I get the software to basic functionality, I will look into getting information from the API. It will be a bit more challenging since I am using VB.Net. I may have to send commands to Python from VB.Net or find a more direct approach.
Currently I am working on a setup screen which gets info such as the folders of the original files, the destination folder, CSV file locations, and various filters. I have a ways to go…
I have heard of DigiKam and at one time wanted it as my photo organization program. But I found it rather slow, and I have been using Windows Live Photo Gallery, an old program that also supports hierarchical tagging. I use that for uploading to iNaturalist.