Bulk download of taxon photos showing flowers

Greetings! I’m working on creating a dataset of plants and their significant colors, particularly flower colors. My plan is to collect representative photos for each species belonging to a taxonomic group (Angiospermae, for example), then use an image processing pipeline to extract the colors from each image.

I’ve used the iNaturalist API a fair amount in other phases of this project, but I don’t want to unnecessarily burden the servers with tens of thousands of requests if there is a simpler way to achieve this goal. I started by downloading the iNaturalist Open Data from AWS S3 and have been exploring it in Postgres. The open data is sufficient to construct a list of photo URLs for each of my target taxa, but not to ensure that those photos actually show each taxon’s flowers. Each taxon may flower at a different time of year, so I would need to tailor the date filter to each taxon individually. I don’t want to waste storage or time downloading a bunch of photos from the wrong time of year!

That brings me back to using the API to filter photos for each taxon based on the seasonal timing (phenology) of its flowering. I think I see two ways to do this currently:

  1. Fetch /observations for each taxon and provide the correct controlled term ID for “Flowers”. Use the photo URLs from these results directly, since they should contain at least one flower.
  2. Fetch /observations/histogram for each taxon with the controlled term ID for “Flowers”, then use the dates to filter the offline open data for observations of that taxon during that likely-flowering period and hope to get lucky with photos that actually contain flowers.
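To make the two options concrete, here is a minimal sketch of how the requests and the response handling might look. Several things here are assumptions rather than confirmed details: the annotation IDs (term_id=12 with term_value_id=13 for "Flowering") should be checked against the /controlled_terms endpoint, the helper names are hypothetical, the 10% share threshold is arbitrary, and the response shapes are my reading of the v1 API. No network calls are made; the functions only build URLs and parse already-fetched JSON.

```python
from urllib.parse import urlencode

API_BASE = "https://api.inaturalist.org/v1"

# Assumed annotation IDs; verify with GET /controlled_terms before relying on them.
FLOWER_TERM_ID = 12        # assumed: the plant phenology / "Flowers" attribute
FLOWERING_VALUE_ID = 13    # assumed: the "Flowering" value


def observations_url(taxon_id, per_page=30):
    """Option 1: observations of a taxon annotated as flowering, with photos."""
    params = {
        "taxon_id": taxon_id,
        "term_id": FLOWER_TERM_ID,
        "term_value_id": FLOWERING_VALUE_ID,
        "photos": "true",
        "quality_grade": "research",
        "per_page": per_page,
    }
    return f"{API_BASE}/observations?{urlencode(params)}"


def histogram_url(taxon_id):
    """Option 2: monthly counts of flowering-annotated observations."""
    params = {
        "taxon_id": taxon_id,
        "term_id": FLOWER_TERM_ID,
        "term_value_id": FLOWERING_VALUE_ID,
        "date_field": "observed",
        "interval": "month_of_year",
    }
    return f"{API_BASE}/observations/histogram?{urlencode(params)}"


def photo_urls(response, size="medium"):
    """Collect photo URLs from a parsed /observations response.

    Assumes each photo's "url" points at the square thumbnail and that other
    sizes live at the same path with a different file stem.
    """
    urls = []
    for obs in response.get("results", []):
        for photo in obs.get("photos", []):
            url = photo.get("url")
            if url:
                urls.append(url.replace("/square.", f"/{size}."))
    return urls


def flowering_months(histogram, min_share=0.1):
    """Months holding at least min_share of the flowering-annotated observations."""
    counts = histogram["results"]["month_of_year"]
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(int(month) for month, n in counts.items() if n / total >= min_share)
```

With option 2, the months returned by `flowering_months` would then be used to filter `observed_on` in the offline open data for that taxon.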

I’m interested in collecting this information for about 180,000 taxa, so I don’t want to degrade the iNaturalist service by wasting compute going about this inefficiently. I also don’t want to get myself rate limited or outright blocked.
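For scale: at roughly one request per second, 180,000 single-request taxa would take about 50 hours of continuous fetching, and under a budget of 10,000 requests per day it would take about 18 days. Both figures are my understanding of the recommended API usage limits and should be checked against the current API documentation. A minimal client-side throttle, assuming a one-second minimum spacing (the class name and defaults are hypothetical), could look like this:

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock      # injectable so the throttle can be tested without real delays
        self._sleep = sleep
        self._last = None        # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each HTTP request keeps the request rate at or below one per `min_interval` seconds, regardless of how quickly the responses come back.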

I would greatly appreciate advice from other developers or the iNaturalist team as to the best approach to take here. Thanks!

if you’re going to get flower color, i would assume you need something that recognizes what a flower is.

so i would still start from the AWS data set, download the whole set of photos by relevant taxon, and then as a first step, run your flower detection model to find the flowers in the images.
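as a rough sketch of that first step, assuming the open data has been loaded into Postgres with the table and column names from the open data documentation (photos, observations; photo_id, extension, observation_uuid, taxon_id, quality_grade), and assuming the S3 photo URL pattern below is still current, the list of photo URLs for one taxon could be built like this. the function name and the query are illustrative only, and the flower detection model itself is not shown.

```python
# Assumed bucket URL pattern for the iNaturalist Open Data photos.
OPEN_DATA_BASE = "https://inaturalist-open-data.s3.amazonaws.com/photos"

# Illustrative query for a Postgres driver that accepts named parameters;
# table and column names are assumed from the open data documentation.
PHOTOS_FOR_TAXON_SQL = """
SELECT p.photo_id, p.extension
FROM photos AS p
JOIN observations AS o ON o.observation_uuid = p.observation_uuid
WHERE o.taxon_id = %(taxon_id)s
  AND o.quality_grade = 'research'
"""


def open_data_photo_url(photo_id, extension, size="medium"):
    """Build the download URL for one photo row from the open data."""
    return f"{OPEN_DATA_BASE}/{photo_id}/{size}.{extension}"
```

each (photo_id, extension) row returned by the query maps to one URL, which can be downloaded straight from the bucket without touching the API.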

people don’t add annotations to observations consistently enough for me to rely on them, and seasonality isn’t always reliable either.
