This might seem like a RTFM question, but I’d like to make sure that what I’m doing is ok, before I publicise it.
I’ve made a page on my site that takes a location and tells you what plants have been seen locally in the past couple of weeks. It also queries the six previous years for observations in the coming two weeks, looking for plants that get photographed around this time of year but haven’t yet been recorded on iNaturalist this year, so you can see what might be visible next time you’re out and about.
This requires multiple API calls: one for this year and one for each of the six previous years. I space out the calls, which makes the search slower, but not unreasonably so.
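For what it’s worth, the spacing can be sketched roughly like this. The species_counts endpoint and its lat/lng/radius/d1/d2 parameters come from the public iNaturalist API v1, but the one-second delay, the example date, and the helper names are my own choices, not the page’s actual code:

```javascript
// Public iNaturalist API v1 endpoint for aggregated species counts.
const API = 'https://api.inaturalist.org/v1/observations/species_counts';

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Build the d1/d2 date window covering the "next two weeks" in a given year.
function windowForYear(year, month, day) {
  const start = new Date(Date.UTC(year, month - 1, day));
  const end = new Date(start.getTime() + 14 * 24 * 60 * 60 * 1000);
  const iso = (d) => d.toISOString().slice(0, 10); // YYYY-MM-DD
  return { d1: iso(start), d2: iso(end) };
}

// One call per year, spaced ~1 second apart to stay within the
// recommended request rate.
async function speciesCountsByYear(lat, lng, radiusKm, years) {
  const results = [];
  for (const year of years) {
    const { d1, d2 } = windowForYear(year, 3, 15); // hypothetical start date
    const url = `${API}?lat=${lat}&lng=${lng}&radius=${radiusKm}&d1=${d1}&d2=${d2}`;
    const resp = await fetch(url);
    results.push(await resp.json());
    await sleep(1000); // space out the calls
  }
  return results;
}
```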
I also check whether the type photo for a species has a compatible licence. If it’s all rights reserved or an ND licence, the script looks for the top-rated photo with a compatible licence instead. This takes extra API calls, and these are spaced out too.
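The licence fallback could look something like this minimal sketch. The license_code values (null for all rights reserved, strings like 'cc-by-nd' for the CC variants) match what the API returns on photo records, but the function names and blocked list are my own assumptions:

```javascript
// Licence codes we can't reuse: all rights reserved (null) and the
// no-derivatives variants.
const BLOCKED = [null, 'cc-by-nd', 'cc-by-nc-nd'];

function usable(photo) {
  return !BLOCKED.includes(photo.license_code ?? null);
}

// defaultPhoto: the taxon's default ("type") photo.
// photos: the other photos, sorted best-rated first.
function pickPhoto(defaultPhoto, photos) {
  if (usable(defaultPhoto)) return defaultPhoto;
  return photos.find(usable) ?? null;
}
```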
Is this compliant with the API? I’ve had to put it live at https://www.botany.one/hot-botany/ so anyone interested can see what it’s doing (I couldn’t get a more limited preview link to work).
I think the thing that’s puzzling me is the line “The API is intended to support application development, not data scraping.” I’m pulling in some data to do this, but it has to be live data to have any use.
I’m also working on a similar Welsh-language site. As that works across more taxa, I was planning to query the API once a week and store the results as a weekly cache. This also prompts me to add Welsh names to the site where they’re missing.
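If it helps, a weekly cache in the browser can be as small as this sketch. The key names, the seven-day cut-off, and using localStorage are all my own assumptions, not anything iNaturalist prescribes; the store is passed in so it could be swapped for a server-side store:

```javascript
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// store: anything with getItem/setItem, e.g. window.localStorage.
function readCache(store, key, now = Date.now()) {
  const raw = store.getItem(key);
  if (!raw) return null;
  const { savedAt, data } = JSON.parse(raw);
  return now - savedAt < WEEK_MS ? data : null; // week-old data counts as stale
}

function writeCache(store, key, data, now = Date.now()) {
  store.setItem(key, JSON.stringify({ savedAt: now, data }));
}
```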
Happy to hear comments and advice. The code is JavaScript, so ‘view source’ will reveal it, if anyone wants to copy or adapt it.
I think the point here is that they don’t want folks using the API to download millions of observations or to download the entire taxonomy, since those would take many thousands of requests, or tens of thousands or more, to accomplish.
You’re downloading aggregated observation counts by species, and you’re only getting one page per year, so I don’t think you’re doing anything outside the intended use of the API. I wouldn’t even bother trying to cache this sort of data on my own server.
This looks like a fun tool! As long as your delay between queries is at least 1 second, it sounds like you are complying with the API Recommended Practices. However, if your tool becomes popular so that more than one person is using it at a time (i.e. in parallel), then it will likely exceed the 1 query per second recommendation, and may run into the 10,000 queries per day limit. Since it’s JavaScript and runs in-browser, these limits should apply separately to each user, but it could still wind up being a load on the iNat servers – I would guess minor compared to the direct traffic on iNat.
Thanks for the responses. I’ll look at slowing it down a fraction. To be honest, I’m expecting half the visitors to the page in any month to be me, but there’s the potential for it to become a problem if it does prove popular.
I believe this is generally tracked at the client level, since the clients are making the requests. So if we’re both running the same thing, you get one request per second and I get a separate one request per second. The only way additional users might become problematic is if they’re sharing the same connection somehow, or if alunsalt’s website’s server is making the requests on behalf of the site’s users, which I don’t think is the case.
I generally wouldn’t worry too much about a third-party website driving up additional traffic to iNaturalist via the API. This particular site is probably already doing things about as efficiently as possible, and I can’t imagine that many users using it to drive up traffic significantly, even if it is a useful tool.
This is a fun tool! But I tried it for my hometown, Antequera (Spain), and it seems to miss some observations. For example, I have two observations of Cynoglossum cheirifolium from this March, yet it says the species hasn’t been observed yet this year.
I had hoped the link “view on iNaturalist” would take me to the observations mentioned, but it only goes to the taxon page.
The issue is that the predicted species list is built by subtracting this year’s top 10 species (default is 10 species, can also choose 20 or 30) from the historic species list. So if C. cheirifolium is the 12th top species for this year and you’re using the default of 10 top species, then the page thinks that it’s unseen for this year. If you make the radius small enough and/or expand to 20 or 30 species, you can get C. cheirifolium to appear in this year’s seen list instead of the predicted list.
Probably the page should expand the definition of “seen this year” beyond the top 10/20/30 species.
Being photographed now: the top N species observed at the location.
Targets to look for: species in the top 200 observations that aren’t in the top N, sorted by historical observations for the next X weeks. These should be things that are out now, but in the earliest phases of their display.
Predicted: species in the top 200 in historical data without observations this year.
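In code, the three lists can be derived roughly like this sketch, where each input array is sorted most-observed first. The function and field names are mine, not the live page’s, and I’ve omitted the re-sort of targets by historical counts:

```javascript
// thisYear: this year's species counts (up to 200), most-observed first.
// historic: the historical top-200 species counts.
function buildLists(thisYear, historic, topN) {
  const seen = thisYear.slice(0, topN); // "being photographed now"
  const thisYearIds = new Set(thisYear.map((s) => s.taxonId));
  // Observed this year, but outside the top N: "targets to look for".
  const targets = thisYear.slice(topN);
  // In the historic top 200 but unrecorded this year: "predicted".
  const predicted = historic.filter((s) => !thisYearIds.has(s.taxonId));
  return { seen, targets, predicted };
}
```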
If a species has observations, it will link to a search page with the relevant location and search radius, following Susanne’s suggestion.
I’ve batched searches where possible. There are some taxa, like Scilla forbesii, where all the suggested photos are all rights reserved despite there being CC-licenced images on the site. These have to be pulled separately, but such cases should be rare.
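The batching relies on the /v1/taxa endpoint accepting comma-separated ids, so one request can cover many species. The chunk size of 30, the one-second spacing, and the helper names here are my own choices, offered as a sketch rather than the page’s actual code:

```javascript
// Split an id list into fixed-size batches.
function chunk(ids, size) {
  const out = [];
  for (let i = 0; i < ids.length; i += size) out.push(ids.slice(i, i + size));
  return out;
}

// One request per batch of ids, spaced ~1 second apart.
async function fetchTaxaBatched(ids, size = 30) {
  const results = [];
  for (const batch of chunk(ids, size)) {
    const resp = await fetch(`https://api.inaturalist.org/v1/taxa/${batch.join(',')}`);
    const body = await resp.json();
    results.push(...body.results);
    await new Promise((r) => setTimeout(r, 1000)); // keep to ~1 request/s
  }
  return results;
}
```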
I’ve also fixed the CC0 crediting error. I’d seen it on the Welsh page, but hadn’t fixed it for the English site. I’ve also excluded ND-licenced images, to make sure I’m not breaking that licence.
Thanks to everyone for their comments, it’s now a lot better than it was.