One thing the new Discord API gives us is support for autocomplete, and luckily, the iNat API even comes with endpoints specifically designed with autocomplete in mind. Unluckily, I can’t figure out a reasonable way to do autocomplete for our whole Discord bot user population, stay within iNaturalist API limits, and still be able to modestly scale up over the next few years.
While hitting the /v1/taxa/autocomplete endpoint repeatedly as a single user only generates a handful of requests per lookup, that's not so great when one app handles requests for many users. As a specific example of the single-user scenario, a web app could serve a multi-user population without the cumulative effect of all of their autocomplete requests piling up and going over the limit, because each API request would come from each user's own browser. I imagine that's the sort of use case the designers had in mind. However, Discord bots are a different animal: every request from every user goes out from the bot's host, so they all count against a single rate limit.
The Dronefly Discord bot runs on a single Linux host at home and currently serves a modest-sized population of hundreds of active users, generating up to 1,000 API requests a day. I estimate that on our busiest days, 500-600 of those are taxon name lookups. That's a lot more volume than the single-user scenario! Given these numbers, I think it's reasonable to plan for gradual growth to between 1,000 and 2,000 taxon name lookups a day. If we need to handle that kind of load, and we estimate about 5 autocomplete requests per lookup, that's 5,000 to 10,000 requests for autocomplete alone! Clearly that's not going to scale.
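To make the back-of-napkin math concrete, here's a tiny sketch. The 10,000/day figure is my reading of the iNat daily request budget mentioned below; the lookup counts and requests-per-lookup are just my rough estimates above:

```python
# Rough projection of autocomplete load against the iNat daily request budget.
# All inputs are estimates; the 10,000/day cap is my reading of the API guidelines.
DAILY_REQUEST_LIMIT = 10_000

lookups_per_day = (1_000, 2_000)   # projected taxon name lookups
autocomplete_per_lookup = 5        # rough guess: requests fired while a user types

for lookups in lookups_per_day:
    autocomplete_requests = lookups * autocomplete_per_lookup
    print(
        f"{lookups} lookups/day -> {autocomplete_requests} autocomplete requests "
        f"({autocomplete_requests / DAILY_REQUEST_LIMIT:.0%} of the daily limit)"
    )
```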
I could use the monthly iNaturalist data dump, but since some of our keenest users are actively involved in improving the iNat data, they would likely be bothered by discrepancies caused by autocomplete working off stale data. And staleness isn't even the worst of it: the dump files seem to contain only scientific names, not the common names that many users expect to be able to look up.
To overcome both the staleness problem and the common names problem, I've toyed with the idea of downloading the whole taxonomy using the /v1/taxa endpoint over a couple of days, which works out to about 1,300 requests a day and leaves a generous 8,700 of the 10,000 daily rate limit for other sorts of requests. With all of the names stored locally, our capacity to handle autocomplete could scale up without any additional API requests. It's doable, but it's a lot of overhead for very little gain at the volumes we handle right now, and a lot of extra work for me to set up. It also won't return exactly the same choices as the inaturalist.org web app, since I don't think replicating iNat's whole Elasticsearch setup is practical at this scale. So, at least for now, this is not looking like my best option.
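For what it's worth, here's a rough sketch of what that local-index approach might look like. It assumes the /v1/taxa endpoint's page and per_page parameters behave the way I expect (and that pagination actually reaches that deep; if it's capped, something like keyset pagination by taxon id would be needed instead), and the pacing and field names are placeholders, not a finished design:

```python
import time
import requests

API = "https://api.inaturalist.org/v1/taxa"
PER_PAGE = 200            # assumed per-page maximum
REQUESTS_PER_DAY = 1300   # pace the crawl to stay well under the daily limit
DELAY = 86400 / REQUESTS_PER_DAY

def fetch_taxa_pages(max_pages):
    """Yield taxon records page by page, pacing requests across the day."""
    for page in range(1, max_pages + 1):
        resp = requests.get(
            API,
            params={"page": page, "per_page": PER_PAGE, "is_active": "true"},
            timeout=30,
        )
        resp.raise_for_status()
        results = resp.json().get("results", [])
        if not results:
            break
        yield from results
        time.sleep(DELAY)

def build_name_index(taxa):
    """Map lowercased scientific and preferred common names to taxon ids."""
    index = {}
    for taxon in taxa:
        for name in filter(None, (taxon.get("name"), taxon.get("preferred_common_name"))):
            index.setdefault(name.lower(), taxon["id"])
    return index

def local_autocomplete(index, prefix, limit=10):
    """Naive prefix match over the local index; no API requests needed."""
    prefix = prefix.lower()
    return [name for name in index if name.startswith(prefix)][:limit]
```

Even with something this simple, the match quality would fall well short of iNat's own ranking, which is part of why I'm not keen on it.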
Finally, all of this back-of-napkin figuring is based on guesses that may turn out to be off by a large factor. So before I embark on anything this ambitious, I'd need to get some real numbers out of the present system (e.g. collect some stats in an autocomplete callback, but don't actually provide autocomplete capability). I also thought it would be a good idea to ask the iNat dev community. Without a breakthrough here, I'm seriously considering scrapping this whole plan, or at least scaling it back to smaller subproblems where some sort of autocomplete would still give us a benefit, but without such a huge API cost.
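As a sketch of that measurement idea, something like the following discord.py 2.x-style callback could record what users type without returning any choices. The command name, describe text, and logger name here are hypothetical, and this assumes the library version underlying our bot exposes app_commands autocomplete this way:

```python
import logging

import discord
from discord import app_commands

log = logging.getLogger("dronefly.autocomplete_stats")

@app_commands.command(name="taxon", description="Look up a taxon by name.")
@app_commands.describe(query="Scientific or common name")
async def taxon(interaction: discord.Interaction, query: str):
    # ... the existing lookup against /v1/taxa/autocomplete would go here ...
    await interaction.response.send_message(f"Looking up: {query}")

@taxon.autocomplete("query")
async def taxon_query_stats(
    interaction: discord.Interaction, current: str
) -> list[app_commands.Choice[str]]:
    # Measure only: log how often autocomplete fires and how long the partial
    # queries are, but return no choices so no extra iNat API requests are made.
    log.info("autocomplete fired: user=%s len=%d", interaction.user.id, len(current))
    return []
```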
So, any ideas?