Species Accumulation Curves for iNat data

Here are a couple of practical solutions that I think are the most promising, although there really isn’t a silver bullet to this vexing issue.

The first approach, and perhaps the most helpful, would be to build stacked species distribution models for all the plants in the regional flora. You would need spatial records for all the species that could possibly be in your park and some good spatial environmental layers. The estimated potential species richness of your park will be the sum of the probabilities that each regional species occurs in your park. Besides just richness this will also produce other useful outputs. For instance, it will give you a list of species that would be most likely to occur in your park based on the environment in the park. Then you can compare these to your current species list to see whats missing. It will also produce maps of where new species are most likely to be found in your park and so give you a better sense of where to look for those species. You can use the R SSDM package to build stacked species distribution models here (even has a gui):

https://besjournals.onlinelibrary.wiley.com/share/MCVZMJUX5UNNE9HRTXTP?target=10.1111/2041-210X.12841

https://cran.r-project.org/web/packages/SSDM/index.html

If your park is in California, this approach would be similar to using Calflora to generate a likely species checklist–I think there’s a tool for that in What Grows Here? https://www.calflora.org/entry/wgh.html

However, I think this approach tends to get less accurate at finer spatial resolutions and is sensitive to bias in the occurrence probabilities. My guess is that once you’ve looked at the predictions you may be able to refine these based on expert knowledge of the neighboring areas and come up with a reasonable estimate. I think this is a good approach for you situation because with nearly 800 species there probably aren’t too many species left in the regional flora that have not been found in the park (although I’m not sure).

The other approach is to extrapolate from the data collected in the park. The approach I would take would be to define a sampling unit from the iNat data. For instance, lump all observations taken during one calendar day by one observer (or observation party) as a “sample”. Then for each species count the number samples in which it is present. You should have a few species present in a lot of samples and a larger number of species present in only one or two samples. The iNEXT package in R has some functions that would allow you to estimate how sensitive the richness estimate is to additional sampling effort. The problem here is that the sampling behavior of iNaturalist observers differs from the sampling behavior assumed by the models in iNext. For instance, iNat observers seek out rare species in samples and probably skip common species. I still think you might get a reasonable estimate from this however. Here are a couple of papers that look at this problem using herbarium records, which present similar issues (maybe there’s something useful there):

https://onlinelibrary.wiley.com/share/ZHQTBHHSYHITEZZNYMGV?target=10.1111/j.1654-1103.2010.01247.x

https://bsapubs.onlinelibrary.wiley.com/share/YMDXUWKTUG5C5D8JMVFA?target=10.3732/ajb.1000215

Hopefully that’s helpful, if you and @graysquirrel want to explore this more let me know. Given your extensive sampling in the park it might be fun to make a case study out of this.

Let me know what you think

5 Likes