Species Accumulation Curves for iNat data

I just wanted to point out that it is also a problem within parks, if those parks comprise multiple communities. That’s at least as much of a problem as variation in density of species, total sampling area, or sampling effort, if not worse in terms of biasing the results.

And within continents
https://theconversation.com/itll-take-150-years-to-map-africas-biodiversity-at-the-current-rate-we-cant-protect-what-we-dont-know-195219

But we found in a recent study that huge swathes of Africa remain unstudied and their species undocumented. Why? Because scientists keep returning to areas whose biodiversity has already been mapped, rather than visiting new, unexplored areas.

There’s also a need for scientists to engage with each other beyond borders. Biological sampling in Africa has, to a large extent, been carried out by European and North American institutions. Researchers from institutions in those regions need to collaborate with local universities, rather than just using locals as field assistants.

I believe iNat is making a difference there.

A quick look at the maps in that article suggests large parts of Africa have the same problem we have here in Canada. Most biodiversity research/observations occur within 2 km of a road. There are huge parts of Canada, especially in the North, that have very few or no roads.

As nice as Diana’s idea sounds, Kevin’s observation seems to suggest otherwise.

We do our best to get off the roads, though. But in many places just following the road would be enough.

I mostly agree with you. As I said in the original post, species accumulation curve methods are based on all sorts of assumptions that iNat data violate.

My question is, in the park where I work, and run the iNat project,

Saying that we would end up with more than we have now is certainly true, and we can’t generate a terribly precise estimate, but I do think we can do better than just “more.”

But Kevin and I are agreeing, observers need to get out of their comfort zone and explore new places.

Or help to ID for the observers who HAVE gone to new places.
iNat is adding new species every month. Maybe not so much new to science, but definitely iNat firsts.

And new SPACES. I was very surprised in just my first 4 months after getting more into macro and tipping into extreme macro territory, to suddenly start adding so many regional firsts to my list. As you know, rarely observed does not equal rarely present.

As macro photo tools become more accessible (particularly with improvements in smartphone cameras) I can predict a corresponding disturbance in the force of the species-totals observation curves.

(Try to read that last sentence using your interior James Earl Jones voice.)

Here are a couple of practical solutions that I think are the most promising, although there really isn’t a silver bullet to this vexing issue.

The first approach, and perhaps the most helpful, would be to build stacked species distribution models for all the plants in the regional flora. You would need spatial records for all the species that could possibly be in your park and some good spatial environmental layers. The estimated potential species richness of your park is then the sum of the probabilities that each regional species occurs in your park. Besides richness, this also produces other useful outputs. For instance, it gives you a list of the species most likely to occur in your park based on the park’s environment, which you can compare with your current species list to see what’s missing. It also produces maps of where new species are most likely to be found in your park, giving you a better sense of where to look for them. You can use the R SSDM package to build stacked species distribution models (it even has a GUI):

https://besjournals.onlinelibrary.wiley.com/share/MCVZMJUX5UNNE9HRTXTP?target=10.1111/2041-210X.12841

https://cran.r-project.org/web/packages/SSDM/index.html
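
To make the "sum of probabilities" arithmetic concrete, here is a minimal Python sketch. The species names and probabilities are invented for illustration; in practice they would come from fitted distribution models, and this sketch does not call SSDM or reproduce how it works.

```python
# Hypothetical per-species occurrence probabilities for the park,
# e.g. taken from fitted species distribution models (values invented).
occurrence_prob = {
    "species_a": 0.95,
    "species_b": 0.60,
    "species_c": 0.30,
    "species_d": 0.05,
}

# Species already documented in the park (also invented).
observed = {"species_a", "species_c"}


def expected_richness(probs):
    """Expected richness = sum of per-species occurrence probabilities."""
    return sum(probs.values())


def likely_missing(probs, seen):
    """Unobserved species, most probable first."""
    missing = [(name, p) for name, p in probs.items() if name not in seen]
    return sorted(missing, key=lambda item: item[1], reverse=True)


richness = expected_richness(occurrence_prob)
candidates = likely_missing(occurrence_prob, observed)
print(f"Expected richness: {richness:.2f}")
for name, p in candidates:
    print(f"{name}: {p:.2f}")
```

The ranked list of unobserved species is the part that tells you what to go looking for; the total is only as good as the probabilities fed into it.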

If your park is in California, this approach would be similar to using Calflora to generate a likely species checklist–I think there’s a tool for that in What Grows Here? https://www.calflora.org/entry/wgh.html

However, I think this approach tends to get less accurate at finer spatial resolutions and is sensitive to bias in the occurrence probabilities. My guess is that once you’ve looked at the predictions you may be able to refine them based on expert knowledge of the neighboring areas and come up with a reasonable estimate. I think this is a good approach for your situation because with nearly 800 species there probably aren’t too many species left in the regional flora that have not been found in the park (although I’m not sure).

The other approach is to extrapolate from the data collected in the park. I would start by defining a sampling unit from the iNat data. For instance, lump all observations taken during one calendar day by one observer (or observation party) as a “sample”. Then, for each species, count the number of samples in which it is present. You should have a few species present in a lot of samples and a larger number of species present in only one or two samples. The iNEXT package in R has some functions that would allow you to estimate how sensitive the richness estimate is to additional sampling effort. The problem here is that the sampling behavior of iNaturalist observers differs from the sampling behavior assumed by the models in iNEXT. For instance, iNat observers seek out rare species and probably skip common ones. I still think you might get a reasonable estimate from this, however. Here are a couple of papers that look at this problem using herbarium records, which present similar issues (maybe there’s something useful there):

https://onlinelibrary.wiley.com/share/ZHQTBHHSYHITEZZNYMGV?target=10.1111/j.1654-1103.2010.01247.x

https://bsapubs.onlinelibrary.wiley.com/share/YMDXUWKTUG5C5D8JMVFA?target=10.3732/ajb.1000215
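
For anyone who wants to see the observer-day idea in code, here is a rough Python sketch. The records are invented, and the estimator is the bias-corrected Chao2 formula, a standard nonparametric estimator for incidence data that I am swapping in for illustration. It is not a reimplementation of iNEXT, and it inherits the same sampling assumptions that iNat data violate.

```python
from collections import Counter

# Hypothetical observation records: (observer, date, species). Values invented.
records = [
    ("ann", "2023-04-01", "oak"),
    ("ann", "2023-04-01", "oak"),  # duplicate within one sample, counted once
    ("ann", "2023-04-01", "poppy"),
    ("ann", "2023-04-02", "oak"),
    ("bob", "2023-04-01", "oak"),
    ("bob", "2023-04-01", "lupine"),
    ("bob", "2023-04-03", "poppy"),
    ("bob", "2023-04-03", "sedge"),
]


def incidence_counts(rows):
    """Group rows into observer-day samples; count samples per species."""
    samples = {}
    for observer, date, species in rows:
        samples.setdefault((observer, date), set()).add(species)
    counts = Counter()
    for species_set in samples.values():
        counts.update(species_set)
    return len(samples), counts


def chao2(n_samples, counts):
    """Bias-corrected Chao2 estimate of total richness (a lower bound)."""
    s_obs = len(counts)
    q1 = sum(1 for c in counts.values() if c == 1)  # species in exactly 1 sample
    q2 = sum(1 for c in counts.values() if c == 2)  # species in exactly 2 samples
    scale = (n_samples - 1) / n_samples
    return s_obs + scale * q1 * (q1 - 1) / (2 * (q2 + 1))


n_samples, counts = incidence_counts(records)
estimate = chao2(n_samples, counts)
print(n_samples, dict(counts), round(estimate, 3))
```

The estimate is driven almost entirely by the species seen in only one or two samples, which is exactly where observer behavior (chasing rarities, skipping common plants) will distort things.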

Hopefully that’s helpful. If you and @graysquirrel want to explore this more, let me know. Given your extensive sampling in the park, it might be fun to make a case study out of this.

Let me know what you think.

Using statistics is wonderful and feels ever-so-scientific, but I’m wondering if this particular problem is more easily solved by using what I’d call best professional judgement.

I’m assuming the park is in California or, at least, some other well-studied large region in North America. Currently, you have 800 plant species on your park’s list (just vascular plants?). Calflora says it has 8,000 plants (I assume they mean species). So, the question becomes which of the 7,200 species that aren’t currently on your list are fairly likely to show up in your park.

Then you assemble a team of botanical experts familiar with your larger region (or just you and graysquirrel). You can use the iNat-based Easily Missed tool and Calflora’s What Grows Here tool to generate lists of what grows nearby/in similar habitats. Then it’s brute force - maybe a day or two of pleasurable perusal - to eliminate species out of the 7,200 that aren’t likely to grow in the park. No ocean in/next to your park? Thus, no marine plants. No true desert/high alpine/lakes/ponds/rivers in your park? No desert/alpine/submerged aquatic plants. You have a rather nice big bog, you say? Pile those bog plants into the more likely category. Some of the 7,200 are waifs or adventive? Not terribly likely to be in your park.
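
If the regional checklist is available as a table, the first elimination pass described above could be roughed out in a few lines of Python before the experts take over. Everything here is hypothetical: the species, the habitat tags, and the idea that each species comes with a tidy set of habitats are assumptions for illustration, not something Calflora or iNat provides in this form.

```python
# Hypothetical regional checklist: species -> habitats it can grow in (invented).
regional_flora = {
    "eelgrass": {"marine"},
    "sundew": {"bog"},
    "creosote": {"desert"},
    "valley_oak": {"woodland", "grassland"},
    "pondweed": {"lake"},
}

# Habitats actually present in the park, and species already on the list.
park_habitats = {"bog", "woodland", "grassland"}
already_listed = {"valley_oak"}


def shortlist(flora, habitats, listed):
    """Keep unlisted species with at least one habitat present in the park."""
    return sorted(
        name
        for name, needs in flora.items()
        if name not in listed and needs & habitats
    )


candidates = shortlist(regional_flora, park_habitats, already_listed)
print(candidates)
```

A filter like this only removes the obvious non-starters; judging waifs, adventives, and borderline habitats is still the human part.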

This brute force, er, best professional judgment method has the advantage of acquainting you and graysquirrel (and any others on your botanical team) with the most likely suspects still to be documented and thus giving you clues about where and when to search, not to mention what kinds of experts to invite to help out.

It would be interesting to compare this brute force method with methods based on sampling and statistics, both in terms of how much effort it takes to generate an estimate and how accurate each method proves after five or ten years of field effort.

Yes. Great point! This is really the only way to test ideas about unobserved species–we need to make predictions on how many species will be found X years into the future and then wait until then and evaluate our predictions. Since we can never really know how many unobserved species there are in a region, this is really the best way to test our scientific models or expert opinion.
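
A minimal sketch of what that evaluation might look like in Python, assuming each method's prediction is written down now and compared with the documented count later. The method names and all numbers are invented placeholders, not results from any park.

```python
# Hypothetical richness predictions recorded today for a future survey year
# (method names and numbers are invented for illustration).
predictions = {"expert_judgement": 860, "stacked_sdm": 910, "extrapolation": 835}

# Hypothetical species count actually documented when that year arrives.
observed_later = 870


def rank_by_error(preds, actual):
    """Return (method, absolute error) pairs, smallest error first."""
    errors = [(method, abs(value - actual)) for method, value in preds.items()]
    return sorted(errors, key=lambda item: item[1])


ranking = rank_by_error(predictions, observed_later)
for method, error in ranking:
    print(f"{method}: off by {error} species")
```

Note that the observed count at year X is itself a floor on the true total, so this scores how well each method forecasts discovery, not the true richness.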

Just a reminder to keep the conversation focused on OP’s original question which is about species accumulation curves in a specific area. We also have a conversation that has diverged a bit to discussing the merits/downsides of users returning to the same areas and/or observing on roadsides. It’s a valid topic, but would be better served with a separate thread if folks want to continue that line of discussion - if so, let us know and we’ll split it out.

Seems to me that these tools deflate the idea that we are making real contributions to these well-sampled areas. “Yep, what I saw is already known to be here.” And as to the question of species accumulation curves – should we really expect it to be very different from the relevant lists produced with those tools?

Good point, good question. Maybe it comes down to why does someone want to know how many and which species could reasonably be expected to be in a particular area?
