Easiest way to rank areas by biodiversity using iNaturalist data

Dear iNaturalists,

What is the most user-friendly way to examine an index/marker/metric of biodiversity for an area, and then perhaps to produce a graph such as a heatmap to compare the biodiversity rankings of different geographical areas?

Is there any such tool that is easily accessible and that directly uses iNaturalist data? For example, has someone developed an easily accessible tool that uses the iNaturalist APIs?

Or is there perhaps a native iNaturalist page dedicated to this subject?

If nothing easily accessible exists, what is one of the most comprehensive ways to achieve the same, for example using the API and R?
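For the "API" route, a minimal sketch (in Python rather than R) is below. It assumes the iNaturalist API v1 endpoint `observations/species_counts`, which returns a `results` list where each entry has a `count` (number of observations) and a `taxon` object with a `name`; the page size of 500 and the research-grade filter are assumptions to adjust. Only the parsing step is exercised offline here.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.inaturalist.org/v1/observations/species_counts"

def fetch_species_counts(place_id, quality_grade="research", per_page=500, page=1):
    """Fetch one page of per-species observation counts for a place (makes a network call)."""
    params = urllib.parse.urlencode({
        "place_id": place_id,
        "quality_grade": quality_grade,
        "per_page": per_page,  # assumed page size; larger places need several pages
        "page": page,
    })
    with urllib.request.urlopen(f"{API_URL}?{params}") as resp:
        return json.load(resp)

def counts_by_taxon(response):
    """Map taxon name -> number of observations (not individuals) from one response page."""
    return {r["taxon"]["name"]: r["count"] for r in response.get("results", [])}
```

`len(counts_by_taxon(page))` gives the observed species richness for that page; `total_results` in the response tells you whether more pages need to be fetched and merged.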

Note: regardless of each person’s specific purpose(s) on iNaturalist / with iNaturalist data, I think we should all be concerned about biodiversity, and it would be great if iNaturalist could serve there.

Kind regards,
Vincent Verheyen

2 Likes

Hello! This is a very interesting question, and definitely something that is possible and has probably been done before.

EDIT: Sorry, I probably should have looked at your profile before writing this! I apologize if any of it sounds condescending or something you are already aware of. If you’re interested in brainstorming some more specific solutions, please LMK.

However, measuring or estimating biodiversity is a very complicated issue.

For starters, there are actually multiple kinds and components of biodiversity. Generally the easiest to understand is species richness vs species evenness. Just because two different areas have the same number of species does not mean they have equal biodiversity. The one with a more balanced or even distribution of species, instead of one dominating, is more biodiverse. A single “biodiversity value” can be produced using various indexes, notably Shannon’s index.
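To make the richness vs evenness point concrete, here is a small sketch with two made-up areas that have the same number of species and individuals but different evenness. It uses the standard Shannon index H' = -Σ p_i ln p_i and Pielou's evenness J = H' / ln S (S = number of species); the abundance numbers are invented for illustration.

```python
import math

def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i), where p_i is species i's share of all individuals."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)

def pielou_evenness(counts):
    """Pielou's J = H' / ln(S): 1.0 is perfectly even, lower values mean some species dominate."""
    s = sum(1 for n in counts if n > 0)
    return shannon(counts) / math.log(s) if s > 1 else 0.0

# Two hypothetical areas, each with 4 species and 100 individuals.
even_area = [25, 25, 25, 25]
dominated_area = [85, 5, 5, 5]

for name, counts in [("even", even_area), ("dominated", dominated_area)]:
    print(name, round(shannon(counts), 3), round(pielou_evenness(counts), 3))
```

Both areas have a richness of 4, but the even area gets the maximum possible H' for four species (ln 4) and J = 1, while the dominated area scores clearly lower on both.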

Furthermore, there are even multiple different kinds of species richness metrics, referred to as alpha, beta, and gamma diversity. These are most relevant in the context of surveys occurring on different scales. Alpha is the number of species within a particular ecosystem or area. Beta is the number of unique species compared between two or more plots or areas. Gamma is the number of species in total across multiple plots (basically just alpha with a larger area, I think; someone correct me if I am wrong). Beta is most interesting because it can tell you which areas are the most “significant” to preserving biodiversity, i.e. those containing species likely not found elsewhere.
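A minimal sketch of the three scales, using species lists for three hypothetical plots (the plot names and species letters are invented). For beta it computes Whittaker's multiplicative form (gamma divided by mean alpha), which is one common formulation among several, and separately lists the species found in only one plot, which matches the "species likely not found elsewhere" reading above.

```python
def diversity_components(plots):
    """Return alpha per plot, gamma, Whittaker's beta, and species unique to each plot."""
    # Alpha: species richness within each plot.
    alphas = {name: len(species) for name, species in plots.items()}
    # Gamma: richness of all plots pooled together.
    gamma = len(set().union(*plots.values()))
    # Whittaker's beta: how many times richer the whole region is than an average plot.
    beta = gamma / (sum(alphas.values()) / len(alphas))
    # Species recorded in this plot and in no other plot.
    unique = {
        name: species - set().union(*(s for other, s in plots.items() if other != name))
        for name, species in plots.items()
    }
    return alphas, gamma, beta, unique

plots = {
    "meadow": {"A", "B", "C"},
    "wood": {"B", "C", "D", "E"},
    "pond": {"C", "F"},
}
alphas, gamma, beta, unique = diversity_components(plots)
print(alphas, gamma, beta, unique)
```

Here the pooled region has twice the richness of an average plot, and the wood contributes the most species found nowhere else.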

Probably most significantly, this issue is complicated by the very large bias of iNaturalist’s database. The organisms that get observed the most are those which are interesting and easy to identify. Furthermore, the frequency of observations is highly concentrated in populated areas, in parks, and along trails. Many large areas have very few observations, even in populated countries like the US. There are ways to account for these, but they require advanced statistics and knowledge of the originating bias.

Generally speaking, however, average gamma biodiversity is relatively predictable: it increases closer to the equator, at lower elevations, and with higher temperature, precipitation/available water (think wetlands), or sunlight. However, this does NOT show you the LOCAL areas which are the most unique (highest beta diversity), and which arguably represent the largest loss if they are destroyed. This depends on many more factors, including microclimate, land use history, structural diversity, and more.

EDIT 2: I also forgot to think about genetic diversity and ecosystem diversity, but those are much, much harder to extract from iNaturalist data and probably not what you were referring to anyway, although they are also very significant.

I study Conservation Biology as an undergrad. The topic of biodiversity conservation is soooo complex and nuanced, and there is a TON of potential for creativity and new solutions. It’s not so simple as “pick the ‘most biodiverse area’ and make it a park.” Without active engagement, anyway, biodiversity is rarely “maximized,” and may even decline over time. Intentional management, even if it seems detrimental, can be very beneficial to biodiversity (such as forest thinning, clearing patches, or conducting prescribed burns). The most surprising areas, even in cities, also have enormous potential for conserving biodiversity. iNaturalist, indeed, can highlight this oftentimes!

7 Likes

Hi @son_of_wasps,

In fact I am not very well read on the topic at all, so I very much appreciate your great overview, which is very helpful.

With that initial overview of some possible definitions, the question remains: where are the most easily available tools for such analyses, based on iNaturalist data (or not), using whichever of these metrics?

It seems to me that making such data analyses accessible and easy to use, integrated into iNaturalist, would be a great enhancement for science popularisation and awareness of biodiversity.

Kind regards,
Vincent Verheyen

1 Like

It is also going to be biased by the particular set of observers in an area and their interests and activity level. At a local or state level (presumably the level at which one will usually want to be measuring biodiversity), this is often going to have a fairly significant impact on the amount and distribution of observations.

E.g., there’s an umbrella project for wild organisms found in a number of botanical gardens in Europe. A comparison of the number of species in each garden would suggest that the top 6 or so gardens are far more biodiverse than the ones lower on the list – when in fact, these gardens have simply been much more intensively observed, often including observers who have specialized training (say, in some arthropod group or another) and who have been collecting specimens for more precise determination rather than just relying on field photos. Are some of these gardens more diverse than others? Almost certainly. Size, age of garden, type of botanical collection, geographic location, and location relative to other urban green spaces will also play a role. But the differences are almost certainly not as vast as the species lists for the different gardens suggest.

Before one could even begin to try to estimate and compare biodiversity for these gardens (whatever definition of biodiversity one is using), one would need to figure out what biases exist and how they could be controlled for in one’s analysis. This is likely going to be different in every location and for every research question.
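One common technique for the specific bias of unequal sampling effort (not the other biases mentioned) is rarefaction: estimate how many species each place would show if all had the same number of records. The sketch below uses the classic rarefaction formula, E[S_n] = Σ_i [1 - C(N - N_i, n) / C(N, n)]. The record counts are invented, and treating each iNaturalist observation as an independent random draw is a strong assumption given how observations are actually made.

```python
from math import comb

def rarefied_richness(counts, n):
    """Expected number of species in a random subsample of n records, drawn without replacement."""
    total = sum(counts)
    if not 0 < n <= total:
        raise ValueError("n must be between 1 and the total number of records")
    # A species is missed only if all n draws come from the other (total - c) records.
    return sum(1 - comb(total - c, n) / comb(total, n) for c in counts if c > 0)

# Hypothetical per-species record counts for two gardens with very different effort.
heavily_observed = [50, 30, 10, 5, 3, 1, 1]  # 7 species from 100 records
lightly_observed = [4, 3, 2, 1]              # 4 species from 10 records

# Compare both at the effort level of the less-observed garden.
print(rarefied_richness(heavily_observed, 10), rarefied_richness(lightly_observed, 10))
```

With these made-up numbers, the heavily observed garden's raw lead (7 vs 4 species) disappears once both are rarefied to 10 records. Rarefaction only equalises the number of records; it cannot correct for who the observers are or which taxa they choose to photograph.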

So I find it difficult to imagine what tools iNaturalist could provide to “make such data analyses accessible and easy to use” when there is not going to be a single, standard process for examining any set of data. iNat already has a variety of tools (e.g., the Explore page with all its various search filters, projects) that allow for easily accessing and sorting the raw data, but it seems to me that analysis is always going to be up to the researcher who wants to use that data.

4 Likes

I think providing such a tool nevertheless, with some explanatory notes, would be a great incentive for locals to try to observe and identify more species in their local garden, including pushing people towards observations that can be identified to species level.

It could also push authorities to keep their spaces more biodiverse. Part of the goal of iNaturalist is making science and biology accessible to all, rather than keeping it in peer-reviewed articles.

I think providing such easily accessible metrics (preferably multiple) would be a great addition. There is not necessarily a need to provide only a single biodiversity metric.

There could be some sliders for the user to increase or decrease the influence of certain parameters.

1 Like

And what would such a tool look like? What exactly do you think it would measure and how would it work? How would you determine what relevant parameters are and whether they are even recorded in iNat data?

@son_of_wasps mentioned the Shannon’s index as one notable metric. As for the different options (metrics and/or parameters) and explanatory notes of limitations to incorporate into the User Interface, I am not well read, so I currently leave it up for discussion here and to the experts in the field to decide. However, it feels natural to me that ‘time’ would be an interesting slider-parameter.

As a software specialist, I have a strong belief that it would be possible to create such a tool, even though (gauging from the current reactions) it does not seem to exist at the moment. And as a human, with declines in biodiversity affecting, for example our agricultural produce in so many parts of the world, it seems highly important to me. As a lover of iNaturalist, it would be great if it could be incorporated on the website, but if there are any external tools already available, that would be of interest as well.

One more obvious and big bias (at least for my area) would be unidentified taxa. If one area lacks experts, or has complex taxa groups that are not IDed beyond higher levels, that in itself will skew the results.

Maybe it can be controlled for if national checklists are carefully integrated into such a measure to balance those scores somehow. It's hard to control for the distribution of those lists, as they too can have regional niches. And there is observer bias, as mentioned: not only the number of observers in an area, but also the taxa those observers are in turn biased towards observing.
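One simple way a checklist could be brought in is a per-group completeness score: the share of checklist species that have at least one observation in the area. The sketch below is only an illustration of that idea; the group names, species placeholders, and the assumption that a regional checklist is available as a set of names are all hypothetical.

```python
def checklist_completeness(observed, checklist):
    """Fraction of checklist species with at least one observation in the area."""
    if not checklist:
        raise ValueError("checklist must not be empty")
    return len(observed & checklist) / len(checklist)

def completeness_by_group(observed, checklists):
    """Completeness per taxonomic group; low values flag groups that are likely under-recorded."""
    return {group: checklist_completeness(observed, species) for group, species in checklists.items()}

# Hypothetical observed species and regional checklists (placeholder names).
observed = {"bird A", "bird B", "mite A"}
checklists = {
    "birds": {"bird A", "bird B"},
    "mites": {"mite A", "mite B", "mite C", "mite D"},
}
print(completeness_by_group(observed, checklists))
```

A low score for a group says "under-recorded or under-identified here" rather than "absent here", which is the kind of signal that could be used to temper a raw species count; as noted above, checklists have their own regional gaps.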

The better approach is to integrate geographical terrain data and weigh that heavily in scores - as some environments by default should have more diversity than others even if not documented/observed.

1 Like

You are the one suggesting that it should be possible to quantify biodiversity using iNat observations, so it seems to me that it behooves you to read up and consider the how of what you are proposing. E.g., if you have never heard of Shannon’s Index until today, why do you believe that it would be feasible to apply this equation to iNat’s data?

The main take-away messages of the responses you have gotten so far should be:

  1. It is difficult to meaningfully measure biodiversity even if your data is collected systematically using consistent protocols, because there are different ways of defining biodiversity.

  2. iNat data is not collected systematically or with any sort of consistency – that is, it is prone to all sorts of biases. Any algorithm for calculating biodiversity will not provide accurate results if you do not first identify what these biases are and how to correct for them. These biases may not be something that can be easily determined from the dataset alone, because it may require knowledge about things like where streets and public transit are located, or what sort of training observers have. iNat does not and cannot record this information.

1 Like

@spiphany Shannon’s Index within an area seems pretty straightforward (I understand it is usually calculated for a community, but I don’t see why it couldn’t be calculated for an area). It only takes a few variables, and all of the required information is available on iNaturalist (within the bounds of the data available on iNaturalist, as mentioned, of course): the number of species in an area, the number of individuals of species x in the area, and the total number of individuals in the area, from which we can, by division, calculate the proportion of species x in the area.
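Written out with the variables listed above (S species in the area, n_i individuals of species i, N individuals in total), the standard form of Shannon's index is shown below. If n_i is taken from iNaturalist, it will in practice be a count of observations rather than of individuals.

```latex
p_i = \frac{n_i}{N}, \qquad N = \sum_{i=1}^{S} n_i, \qquad
H' = -\sum_{i=1}^{S} p_i \ln p_i
```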

In terms of computing power, which we have not discussed so far, it would require some estimation of whether or not iNaturalist has the resources for these calculations. But this should not necessarily be a problem: calculations can be done on a much less frequent, periodic basis and do not need to be recalculated in real time if that is too expensive for the non-profit. I also think providing the tool could yield additional value and thus donations/funding, which could also be estimated when analysing feasibility.

@spiphany I am flattered that you suggest I should be the one to propose exactly which metrics and parameters could be interesting in terms of biodiversity. I will do reading for sure, because this interests me greatly, but I believe we are stronger if we work together. My post here is categorised as a question for this very reason. I think we have a lot of experts in the field here, who could equally make some proposals.

There might be some experts in the field who will reply that such-and-such a tool already exists which uses one of the iNaturalist APIs, which was also one of my questions.

Let’s take the simple formula of Shannon’s Index. Do you see any variable which we are lacking? Or do you agree that we have all necessary variables in order to calculate the index with iNaturalist data (of course … with the limitations of the bias of the data-points)? If so, we already have one possible metric (from my side, I am not saying it needs to be Shannon’s Index that we incorporate, just mentioning it as an example as it was mentioned as notable by others above).

2. iNat data is not collected systematically or with any sort of consistency – that is, it is prone to all sorts of biases. Any algorithm for calculating biodiversity will not provide accurate results if you do not first identify what these biases are and how to correct for them. These biases may not be something that can be easily determined from the dataset alone, because it may require knowledge about things like where streets and public transit are located, or what sort of training observers have. iNat does not and cannot record this information.

Regarding your point 2. (and equally the comment above by @einsum) → While these are all interesting remarks, it is not a problem, in my view, that results will be biased due to the various factors influencing what types of observations/identifications exist on the platform, where, and at which level.

Even if the tool / metric(s) will be somewhat naive, and with obvious limitations because of the data-set, it’s better to have some visibility rather than none at all.

1. It is difficult to meaningfully measure biodiversity even if your data is collected systematically using consistent protocols, because there are different ways of defining biodiversity.

Regarding your point 1. → As outlined, there are different metrics to describe biodiversity. I would say it makes it much more interesting and it suggests including a dynamic feature, i.e. an option to switch between metrics within the tool.

Regarding all of the above, I am reminded of other functionalities of iNaturalist. When it comes to taxonomy, for example, iNaturalist by default counts the taxa represented by observations using the ‘leaf count’. For the API, this has implications, as it disregards subspecies/infraspecies (I have had some discussions on this with @elias105 and @pisum). So even though iNaturalist in general (taxonomy is a core business, I would say) makes some choices in the back-end to default to a certain metric, and currently limits its API in this way, on the analysis/reporting side of things iNaturalist still lets the user choose among many different ways of counting. This is also explained in the last sentence of the help page How does iNaturalist count taxa? (modified on Thu, 1 Aug, 2024):
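The idea behind a leaf count can be sketched as follows: an observed taxon is counted only if none of the other observed taxa descend from it, so a genus-level observation adds nothing when a species in that genus has also been observed. This is a simplified illustration, not iNaturalist's code; the `observed` structure (taxon name mapped to a list of ancestor names) is a hypothetical stand-in for the real taxon records, which use numeric ancestor ids.

```python
def leaf_taxa(observed):
    """Return observed taxa that are not an ancestor of any other observed taxon.

    `observed` maps a taxon name to the names of its ancestors (hypothetical simplified structure).
    """
    ancestors_of_observed = {a for ancestry in observed.values() for a in ancestry}
    return {taxon for taxon in observed if taxon not in ancestors_of_observed}

observed = {
    "Fagaceae": [],
    "Quercus": ["Fagaceae"],
    "Quercus robur": ["Fagaceae", "Quercus"],
    "Parus major": ["Paridae", "Parus"],
    "Parus major major": ["Paridae", "Parus", "Parus major"],
}
print(sorted(leaf_taxa(observed)))
```

Five observed taxa collapse to two leaves here: the family and genus records are absorbed by Quercus robur, and the species record is absorbed by its observed subspecies.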

“If you’d prefer the species count rather than the leaf count on the Species tab in Explore, use the Rank control in the Filters menu and set High to Species”.

Equally, the user currently has the option there to choose to switch the metric to include for example subspecies, if desired, in their reporting/analysis.

We are here not proposing to change the entire way iNaturalist works in the back-end. It is simply a matter of reporting/analysis. And so I do not see any reason why there could not be multiple metrics (options) implemented, for the user to choose from. Analogous to what was done for the filter options with the example regarding taxonomy:

  • There are different ways to represent taxonomy
  • → It makes it more difficult
  • → iNaturalist proposes a default setting, and at the same time allows the users to filter/change to other metrics via options

Does that mean there are only 540259 species in The World, as the page https://www.inaturalist.org/observations?lrank=species&place_id=any&view=species suggests? No, it does not of course. The data, as mentioned, is skewed/limited/biased. It only shows us what is available based on the data-set of iNaturalist.

Another example, regarding limitations/skewing/bias: I made an observation of a grain mite. iNaturalist shows me that there are only 32 total observations of this species in The World. Does that mean that nobody else observed/identified grain mites beyond these 32 observations? No, of course not. Does that mean that this is a very rare species in the real world? No, of course not. Obviously, the data will be skewed. But that does not make it less interesting to start capturing and showing metrics within the ever-changing and growing data-set. Otherwise, one could make the case against iNaturalist as a platform as a whole, and I wouldn't be surprised if some academics did so in the early days (or maybe still do!).

Of course there can be political reasons why people are against providing easily accessible biodiversity metrics. But intrinsically, I would find it a very interesting exploratory feature, and a very important topic for science popularisation.

1 Like

iNaturalist doesn’t store abundance data, so you can’t say how many individuals of species x there are, never mind the number of individuals in the whole community being studied. There would be problems deciding which observations are within the study area, given the wide range of ‘accuracy’ circles observers put around their observations. Also it sounds like another step towards gamification: people will inevitably want to increase the index value for their favourite area so it would create an incentive for adding species records beyond the taxonomic expertise available. So I say proceed only with extreme caution.

2 Likes

I used to work at a California State Park, and trained park volunteers and visitors to use iNat, and also invited lots of naturalists to visit the park. We went out of our way to document as many species as we could in the park, now over 2200 species. iNat data now makes that park appear to be much more biodiverse than the areas around it, but that is likely entirely because of differences in sampling.

I am extremely skeptical that iNat data can be used to meaningfully compare the actual biodiversity of different areas with different sampling.

5 Likes

I think it would be a mistake to use iNaturalist data to “rank areas”, unless there is very careful consideration of what the purpose is and how to avoid misuse of such a ranking.

I am a well-published conservation biologist and have read hundreds of academic papers where the authors calculate diversity metrics, usually with very little appreciation of what they are doing, and frequently with dangerously misleading results. From a biodiversity conservation perspective, which these authors typically claim to adopt, it is rarely of any relevance at all to compare species richness, even if adjusted for sampling effort. It is rarely of any relevance at all to calculate Shannon or other diversity indices. Yet they do it anyway.

What is generally more relevant is to understand the contribution that some area or land-use type makes to supporting the totality of global biodiversity. For example, if a small area supports the only population of a micro-endemic species, it is irreplaceably important, even if it has far fewer species or lower diversity than other areas. If a land-use type supports good populations of biome-restricted or endangered species, it is probably more important to protect it than one which supports only a set of more widespread or non-native species. There is also an enormous difference between a thriving population of a species, and the occasional vagrant occurrence, but data such as iNaturalist data are not really suitable for distinguishing these situations.

Diversity metrics don’t capture those sorts of nuances. You need to look at the data with some understanding of what you are looking at and why. iNaturalist data are fantastic, but this is not an application they are much use for.

There is already a plethora of maps which attempt to identify areas of greatest biodiversity value for different purposes and at different scales, such as the Biodiversity Hotspots and Key Biodiversity Areas. They each have their strengths and weaknesses, and as time goes on, iNaturalist will increasingly be one of the datasets used to inform these prioritisations, but not iNaturalist data in isolation.

10 Likes

This point is really important. Possibly more important than perfecting the accuracy of the tool. But this assumes we intend to apply science rather than simply conduct it.

2 Likes

@jhbratton Some thoughts with regards to the points you have raised. I quote:

iNaturalist doesn’t store abundance data, so you can’t say how many individuals of species x there are, never mind the number of individuals in the whole community being studied.

Please see a counter-example in Observations iNaturalist > Species: Aedes aegypti - Location: Brazil - Filter: Show: Research Grade, where the number of observed individuals of species x (Aedes aegypti) in area y (Brazil) is shown (restricted to research grade, in this example). There are 394 such observations.

I quote:

There would be problems deciding which observations are within the study area, given the wide range of ‘accuracy’ circles observers put around their observations.

See above. To determine the number of observed individuals of species x (Aedes aegypti) in area y (Brazil), the same question about ‘accuracy’ circles could be asked, yet iNaturalist was able to make some sensible decisions on that.

I quote:

Also it sounds like another step towards gamification: people will inevitably want to increase the index value for their favourite area so it would create an incentive for adding species records beyond the taxonomic expertise available. So I say proceed only with extreme caution.

Fair enough. However, iNaturalist currently also has leaderboards, for example for observations, identifications, curators, … I am not making a careful assessment here, but such gamification could equally be helpful and stimulating; it is what iNaturalist chose to do in the leaderboard examples and metrics mentioned above. I like the Wikipedia policy ‘assume good faith’. So, while not invalidating your point, I’d propose this is more a general question about validating the crowdsourced work on the platform / checks and balances / data quality assessments, for example what to count as ‘Research Grade’. Lastly, I think your concern illustrates that the biodiversity metric(s) and tool which we propose for the iNaturalist data-set would actually be meaningful to certain actors and create incentives.

@dlevitis Regardless of whether or not it reflects what we here seem to want to call ‘actual’ biodiversity in the area, out in the real world beyond iNaturalist, I have a question. Would any biodiversity metric on the iNaturalist data-set tell us something about the diversity within (y)our iNaturalist observations? By definition, yes. So then the question is: is that meaningful to you? I’d say that even merely as a proxy educational tool to raise awareness about biodiversity in general, it could be useful. And whether or not we find it meaningful all depends on the metric(s) implemented.

As to whether the data from iNaturalist only gives us some indications about the use of the platform (for example, a spike in biodiversity due to a micro-organism research laboratory in the area with an iNaturalist enthusiast and/or a mandatory policy to publish observations on the site), or whether it can tell us something about what we want to call ‘actual’ biodiversity here (but which we did not clearly define), I will let myself be inspired by experts, which @deboas seems to be.

I am a well-published conservation biologist and have read hundreds of academic papers where the authors calculate diversity metrics, usually with very little appreciation of what they are doing, and frequently with dangerously misleading results. From a biodiversity conservation perspective, which these authors typically claim to adopt, it is rarely of any relevance at all to compare species richness, even if adjusted for sampling effort. It is rarely of any relevance at all to calculate Shannon or other diversity indices. Yet they do it anyway.

[…]

There is already a plethora of maps which attempt to identify areas of greatest biodiversity value for different purposes and at different scales, such as the Biodiversity Hotspots and Key Biodiversity Areas. They each have their strengths and weaknesses, and as time goes on, iNaturalist will increasingly be one of the datasets used to inform these prioritisations, but not iNaturalist data in isolation.

Excellent, @deboas. I see people here going off on a tangent, repeating that iNaturalist data could not be meaningfully used for biodiversity and that sampling methods needed to be standardized. And here you are, saying that iNaturalist will increasingly be one of the datasets used to inform famous biodiversity maps, and that many of the (perhaps older?) supposed holy-grail metrics such as the Shannon index are of little to no use (from a, I quote, “biodiversity conservation perspective”), even when taking sampling into account.

Just what we needed to get diversity of thought!

By the way, I learned from the video Webinar: How Your iNaturalist Data Makes a Difference for Biodiversity (12m36s), published by iNaturalist itself 1 year ago, that almost 25 percent of the (GBIF) papers using iNaturalist data are of the type “Conservation & Biodiversity Management”, ranking second just after “Climate Change Impacts” (well above 30 percent).

I would like to ask you 2 questions:

  1. What exactly is your critique of the traditional indices in general, such as Shannon’s? You say the results can be dangerous and misleading, and that they are rarely of any relevance, but I am curious why exactly. Or is that only from a (I quote) “biodiversity conservation perspective”, and if so, from which other perspectives are they valid?
  2. If you say that iNaturalist data will be increasingly used for biodiversity maps, how is it already used today for biodiversity (maps), and how do you estimate it will be used in the future? Using which metrics / data analyses will the iNaturalist data be incorporated, and what metrics do you propose could be useful to host on the iNaturalist platform itself:
    → A) Either as a proxy tool for biodiversity awareness (stressing the skews/biases/limitations of our data-set) perhaps giving meaningful insights about the users rather than the ‘real world’ biodiversity (in a similar way as the total observations within a country currently provide meaningful insights about the usage of iNaturalist within a country).
    → B) Or (if possible), even though not conclusively, at least providing some possible indications approaching, or the possibility of extrapolating to, ‘real world’ biodiversity, taking into account your note about the different measures and scales being valid for different purposes. Bob’s your uncle: what do you think could be useful? And if nothing based on iNaturalist data alone can be useful, could you point out some maps/tools which already incorporate it, and tools/maps (regardless of whether or not they use iNaturalist data) which you find most useful for approaching (modeling) ‘real world’ biodiversity, so that we could learn from their methods? By the way, we should be mindful that it is in fact possible for iNaturalist to include other data-sets in their analyses, for example about endangered species, and we should not forget that iNaturalist already has richness in its data, as it contains for example information such as “Introduced in […location xyz…]: arrived in the region via anthropogenic means”. So I am looking forward to your proposals, equally, on how iNaturalist could be enriched for the purpose of biodiversity.

That is not abundance, but rather separate occurrences over time that happen to be encompassed by a larger area. As one definition puts it: “abundance is the relative representation of a species in a particular area. It is usually measured as the number of individuals found per sample.”

Unless the original observing was intended to record abundance, iNaturalist data cannot be taken to mean abundance. In your case, the entire land area of the country of Brazil is a terrible sample area and not very useful: “Abundance is in simplest terms usually measured by identifying and counting every individual of every species in a given sector.”

To record abundance with iNat, you would have to specify the area in which you sampled and upload a separate observation for every single individual you saw, which (most) people are definitely not doing.

Small exception to that statement

(except some birders who are new to iNaturalist and get told off for the behavior of a separate observation for every bird in the flock, despite it not being explicitly against any iNat guidelines, see forum for further discussion on this)

1 Like

A very crude measure of biodiversity I use privately to decide where to go is looking at species divided by number of observations or number of observers (or on eBird, species divided by number of lists). This generally helps account for what’s probably the #1 factor in a place having more species, which is more observations. A city park in NYC with 1000 species recorded and 10,000 observers prooooobably has less biodiversity than somewhere bumfuck nowhere with 100 species and 10 observers.
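The crude ratio described above is simple enough to sketch directly. The (species, observers) pairs below are the two examples from this post; the same function works with observations or eBird checklists as the effort measure.

```python
def species_per_effort(species, effort):
    """Species recorded per unit of sampling effort (observations, observers, or eBird checklists)."""
    if effort <= 0:
        raise ValueError("effort must be positive")
    return species / effort

# (species, observers) pairs from the example in the post.
areas = {
    "NYC city park": (1000, 10_000),
    "remote site": (100, 10),
}
ranking = sorted(areas, key=lambda name: species_per_effort(*areas[name]), reverse=True)
print(ranking)
```

One design caveat: new species accumulate more slowly than new records, so this ratio falls as effort grows even at a place that has been fully surveyed. It therefore tends to over-correct for heavily observed places, which fits its description here as very crude.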

I do in fact do this if I feel like it and then when people ask me to combine the observations, I inform them each one is a separate individual

1 Like

I have to admire your tenacity in the face of all these people telling you your proposal won’t work.

The number of observations on iNaturalist cannot be used as a surrogate measure of the species abundance in the environment, for too many reasons to list here. I advise you to drop the whole idea of the Shannon Diversity Index for identifying areas of high biodiversity, because that is not what it is for. The index is to tell you whether a community is dominated by a few species or whether species abundance is more even. In other words, will the next specimen you pick up probably be the same species as the previous specimen.
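The "will the next specimen be the same species" reading is, formally, what Simpson's index measures; Shannon's index is related but is not itself a probability. A minimal sketch of the finite-sample form (two individuals drawn without replacement), using invented abundance counts for two areas with the same four species:

```python
def simpson_same_species_probability(counts):
    """Probability that two individuals drawn without replacement belong to the same species."""
    total = sum(counts)
    if total < 2:
        raise ValueError("need at least two individuals")
    return sum(n * (n - 1) for n in counts) / (total * (total - 1))

# Hypothetical abundance counts: same richness, different dominance.
even_area = [25, 25, 25, 25]
dominated_area = [85, 5, 5, 5]
print(simpson_same_species_probability(even_area), simpson_same_species_probability(dominated_area))
```

The dominated area gives a much higher chance that two picks are the same species. As the reply says, this needs counts of individuals, which iNaturalist observation counts are not.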

If you want to put iNaturalist data through the Shannon formula, there is no harm in that, and if you think it tells you something interesting, I will happily read about it in the forum. But whatever that exercise comes up with won’t have much relevance to the outside world.

2 Likes