The City-Nature-Challenge Distortion

The City Nature Challenge (CNC) each April is a tremendous success in that it brings in new observers and new observations. However, it also introduces a data distortion that I’ve been exploring recently: the seasonality charts for some species show a marked spike in observations in April that may or may not reflect real activity or abundance of the organisms.

Observations of species near La Paz, Bolivia, are a great example here because La Paz is annually near the top of the leaderboard in CNC observations. I use lizards as examples mostly because April is fall in the southern hemisphere, so the observations don’t correspond to the spring increase in lizard activity, which probably happens in September or October. These species only have a couple hundred observations each, which also helps make the distortion more obvious. In both examples below, observations in the La Paz region outside the CNC are fairly low to non-existent.

I also removed the location filter for the second species to show what its overall seasonality plot looks like. The distortion is similar, but there are more observations outside the timing of the CNC. If it weren’t for the CNC distortion, the seasonality peak would likely fall in the local spring (i.e., October).

These are extreme examples to make the point that if one has a large urban area or large population of observers contributing CNC observations, there is likely some seasonal distortion to those local data that one should be aware of. Going forward, when considering seasonality plots, I will probably be filtering out any data from the CNC dates in order to minimize this data distortion.
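For anyone who wants to try that filtering on their own export, here is a minimal sketch of the idea, not anything iNat itself does. It assumes you already have a list of observation dates; the function name `monthly_counts`, the sample dates, and the CNC window are all hypothetical placeholders, so substitute the actual CNC dates for each year you care about.

```python
from collections import Counter
from datetime import date


def monthly_counts(observed_on, exclude_windows=()):
    """Count observations per month (1-12), skipping any date that
    falls inside one of the (start, end) windows, inclusive."""
    counts = Counter()
    for d in observed_on:
        if any(start <= d <= end for start, end in exclude_windows):
            continue  # drop observations made during an excluded event
        counts[d.month] += 1
    # Return every month so that empty months show up as 0.
    return {month: counts[month] for month in range(1, 13)}


# Made-up observation dates for illustration only.
observations = [
    date(2023, 4, 3),
    date(2023, 4, 29),
    date(2023, 4, 30),
    date(2023, 4, 30),
    date(2023, 5, 1),
    date(2023, 10, 12),
    date(2023, 10, 20),
]

# Assumed placeholder window; check the real CNC dates for each year.
cnc_windows = [(date(2023, 4, 28), date(2023, 5, 1))]

raw = monthly_counts(observations)
filtered = monthly_counts(observations, cnc_windows)
```

Comparing `raw` with `filtered` shows how much of an April spike comes from the event window alone. It does nothing about the other sources of observer bias, of course.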


Seasonality doesn’t mean anything when there is always observation bias involved. Assessments in which observers put in the same amount of effort at every time of year are more accurate than iNaturalist user contributions.

Thank you for raising your concerns about how the label “Seasonality” might affect how people perceive the chart. It should probably be renamed to “Observation Frequency” to avoid misleading people.


CNC should be less prone to ‘manipulation’ next year.

We are now in the Great Southern Bioblitz, but that generates less of a seasonality blip than the CNC.


For any species with a low number of total observations, I would consider the seasonality graph to be suspect as it is easily biased by any concentrated observation effort such as the CNC or some other focused activity.


Here’s the info iNat gives about charts when you click on the question mark next to the chart:

About Charts

Seasonality

This chart shows the number of observations of this taxon grouped by month. Keep in mind that these are numbers of observations, so they are influenced both by when the organism can be observed and when people bother to observe them. So a bird might seem to be very active in May, but that could also be due to more people birding in May who tend to ignore that species in later months. Similarly, if you see more dragonflies in June than in January, that’s probably because we have more people observing in the northern hemisphere than in the southern hemisphere and not because dragonflies are more active in June, so check the map when considering these charts. It’s always a good idea to be skeptical of these charts when there are low numbers of observations and/or large discrepancies between the number of “Verifiable” and “Research Grade” observations.

History

This chart shows the number of observations of this taxon by month for the last ten years. Again, it is biased by the number of people observing, but it will show you unusual spikes in observations, and if it seems flat or decreasing despite an increasing number of observers, that might suggest a change in abundance.

Relative Observations

Showing frequency as a relative proportion of all observations helps smooth out the effect of the overall growth of the site. For example, if the site is growing as we get more observations with every passing year, we get more observations of any individual taxon, which doesn’t tell you anything about whether there are more of that taxon around to observe, just that there are more people observing it. Showing the relative proportion means that if there are 100 observations total but 20 observations of this taxon, the proportion is 0.2 (20 / 100). If people observe 2000 observations the next year and 400 observations of this taxon, the proportion is still 0.2 (400 / 2000). This causes some aberrations when there are very few observers in an area, or for taxa that are very infrequently observed, but that’s true of total counts as well.
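The arithmetic in that description can be checked with a few lines of Python. This is only a sketch of the calculation as described in the help text, not iNat's actual implementation, and `relative_proportion` is a hypothetical helper name.

```python
def relative_proportion(taxon_count, total_count):
    """Share of all observations that belong to the taxon."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return taxon_count / total_count


# The two years from the help text: 20 of 100, then 400 of 2000.
year_one = relative_proportion(20, 100)
year_two = relative_proportion(400, 2000)
```

Both values come out to 0.2, which is the point of the relative view: twenty times more observations overall, but the taxon's share is unchanged.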

You can switch between ‘relative observations’ and ‘actual counts’ in the settings next to the chart, but that doesn’t seem to make as much of a difference as I would have expected.

The thing that I always find fun in ‘seasonality’ is trees - I’m pretty sure they exist all year, but if you just take the stats at face value, most of them disappear in winter!

I like the idea of taking out the artificial spike of CNC, but it doesn’t eliminate the issue; people are likely to iNat more in spring because we can finally get out of the house, and there’s probably an increase in new users who continue to iNat just after the CNC and gradually peter out.

Info is skewed in terms of what is observed, too. My city has more observations of deer than squirrels, and more observations of Great Blue Herons than House Sparrows, but I doubt those stats reflect actual populations.

I think the main thing with using iNat data, just like all data, is to be aware of how it’s collected, never take it at face value, and don’t let it lead you to make conclusions that it doesn’t truly support.


Another factor is not just whether people are observing a particular species, but how easy or difficult it is to ID at certain life stages. E.g., many plants are less likely to be ID’d if they do not have flowers or leaves, larvae tend to be more difficult to ID than adults (and also often more difficult to see/find), etc.

So even if people were equally likely to see and observe something at all times of year, in some cases the observation counts at a species level might not reflect this because the observations for some portion of individuals are ID’d at a broader level.
