Estimating species populations from number of users

arachnojoe · February 7, 2022, 3:14am

Hello iNaturalist folks!

I’m finishing up a CS degree and am looking at doing my capstone in predicting local tick populations as a function of weather and species. I don’t want to estimate the absolute number of ticks so much as the relative numbers of ticks across time and place.

Unfortunately, the relative numbers of iNaturalist observations of ticks do not give me a sense of the relative numbers of ticks, because the number of iNaturalist observations is also a function of the number of iNaturalist users. For example, 10 observations of ticks in an area with 1000 iNaturalist users suggests fewer ticks than 5 observations of ticks in an area with 10 users.
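To make the arithmetic in that example concrete, here is a minimal Python sketch that divides tick observations by the number of active users. The function name `observations_per_user` is hypothetical, and the numbers are just the ones from the example above, not real data.

```python
def observations_per_user(tick_observations: int, active_users: int) -> float:
    """Tick observations divided by the number of users active in the area."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return tick_observations / active_users


# Numbers taken from the example in the post, not from real iNaturalist data.
area_a = observations_per_user(10, 1000)  # many users, few tick records each
area_b = observations_per_user(5, 10)     # few users, many tick records each

# area_b has the higher per-user rate even though it has fewer raw observations.
print(area_a, area_b, area_b > area_a)
```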

So in order for me to use iNaturalist data, it seems that I need to estimate the number of iNaturalist users for each area in my sampling granularity. Ideally, I’d restrict this to the number of iNaturalist users who sometimes post images of arthropods, in order to rule out those who are exclusively plant or bird photographers – just to improve the accuracy of the data by a bit.

Does the iNaturalist API provide a way for me to do this? I’m restricting the program to United States occurrences, but I might like to add Canada at some point.

The best I could figure was to acquire all locations in the United States, whittle them down to a representative set of sampling locations, and then use the user stats API to query the number of users in each sampling location. But this seems to have two problems: (1) there may be too many locations and too many users to reasonably do this, and (2) I don’t know whether a user location is the location where the user claims to reside or a location where the user has reported observations (I’d prefer the latter).

Does anyone have any suggestions for me? It seems to me that this might be a common problem with using iNaturalist data for estimating populations. Thanks for the help!

~joe

fffffffff · February 7, 2022, 3:30am

iNat data is not designed to be used for population estimation. It’s more complicated than just the number of users: one person will look for ticks, another will not photograph them at all, and others are in between. So unless there’s an actual study going on on iNat, you won’t get results close to reality.

dlevitis · February 7, 2022, 3:53am

https://www.inaturalist.org/people/sweilab is a research lab focusing on vector-borne diseases, including tick-borne diseases.

I know they have thought about questions related to relative abundance of ticks informed by iNat observations. Maybe try contacting them? https://www.sweilab.com/

pisum · February 7, 2022, 4:10am

i think this is only partly true. some users record way more observations than others, some users record more tiny things than others, some users record more arthropods than others, etc.

i would think you would get a better relative count of ticks by comparing relative numbers of ticks against all observations at a given time and place.

you could do this comparison at a county/parish level, since iNat’s “standard” places go to that level. (they also loaded town-level places, but only in certain states in the US.) GET /observations (observation counts) and GET /observations/observers (observer counts) could both be filtered using place.
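A rough Python sketch of that comparison, using only the standard library. It assumes the v1 API root `https://api.inaturalist.org/v1`, the `place_id`, `taxon_id`, and `per_page` query parameters, and a `total_results` field in the JSON response; check these against the current API documentation before relying on them. The helper names and the numeric IDs in the usage note are placeholders, not real place or taxon IDs.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed API root for the iNaturalist v1 API.
API_ROOT = "https://api.inaturalist.org/v1"


def build_url(endpoint: str, place_id: int, taxon_id: Optional[int] = None) -> str:
    """Build a count-style request URL filtered by place (and optionally taxon)."""
    # per_page=0 is assumed to return totals without any result rows.
    params = {"place_id": place_id, "per_page": 0}
    if taxon_id is not None:
        params["taxon_id"] = taxon_id
    return f"{API_ROOT}/{endpoint}?{urlencode(params)}"


def fetch_total(url: str) -> int:
    """Fetch a URL and return its total_results field (makes a network call)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)["total_results"]


def tick_share(tick_count: int, all_count: int) -> float:
    """Tick observations as a fraction of all observations in the same place."""
    return tick_count / all_count if all_count else 0.0
```

With real IDs substituted for the placeholders, usage might look like `tick_share(fetch_total(build_url("observations", 123, 456)), fetch_total(build_url("observations", 123)))`, and `build_url("observations/observers", 123)` would give the matching observer-count request. Keep requests slow and infrequent if looping over many places.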

an unusual alternative could be to use UTFGrids (GET /grid/{zoom}/{x}/{y}.grid.json) to get observation counts within an approximate grid. the downsides with this approach are that grid is not totally uniform in coverage, and it would be possible to get observation counts only (not user counts). here’s an example of that UTFGrid approach: https://forum.inaturalist.org/t/looking-for-inaturalist-observation-map-visualisation-suggestions/7322/22. (EDIT: i’m thinking about this more, and rather than UTFGrids, it might be better to get the actual coordinates from the iNat export, the GBIF export, or the AWS Open Data set, depending on what you’d like to do with the data. then aggregate / cluster the data yourself.)
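For the "aggregate the data yourself" option, one simple possibility is to bin exported coordinates into a latitude/longitude grid and count points per cell. This is only a sketch: the function names and the 0.25 degree cell size are arbitrary choices, and cells defined in degrees cover less ground at higher latitudes, so the grid is not uniform in area.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

Cell = Tuple[int, int]


def grid_cell(lat: float, lon: float, cell_deg: float = 0.25) -> Cell:
    """Return the integer (row, column) index of the cell containing a point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def count_by_cell(points: Iterable[Tuple[float, float]], cell_deg: float = 0.25) -> Counter:
    """Count how many (lat, lon) points fall into each grid cell."""
    return Counter(grid_cell(lat, lon, cell_deg) for lat, lon in points)


# Placeholder coordinates for illustration only, not real observations.
sample_points = [(40.10, -75.10), (40.20, -75.20), (41.00, -75.10)]
cell_counts = count_by_cell(sample_points)
print(cell_counts)
```

Running the same binning once on tick records and once on all records gives two counters whose per-cell ratio is the relative count described earlier in this post.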

generally the location of an observation should be where the organism was observed. however, this could be especially tricky for a subject like ticks because there could be a lot of cases where, say, someone observed the tick back at home after hiking all day at a large park. i don’t know how you resolve the difference in this kind of data, except to assume that the location will generally represent the original source/home of the tick, not the home of the observer. there are also cases where the coordinates might be obscured or have large positional error – so you may or may not want to deal with that.

that said, since this is for an undergrad (i assume?) CS (=computer science?) degree, not a biology or ecology degree, i don’t know if these kinds of considerations really matter. (i would think demonstrating your ability to retrieve, transform, and visualize data is probably more important than getting all the statistics and science exactly right.)

you might also try other sources like GBIF, which aggregates data from multiple sources, including iNaturalist. that might give you more data to work with in your sample set, and you might find some sources that have superior data for this purpose there.

arachnojoe · February 7, 2022, 4:44am

Thank you for the contact info!

arachnojoe · February 7, 2022, 4:51am

That’s a fantastic suggestion! I’ll experiment with taking this approach.

It is for an undergrad degree in computer science, but if the program does end up accurately predicting correlations, I would like to make it something of value to the general public. There are far simpler projects I could pick if the goal were only to finish the degree!

I won’t need precise locations. I’ll be restricting my granularity to something like 20-mile diameter regions. I only need the regions to be roughly uniform in weather.

GBIF is unhelpful in this case. Too little data and no way to determine whether the specimens are the result of concerted efforts to collect or random sampling. The former would ruin any sense of abundance that I generate. Ticks are pretty much always around!

jhbratton · February 7, 2022, 9:08am

I guess there are medical statistics on prevalence of Lyme Disease. When you have done your analysis of iNaturalist records, it could be useful to compare your results with the Lyme Disease data. A good correlation would support your study. A poor correlation might mean yours didn’t work or it might have other explanations.

earthknight · February 7, 2022, 9:42am

There have been a bunch of discussions about using iNat to estimate species abundance. Some were standalone threads, and others were nested within broader discussions of iNat as a research platform.

Speaking as someone working in ecological research and biodiversity conservation, I don’t think it’s at all a good idea. The way iNat collects data isn’t nearly systematic enough to get any level of abundance accuracy.

It’s pretty good for presence/absence and for tracking changes in range, although even for that it’s problematic as there is a massive bias in the type of organisms recorded and the frequency at which certain species are recorded as well.

If you’re going to try to use iNat that way, organizing a species-specific bioblitz or an ongoing project with dedicated participants who collect according to strict criteria would probably be the way to go. Existing observations just won’t, in my opinion, yield reliable enough results to use for analysis.

kentross · February 7, 2022, 1:37pm

Cool idea Joe! What software are you using? Whatever you’re using you need to convert point data to raster data. If you’re using R, this might be useful: https://rdrr.io/cran/raster/man/rasterize.html.

It may also be useful to look at the absolute number of tick observations (or some transformation of it) then control for the number of arachnid observations in the same grid since some areas have fewer observations.

arachnojoe · February 7, 2022, 2:16pm

It does seem that any way I choose to correlate number of observations with abundance could be based on false assumptions.

Another possible approach is simply to assume that the proportion of iNaturalist users to population size is everywhere the same and compare number of iNaturalist observations to the number of people living in the region.

Another source of error is the IDs. I went 18 months spending 20-30 minutes a day trying to clean up spider IDs but found that my efforts were often futile because there were already too many wrong confirming IDs, and many people aren’t good at revisiting an ID after someone posts one in conflict. Non-experts are too often confirming prior incorrect IDs. So I would still need to decide on a way to vet iNaturalist observations for correct IDs.

I may have found another dataset I can use for this task, one that isn’t dependent on iNaturalist and has expert IDs, and am in discussions for permission to use it.

Thank you everyone for your help!

cthawley · February 7, 2022, 3:23pm

Yes, I agree that trying to use iNat data to estimate abundances is pretty problematic. For one thing, there is definitely strong variation in the numbers of iNat users/observations/general population. Some areas have quite high use of iNat, some almost nil.

One thing you could consider is looking at relative abundances in the same locations. A lot of these biases would be much less severe when controlling for the location level. So for instance, you could compare the proportion of tick observations of all observations in a given area as a function of weather. In this way, location, the users generally active there, etc. are somewhat controlled for. It’s not a perfect solution, but probably good enough to draw some rough conclusions. It won’t really tell you anything about tick numbers per se; what it will tell you about is observability: the likelihood that a given observer will make an observation of your focal taxon. How you interpret that is open to discussion!
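As a sketch of that idea in Python: compute the tick share of all observations for each place and time period, so that each place is only ever compared with itself across periods. The record layout, the function name, and every number below are invented for illustration and are not real iNaturalist counts.

```python
from typing import Dict, List, Tuple


def tick_proportions(rows: List[dict]) -> Dict[Tuple[str, str], float]:
    """Map (place, period) to tick observations / all observations."""
    shares = {}
    for row in rows:
        if row["all_obs"] == 0:
            continue  # no observations at all, so the share is undefined
        shares[(row["place"], row["period"])] = row["tick_obs"] / row["all_obs"]
    return shares


# Invented counts for one hypothetical place across three periods.
rows = [
    {"place": "county_a", "period": "2021-04", "tick_obs": 2, "all_obs": 400},
    {"place": "county_a", "period": "2021-05", "tick_obs": 4, "all_obs": 200},
    {"place": "county_a", "period": "2021-06", "tick_obs": 0, "all_obs": 0},
]
shares = tick_proportions(rows)
print(shares)
```

Each share can then be paired with a weather variable for the same place and period. Periods with very few total observations give noisy shares, so a minimum-count threshold may be worth adding.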

As someone who has used some iNat data in the past, one thing to be on the lookout for is class or other projects where you may see a huge one-off spike in observations of something as 30 people all observe the same individual organism. So some data cleaning will be needed. This, of course, is also an issue with other biodiversity data (where most samples are collected in a targeted, non-random manner).
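One crude way to damp those spikes, sketched in Python: treat observations of the same taxon on the same date at nearly the same coordinates as a single record. The field names, the function name, and the three-decimal rounding (very roughly 100 m in latitude) are assumptions chosen for illustration, and the sample records are invented.

```python
from typing import Dict, List


def collapse_duplicates(observations: List[dict], precision: int = 3) -> List[dict]:
    """Keep the first record per (taxon, date, rounded lat, rounded lon)."""
    first_seen: Dict[tuple, dict] = {}
    for obs in observations:
        key = (
            obs["taxon"],
            obs["date"],
            round(obs["lat"], precision),
            round(obs["lon"], precision),
        )
        first_seen.setdefault(key, obs)  # later near-duplicates are ignored
    return list(first_seen.values())


# Invented records: two near-identical sightings plus one on a different day.
observations = [
    {"taxon": "Ixodida", "date": "2021-05-01", "lat": 35.12341, "lon": -80.45672},
    {"taxon": "Ixodida", "date": "2021-05-01", "lat": 35.12339, "lon": -80.45668},
    {"taxon": "Ixodida", "date": "2021-05-02", "lat": 35.12341, "lon": -80.45672},
]
cleaned = collapse_duplicates(observations)
print(len(cleaned))
```

Rounding is a blunt tool: two points on either side of a rounding boundary will not be merged, and genuinely separate ticks found at the same spot on the same day will be, so treat the result as an approximation.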

Good luck!

matthias55 · February 7, 2022, 3:43pm

a possibly relevant discussion/paper
https://forum.inaturalist.org/t/widespread-declines-in-butterfly-populations-linked-to-climate-new-study-by-forister-in-science/21040

stephen814 · February 8, 2022, 6:34am

I am well out of area for your research (near Sydney, Australia). The reason I am responding is to suggest you may need to get out in the field and do sample counts. I have not done it, but I gather the approach is to haul a large sheet of white cloth and then, after covering a certain area, count the ticks. If you can equate that with the population of host animals, you may be able to use host animal populations as a reasonable guide to numbers. I have a theory that, in this part of the world, bushfires kill ticks, reducing the number until the host species spread them again. The longer between fires, the bigger the tick count would be. We haven’t had a fire for 19 years. I removed 8 ticks from my ankles two days ago. Fortunately, they were larval stage, so they will not contribute to my barbecue-stopper allergic reaction: MMA (mammalian meat allergy).

Possibly useful to you: larval ticks usually appear here in autumn. That is a way off yet. I suspect the wet year brought on earlier hatching.

arachnojoe · February 11, 2022, 4:21am

I was able to acquire 65,000 records of expert-identified occurrences of ticks found on random people across the country over several years. That should do the trick. I can spare myself from having to figure out how to make iNaturalist data work.

Thank you everyone for your help!

cthawley · February 11, 2022, 3:42pm

Sounds like a great dataset (though I would doubt it is a truly random subset of the population). That said, if you ever make any of your results public in some form, please post a link here so we can see what you found!

system · April 12, 2022, 3:42pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
