Hi folks,
I’m a GIS grad student at the point where I need to commit to a thesis project. I want to do something related to iNat data. I’ve done a fair bit of background research on data assessment and use cases for iNat data. I’m thinking Chrysolina bankii would make a good subject of study. This beetle has been introduced to California fairly recently, and very little has been published about it. It seems to be establishing itself in the Bay Area (nearly 4000 observations). I first became aware of it via iNat a few years ago.
My research question would be something like “Can iNaturalist data be used to assess the spread and invasion potential of Chrysolina bankii?”
Any thoughts on the viability/usefulness of such a project?
Thanks! :)
I’ve been curious to see maps of the spread of Spotted Lanternfly using iNat data. I vaguely think someone must have made those by now. If such a study is available, it could be a good parallel example for your study.
Regarding your viability/usefulness question, I think it depends tremendously on the exact details of your question. Defining a really sharp question is often the hardest and most important part of many graduate theses.
is a good place to start exploring. The answer is going to be, “yes, to an extent.” But hopefully that data exploration will bring you to questions that have broader implications, more precision, and more enlightening answers. In addition to making some initial maps, a good way to start is to ask yourself, “what do I really want to know/tell people about this?”
@jules_farquhar is doing his thesis on the skink Carlia sexdentata in my local area using iNat data. He might be a good person to chat with about the unexpected difficulties.
Many people use iNat data in research projects, but there are also some researchers who oppose using citizen science data. I think you should talk to the professors at your school about your project idea. Ultimately, it will be the people at your school who will decide whether your thesis project is worthy of a degree.
How do you define “recently introduced”? I searched for Chrysolina bankii in California using a site I’m building, and the first observation was in 2008. There is nothing from 2009 to 2011, then a steady increase since 2012.
https://inat-explorer.dataexplorers.info/?taxon_id=318965&place_id=14&order=asc&order_by=observed_on&verifiable=true&spam=false&d1=2000-01-01&per_page=24&view=observations_observations&subview=graph&graphs_category=year
I recently created maps and charts about the spread of Spotted Lanternfly using iNaturalist data from 2016 to 2026. Here’s the post.
Thanks for your feedback. Yes, I need to refine the question. Here are some more specific ideas:
- Is the distribution of Chrysolina bankii changing or expanding significantly over time?
- Is Chrysolina bankii a threat to native Lamiaceae species? (It is associated with Lamiaceae.)
- Is it associated with specific habitats?
- Is it more commonly found in urban or cultivated settings? (To the extent that this can be determined; I’m aware of the urban spatial bias in iNat data overall.)
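One simple way to start on the first question is to count how many distinct grid cells have at least one record, cumulatively by year. Below is a minimal Python sketch, assuming records are (latitude, longitude, year) tuples and an arbitrary 0.1° grid; the function name and the coordinates in the example are made up for illustration. Occupied-cell counts rise with observer effort as well as with real spread, so this is a descriptive starting point rather than a test of expansion.

```python
import math
from collections import defaultdict


def cumulative_occupied_cells(records, cell_deg=0.1):
    """Return {year: cumulative number of grid cells with at least one record}.

    records: iterable of (latitude, longitude, year) tuples.
    cell_deg: grid cell size in decimal degrees (a hypothetical choice).
    """
    cells_by_year = defaultdict(set)
    for lat, lon, year in records:
        # Index of the grid cell that contains this point.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells_by_year[year].add(cell)

    seen = set()
    result = {}
    for year in sorted(cells_by_year):
        seen |= cells_by_year[year]  # cells occupied in any year so far
        result[year] = len(seen)
    return result


# Made-up example records, not real observations.
records = [
    (37.87, -122.27, 2008),
    (37.87, -122.26, 2012),  # same 0.1-degree cell as the 2008 record
    (37.81, -122.41, 2012),
    (37.45, -122.14, 2015),
]
print(cumulative_occupied_cells(records))  # {2008: 1, 2012: 2, 2015: 3}
```

Years with no records simply do not appear in the result; a finer or coarser grid will change the numbers, so it is worth checking whether the trend holds across a few cell sizes.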
Yes, I am pretty well versed in the literature on bias and data quality issues in citizen science data; that’s where I focused my initial literature review. I have spoken with two professors who said they would be on board with a project using iNaturalist data. The question at this point is what, more specifically, to research.
Yes, I realize 2008 isn’t that recent, but it’s somewhat recent: recent enough that its spread might be detectable in the iNat data. And, most importantly I think, its presence in California isn’t well studied. There’s one paper from 2007 that basically just says “we found a new exotic beetle in California in one single place”, and a 2022 California Department of Food and Agriculture document that assesses it as a low-importance pest.

I would love to discuss further with anyone who has done this kind of research!
I agree with @dlevitis: the answer to your first question is already known. Coming up with the specific questions and how you will assess them is really the key to how valuable and appropriate the approach will be. I think the overall project and your more specific questions sound promising. My only, well, not “worry”, but hesitation is that answering those questions might not be “enough” for a master’s thesis (at least at the places I’ve worked). For instance, what types of analysis does the current amount of data allow you to do? Just producing some descriptive graphs that answer your questions in some manner would be fairly easy and not necessarily at the level of master’s work. Since your data have essentially already been collected for you on iNat (and that won’t be part of your work), professors may expect a bit more depth and quantity in the data analysis. You may also want to think about how you would QA/QC the data (will you check all 4,000 records? Very doable, by the way).
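As a first QA/QC pass, a script can flag records that need a closer look before any analysis. Below is a minimal Python sketch, assuming the rows come from an iNaturalist CSV export read with `csv.DictReader` and that the columns are named `quality_grade`, `positional_accuracy`, and `coordinates_obscured` (check these against your own export). The 1,000 m accuracy cutoff and the function name `screen_records` are hypothetical. This only screens metadata; it does not replace checking the identifications themselves.

```python
import csv
import io


def screen_records(rows, max_accuracy_m=1000.0):
    """Split rows into kept rows and (row, reasons) pairs flagged for review.

    Column names are assumed to match an iNaturalist CSV export.
    max_accuracy_m is a hypothetical cutoff, not a recommendation.
    """
    kept, flagged = [], []
    for row in rows:
        reasons = []
        if row.get("quality_grade") != "research":
            reasons.append("not research grade")
        accuracy = (row.get("positional_accuracy") or "").strip()
        if not accuracy:
            reasons.append("no positional accuracy")
        elif float(accuracy) > max_accuracy_m:
            reasons.append("coarse location")
        if (row.get("coordinates_obscured") or "").lower() == "true":
            reasons.append("obscured coordinates")
        if reasons:
            flagged.append((row, reasons))
        else:
            kept.append(row)
    return kept, flagged


# A tiny made-up export for illustration.
sample = """id,quality_grade,positional_accuracy,coordinates_obscured
1,research,15,false
2,needs_id,10,false
3,research,5000,false
4,research,,true
"""
kept, flagged = screen_records(csv.DictReader(io.StringIO(sample)))
print([row["id"] for row in kept])
for row, reasons in flagged:
    print(row["id"], reasons)
```

Keeping the reasons alongside each flagged record makes it easy to report how many records were dropped for each rule, which is the kind of detail a thesis committee is likely to ask about.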
You could also consider combining an iNat analysis with some type of fieldwork. Can you do fieldwork that addresses potential bias in iNat data? For instance, can you sample habitats and see whether the habitat patterns from iNat hold up in the real world? Or assess whether iNat observers are missing some habitats where the beetle occurs? Or can you verify or expand host plant associations through your own fieldwork? This type of work would give insight into both the beetle and the iNat side of the data, and would make for a compelling project (in my opinion).
I would suggest doing some basic exploratory work with the iNat data to help you generate some hypotheses/more specific directions, writing up a 1-3 page proposal, and bouncing ideas off professors/other grad students. Good luck!
During Covid, I (like many others) questioned what we could do from our computer to support the scientific community using open source occurrence data.
Only after seeing that your interest is in GIS did I consider suggesting this, but species distribution models (also called ecological niche models) are a compelling way to use presence-only occurrence records.
Depending on your system or interest, this is no easy task, but it is all accessible and knowable from the perspective of theory and execution.
Namely, if you are interested in homing in on an invasive species, there are compelling ways to use its native range and its invasive range to estimate what stage of invasion each population is in, not to mention what the total potential range of invasion would be under climatic or other conditions. You can determine this using R, the iNaturalist data (which you should access via GBIF so that it is citable and includes data repositories outside of iNaturalist), and open-source climatology predictions/assessments (CHELSA would be my suggestion over BioClim).
Really, there is a lot to this from a theoretical perspective: what assumptions are you willing to make?; how GOOD are the data [aka is this a readily identifiable species; is it a species you expect to have been sampled sufficiently in its native and/or invasive range?]; what open-source environmental layers are you willing to consider as a potential driver of the species range? etc. etc.
My advice, if you are interested, would be to look up Jamie Kass’s tutorials on how to build accurate species distribution models/ENMs using open-source data. If you are keen on using SDMs to investigate invasion processes, here is my publication where I managed to do that, but I’d more strongly encourage you to look at the methods of the folks I cited in my introduction and methods to understand the process; they executed it artfully: https://doi.org/10.1016/j.japb.2024.02.003
If you get the hang of this style of analysis, there is much more you can do with it quantitatively: What environmental variables are important to the model and the species? What spatial or other biases have affected your model? How much range does your model project under future climate scenarios? What biotic factors are absent from the model that would adjust your predictions? What biotic factors can or should be included to make for a more accurate assessment of the range an invasive species could exploit?
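On the spatial-bias question, one common step before fitting an SDM is to thin the occurrence records spatially so that heavily sampled areas (for example, around Bay Area cities) don’t dominate the model. Below is a minimal greedy Python sketch, assuming (latitude, longitude) pairs in decimal degrees and a hypothetical 1 km minimum spacing: a point is kept only if it is at least that far from every point already kept. This is a simplified stand-in; R packages such as spThin use a randomized, repeated version of the idea, and this greedy version’s result depends on input order. The function names and coordinates are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def thin_points(points, min_km):
    """Greedily keep points that are at least min_km from all kept points."""
    kept = []
    for p in points:
        if all(haversine_km(p, q) >= min_km for q in kept):
            kept.append(p)
    return kept


# Made-up coordinates; the second point is ~0.14 km from the first.
points = [(37.870, -122.270), (37.871, -122.271), (37.800, -122.410), (37.450, -122.140)]
print(thin_points(points, 1.0))
```

Comparing models fit with and without thinning (and with several spacing values) is one way to see how much spatial sampling bias is driving the predictions.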
And if you go down this route, remember the words of George Box: “All models are wrong, but some are useful.”
Note that an increase in total number of observations should not necessarily be taken as an indication that the species is becoming more common. It also likely reflects the fact that the number of iNaturalist observers has been steadily increasing since 2012.
Agreed, and the total number of observations is increasing too. Using iNat data to assess trends is tricky: one can’t test against a hypothetical baseline (i.e., a static rate), but needs to determine a relevant baseline from iNat usage patterns themselves and then test against that.
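A minimal Python sketch of one possible baseline: divide the beetle’s yearly count by the total number of iNat observations in the same region and year, then look at the trend in that share rather than in raw counts. The counts below are invented, and the function names are hypothetical. A real analysis would more likely use something like a binomial GLM, and “all observations” might be replaced by observations of a comparable group of taxa to better match observer effort.

```python
def yearly_share(species_counts, total_counts):
    """Species observations as a fraction of all observations, per year."""
    return {
        year: species_counts.get(year, 0) / total
        for year, total in sorted(total_counts.items())
        if total > 0
    }


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Invented counts: total iNat activity doubles every year.
totals = {2018: 10_000, 2019: 20_000, 2020: 40_000}
flat = {2018: 10, 2019: 20, 2020: 40}      # raw counts rise 4x, share does not
growing = {2018: 10, 2019: 40, 2020: 160}  # share rises too

for label, counts in [("flat", flat), ("growing", growing)]:
    share = yearly_share(counts, totals)
    slope = ols_slope(list(share), list(share.values()))
    print(label, share, f"slope per year: {slope:.4f}")
```

In the “flat” case the raw count quadruples while the share stays constant, which is exactly the situation where counts alone would suggest spread that the data don’t support.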
Oh, you young people. 2008 is recent. Before you were born, maybe, but recent enough that the status and range of this beetle have not settled down. (And I was already well over 50 then.)
The question of how much of the increase in observations of this beetle simply reflects the increase in observations overall will make a nice part of your thesis. I was going to say “a nice little part”, but although that part may look short in the finished product, it will take some significant work.
I think it would be useful to contact museums, entomology departments, and extension service offices about this beetle to compare their data to iNaturalist data, to see if the trend on iNaturalist matches what they report.
When you delve into the complexities of this issue, it will make a good master’s thesis. After all, a master’s is supposed to take 2 to 3 years, not your whole life, and to demonstrate that you have learned how to do research. Could be interesting.