How to analyze iNaturalist data using R programming language

wy_bio · December 19, 2024, 4:41pm

Hi,

I’m Wai-Yin Kwan, a Museum Associate in Community Science with the Natural History Museum of Los Angeles County. I’m working with the NHMLAC community science team to develop a workshop to teach community scientists how to do basic analysis of iNaturalist data using the R programming language. We will teach people how to download iNaturalist data, filter data, create charts, and create maps. We are specifically looking at City Nature Challenge data for Los Angeles County, but the concepts can be applied to any iNaturalist dataset.

The lesson is based on the Carpentries coding lessons. Would anyone be willing to give feedback about the tutorial? Here’s a draft of the tutorial: https://wykhuh.github.io/cnc-nhmlac-workshop/

gabrif · December 19, 2024, 4:59pm

Hello, welcome here. Who is your target, in terms of programming experience/background? The approach to R can be quite daunting, at first, for people coming from spreadsheets…

wy_bio · December 19, 2024, 5:09pm

The tutorial is aimed at people with no coding experience. I’m a software developer who has taught beginners how to code. I also looked at Carpentries coding lessons aimed at teaching people with no coding experience.

eyekosaeder · December 19, 2024, 5:29pm

I have briefly clicked through, and I will check it out more thoroughly after Christmas. I will get back to you then. I just wanted to say that I absolutely love the idea and - having suffered through a university statistics and R course - I really appreciate there being a dedicated tutorial for using iNat in conjunction with RStudio, something which I might actually do frequently in the future. Having to piece together all the information oneself from different sources is always the worst step, so it’s great that all the info exists in one place now! :D

optilete · December 19, 2024, 6:19pm

I would hyperlink to the Wikipedia articles for the terms iNaturalist, City Nature Challenge, and R on the home page. https://wykhuh.github.io/cnc-nhmlac-workshop/ Does your audience already know what these terms are?

wy_bio · December 19, 2024, 6:40pm

The people who attend the workshop are participants in iNaturalist or City Nature Challenge. Our goal is to give people who collect iNaturalist data the chance to do basic analysis of the data.

pisum · December 19, 2024, 9:13pm

if the primary purpose is to teach basic R, then it seems like the tutorial is fine. it covers many of the basics of working with CSV data, filtering, aggregating, and visualizing (line graph, bar chart, map). my main criticism here would be that some of the examples don’t seem like they are very practical in the real world. for example, in your lesson on filtering using the OR operator, you show people how to filter for observations where user_login == xxx or common_name == yyy, and that doesn’t seem like something anyone would actually ever do.

…

however, if the primary purpose is to show folks how to work with iNat data (as opposed to teaching basic R), it seems to me like most things the tutorial covers could be done more efficiently in a different way. in particular, downloading a ton of observations just to aggregate the data or create a basic visualization often is not necessary. especially around CNC time, trying to export observations in the standard interface often takes a lot of time. so there are faster / better ways to get a quick count or bar chart or list of top users, etc.

just for example:

something like this is probably the best way to do a quick graph or chart of observation counts over time in R: https://forum.inaturalist.org/t/1m-observations-this-month/2582/4
if you just want to quickly visualize distributions on a map in R, you could do something like this: https://forum.inaturalist.org/t/api-project-heatmap-using-r-and-quarto-html/57793/5
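
as a rough sketch of the "get aggregated data instead of observations" idea: the snippet below only builds a request URL for monthly observation counts. the helper name `histogram_url()` is hypothetical, the place ID and dates are placeholders, and the endpoint and parameter names (`d1`, `d2`, `date_field`, `interval`) are assumed from the iNaturalist API v1 documentation, so check them there before relying on this.

```r
# Hypothetical helper: build a URL asking the server for observation counts
# per interval, rather than downloading the observations themselves.
# Endpoint and parameter names are assumed from the iNaturalist API v1 docs.
histogram_url <- function(place_id, d1, d2, interval = "month") {
  paste0(
    "https://api.inaturalist.org/v1/observations/histogram",
    "?place_id=", place_id,
    "&d1=", d1, "&d2=", d2,
    "&date_field=observed",
    "&interval=", interval
  )
}

# place_id = 1 and the date range are placeholders, not real workshop values.
url <- histogram_url(place_id = 1, d1 = "2024-01-01", d2 = "2024-12-31")
print(url)
```

fetching that URL (for example with a JSON package) should return counts rather than individual records, which is usually much faster than an export when all you want is a chart.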

twainwright · December 20, 2024, 5:20am

Hi,
On a quick read-through, your tutorial is a good start for basic R analysis with some examples based on pre-downloaded iNaturalist data. Like @pisum, I was surprised that you were focusing on a pre-loaded CSV data set, rather than using the much more powerful iNaturalist API, but your choice is probably best for beginners to both R and iNaturalist data - using an API is a tough learning curve in itself.

Your “higher taxonomy” section needs a lot of work. Your first example (searching for oaks) is a good example of a misdirected attempt to search on common names. However, it would be good to explain why searching for the word “oaks” fails: because you are searching for an exact match (“==”) to the plural “oaks”, I would guess you only get the observations that are identified as the genus Quercus with no species-level ID. If instead of exact matching, you used a string match function that looked for any occurrence of the word “oak” in the common name, you would get all the species-level oak observations, but would also get other things with “oak” in the name, such as poison-oak or oak-leaf hydrangea. As you discovered, to get all oak trees and shrubs, you need to search for the genus scientific name.
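
A minimal sketch of the three approaches described above, using base R on a small made-up data frame. The column names `common_name` and `scientific_name` and the sample rows are assumptions for illustration, not the tutorial’s actual data.

```r
# Toy observations; rows and column names are made up for illustration.
obs <- data.frame(
  common_name = c("oaks", "coast live oak", "valley oak",
                  "Pacific poison oak", "oakleaf hydrangea",
                  "western sycamore"),
  scientific_name = c("Quercus", "Quercus agrifolia", "Quercus lobata",
                      "Toxicodendron diversilobum", "Hydrangea quercifolia",
                      "Platanus racemosa"),
  stringsAsFactors = FALSE
)

# 1. Exact match: keeps only rows whose common name is exactly "oaks",
#    i.e. genus-level identifications.
exact <- obs[obs$common_name == "oaks", ]

# 2. Substring match: any common name containing "oak".
#    This also picks up poison oak and oakleaf hydrangea.
partial <- obs[grepl("oak", obs$common_name, ignore.case = TRUE), ]

# 3. Genus match: scientific name is "Quercus" or starts with "Quercus ".
genus <- obs[grepl("^Quercus( |$)", obs$scientific_name), ]

print(nrow(exact))    # 1
print(nrow(partial))  # 5
print(nrow(genus))    # 3
```

Only the genus match returns the oaks and nothing else in this toy data.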

Your second example of searching for “trees” is, unfortunately, wrong. There is in fact no way to search for just trees in iNaturalist – the database is structured by taxonomy, and “tree” is not a taxonomic concept, but rather a growth form that occurs across many different plant taxa. When you searched iNaturalist taxon page for “trees” and it returned phylum Tracheophyta, that includes all vascular plants (plants that have xylem and phloem): trees yes, but also shrubs, ferns, forbs, grasses, sedges, etc. So no, there are not 95,000+ tree observations in your data set.

btree · December 20, 2024, 6:53am

Just want to throw in this open source book from hadley: https://r4ds.hadley.nz/

In my opinion the best starter book there is out there.

wy_bio · December 20, 2024, 3:04pm

The Prerequisites section of R for Data Science recommends Hands-On Programming with R for people who have never programmed before. https://rstudio-education.github.io/hopr/

I’ve read both, and agree that Hands-On Programming with R is a better intro for people new to programming.

_Mikey · December 20, 2024, 7:16pm

The moment I saw the R in the title I think I had a mild panic attack.

wy_bio · December 21, 2024, 5:44pm

Why does R give you a mild panic attack? I’ve taught people new to programming how to code, so I know that learning to code can be a positive experience.

_Mikey · December 22, 2024, 4:09pm

Oh mind you, I think what you’ve done is wonderful and extremely relevant to the field. Good work. I’ll definitely use it in the future as I need a good brushing up.

But ha, story time

One of our first-year prerequisites in uni was to do statistics using RStudio. Long story short, the subject was a mess: it was online because of covid, and no one (according to the rumours, only two people) passed. Those were twelve to thirteen weeks of trauma, since I was a fledgling student and was scared of not doing well.

The subject got replaced with new lecturers but it was still with RStudio. Luckily these lecturers actually taught us and gave us good guides on how to code for the program. Passed with flying colours. (Couldn’t remember how to do any of the coding now.) Since then, though, that subject also got replaced and they now use a different and easier program for statistics, which I wish I had used.

pisum · December 22, 2024, 5:16pm

my issue wasn’t exactly that there was no introduction to the API. there’s definitely a need to learn how to do things with CSV files. but a lot of the examples selected are examples of things that could have been accomplished better / more efficiently in other ways – filtering after download rather than during download, getting observation data and then aggregating rather than getting aggregated data, etc.

while filtering and aggregating in R are definitely skills that are worthy of learning, my fear is that some of the examples in the tutorial give the impression that they are the best way to do this sort of thing, and for many of the example cases, they are not. so i think more careful selection of example cases can help clarify this. for example, i would say that filtering by AND between different parameters (ex. param_1 == xxx AND param_2 == yyy) or filtering by OR on the same parameter but different parameter values (ex. param_1 == xxx OR param_1 == yyy) is best done when requesting data from the iNat server; whereas filtering by OR between different parameters (ex. param_1 == xxx OR param_2 == yyy) or filtering by AND on the same parameter (ex. param_1 == xxx AND param_1 == yyy) is the sort of thing that in most cases cannot be accomplished when requesting data from the server (and so must be done in R in this case).
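
to make that split concrete, here is a rough sketch in base R. the helper `build_obs_url()`, the toy data frame, and the user names are hypothetical, and the place ID is a placeholder. the parameter names `place_id` and `user_id`, and the idea that comma-separated values within one parameter act as OR, are assumed from the iNaturalist API v1 documentation for the observations endpoint.

```r
# Hypothetical helper: build an observations query URL from named parameters.
# Different parameters are joined with "&" (AND on the server side);
# multiple values for one parameter are comma-joined (assumed to mean OR).
build_obs_url <- function(params) {
  base <- "https://api.inaturalist.org/v1/observations"
  pairs <- vapply(names(params), function(name) {
    paste0(name, "=", paste(params[[name]], collapse = ","))
  }, character(1))
  paste0(base, "?", paste(pairs, collapse = "&"))
}

# AND between parameters, OR within user_id: do this when requesting data.
url <- build_obs_url(list(place_id = 1, user_id = c("user_a", "user_b")))
print(url)

# OR between *different* columns: do this locally, after download.
obs <- data.frame(
  user_login  = c("user_a", "user_b", "user_c", "user_c"),
  common_name = c("Western Fence Lizard", "Mallard", "Mallard", "Coyote"),
  stringsAsFactors = FALSE
)
either <- obs[obs$user_login == "user_a" | obs$common_name == "Mallard", ]
print(either)
```

the first part narrows what the server sends back; the second part is the kind of cross-column OR that has to happen in R.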

kimssight · December 23, 2024, 3:14pm

I have seen this advertised as R coding. Now I see it described as the R programming language. I have some technical background and like to do analysis and play with data from iNat. However, I had no idea what this was or why I would be interested. Am I out of date for not knowing what R is? Is this something I would be interested in? I have no idea. Am I your target audience? I could not see your tutorial.

Anyway, there may be others like me who would be interested if they knew more about what you are offering to a lay person.

Thank you.

wy_bio · December 23, 2024, 3:46pm

Hi kimssight,

I’ve updated the copy on the home page to give more information about the workshop. The workshop will be in person in L.A., but the lessons, code, and data will be available online for anyone to use. https://wykhuh.github.io/cnc-nhmlac-workshop/ On the “Introduction to Data Analysis” page, I give an overview of what the tutorial will cover. The workshop is based on data for L.A., but people can adapt the code from the lesson to their own data sets.

moots_htx · December 24, 2024, 3:54pm

This is really exciting. As someone who is familiar with what R is but hasn’t made the time to learn, combining two of my interests in one workshop is a thrill.

Unfortunately, I’m not in LA and won’t be traveling to attend but I’m confident that I might be your target audience. I’m bookmarking to see what I can gain from this using local data. Thank you for doing this.

I also appreciate all of the replies with feedback. Maybe R isn’t the best tool or the examples aren’t watertight, but the concept is very appealing [to me]. Those constructive replies also provide me with quite a bit of practical insight for my journey. Thank you as well!

wy_bio · December 24, 2024, 4:50pm

Thanks for sharing your story. It’s really important to have a good teacher when learning a new skill such as coding. I hope to avoid causing panic and trauma for the workshop attendees.

What is the new program that is used for stats?

wy_bio · January 2, 2025, 6:05am

The link changed. https://wykhuh.github.io/after-inaturalist-r/ Here’s information to sign up for the class. https://www.instagram.com/p/DDvFP_Pyy1Q/ Sign up deadline is Jan 7th.

_Mikey · January 2, 2025, 4:02pm

Ah, so for my two subjects, I used SPSS. When the subject got changed (again and again), a fellow classmate said they used Jamovi. And I think now it’s with Primer.
