The academic rigour of RG observations

An expert botanist gave me some feedback about a table of RG observations I shared in the public domain:

“What I don’t like about the table is that the source seems to be a crowd-sourced website (inaturalist) which, I would possibly ignore completely as wholly incomplete and equally unreliable unless it could be shown that some academic rigour had gone into the data collection and collation. If that is the only source of data available (and I would not conclude it to be anything approaching a ‘survey’ as such), I would ask myself what value adding such a random list serves to an article.”

In this case, I did all the observations myself, as the solo observer of a project. I am confident that the table contains at least 90% of the macroscopic species observable in the geographic area of the project. The majority of the IDs have been from professional experts (eg, head curator of local herbarium, etc).

However, I’m a little unclear about what I can say with respect to the “academic rigour” of the observations or IDs. For the obs: I simply walked around and around and around, over the course of many days at many times of the year, and photographed any species which looked unique. Would there have been a more scientific way to collect the data?

I’d appreciate your thoughts on how I could respond to this.

Is the suggestion that RG observations lack academic rigour correct or incorrect?

What is the definition of a “survey”? Is it possible for citizen scientists to conduct a “survey” using iNaturalist? If one citizen has been through an area, capturing every species they could possibly find over an extended period of time, does this qualify as a “survey”?

Here are links to what I’m referring to:

maybe others would disagree but i’d say just totally ignore that ‘expert’ and move on with the data you have. People like that feel threatened by the existence of iNat sometimes, or refuse to be involved with it for whatever reason, but in the end, the word of a ‘professional botanist’ in a notebook without a voucher, etc, really doesn’t have a much lower error rate than reviewable RG observations on inaturalist.

I’d probably ask the random botanist what value THEY were adding to the discussion with their unneeded comments.

Am i a jerk?


In the context of Wikipedia, iNaturalist, a crowd sourced website/user generated content, is not generally considered a Reliable Source (I capitalize this because it has a meaning specific to Wikipedia). I’m on my phone and heading to bed soon, but if you look up sourcing requirements for Wikipedia, that should give some context to their comments there.


‘Survey’ in this particular sub-field has a more specific definition, so no, what you did (and what most iNatters do) are not ecological ‘surveys’ - but this doesn’t make them useless, or meaningless. An important part of science is recognising that all data sources have flaws and strengths. A Survey and iNaturalist have different sets of flaws and strengths, and can be used to answer slightly different questions.

For a true survey, you would overlay an invisible grid over some area of land, usually 1 × 1 meter but it depends on what you are surveying for, and then methodically look through each grid square and record whatever you find, no matter if it’s the hundredth of that species you’ve seen in the area or even if it’s the hundredth of the species you’ve seen in the grid. Walking around taking photos of what you deem interesting isn’t a ‘survey.’

But as I said, I don’t think there is anything wrong with data produced by iNaturalist or any reason a study wouldn’t be able to use it, provided the methods and conclusion were well thought out given the restrictions of iNat data. I myself have used iNat data in a grant application.


So, are you saying that a survey has to be a quantitative + qualitative assessment, as opposed to just a qualitative assessment? In this table, I’m not attempting to report on density/frequency of individuals; rather, I’m simply trying to describe which species can be found in the general area.

Survey is obviously the wrong word to use for the data in this table. Is there a more appropriate word to use?


In that first link you posted, you’ve got extremely helpful comments. Academic peer review often does not strike quite as friendly a tone as that commenter used and can provide less substantial feedback.

I am inclined to agree with the commenter’s judgement that including a list of taxa in a general article on a location is of limited value. I know nothing about that particular location, but I am positive that the list in your article is very far from complete. A more complete list on the other hand would be by far the biggest item on that general article for that location. So having it there may not be the best approach.

I also agree that the key question raised by the commenter is whom and what purpose are you serving with this list, and what you do from here depends on the answer.

If you want to go beyond inaturalist and create something more encyclopedic, then maybe a secondary (expandable) article about the flora and fauna of that location could work – with (expandable) sub-articles for high level taxa. There’s also wikidata.

If you don’t want to spend the effort to do that (I wouldn’t), then including inaturalist as an external source, as suggested by the commenter, is a good idea. Make a collection project for that location, and link to that rather than duplicating a species list that is necessarily incomplete and requires maintenance. (The external link to inaturalist would crowdsource that maintenance.)

Yes you are. (But I’m a jerk too, so.) Your general sentiment is right of course. Just in this case, the commenter gave honest feedback in a respectful tone. That sort of thing should be encouraged. And you tout your land manager creds all the time too, so don’t be so down on that either.


Oops never mind, I hadn’t read the links you provided.

I don’t think iNaturalist is any worse than Wikipedia in terms of data quality, but someone has already linked to their policies.

I think this is the most important part of the comments. There are a fair number of papers out there that collected their data entirely through iNaturalist, but they had rigorous data collection methods. So just because something was done through iNaturalist does not automatically mean it must be useless. iNat can be used as an effective vessel (eg via projects) to collect data exactly like any other scientist would.


Should also be noted that many (if not all) ‘traditional’ methods of data collection have their own flaws and biases.

Agree with a lot of the comments here. I think iNaturalist data can definitely be valid, and the error rate for identifications in certain groups seems to be on par with what can be found in some scientific collections (high accuracy, but not perfect).

I think that the main point here is that lots of data can be “imperfect” but still useful (much like a model); it just depends on how you are using the data. I’m not a plant ecologist/botanist so I don’t think I’m qualified to comment much on @j-k’s situation. But in general, if you have collected the data yourself and had reliable sources IDing, the list is a good starting point to know what is present at a site. You’ve basically done a one person bioblitz and you’ve created a digital field notebook of verifiable observations that is accessible by other scientists to back up your species list. That has value for answering many questions and as a first pass for knowledge of the plant ecology of a site. It can’t answer the same questions as a more rigorous, quantitative survey of the type described by @benjaminlancer, but this doesn’t mean it isn’t useful.


Ignoring the data quality aspect of your question here…as someone who has done considerable editing on Wikipedia, I absolutely would not use iNaturalist data as a source. The main issue is that iNat observations can be considered “self-published” which does not meet the standards for primary sources on Wikipedia. There is a strong preference for secondary sources on Wikipedia, though primary ones are allowed. So, iNaturalist data that had been incorporated into a secondary source like a book or something else might be ok.


While the data you have here is not the sort of rigorous survey you’d use to designate wetlands, do larger site analyses etc., simple occurrence data for a given site is not useless. What you can actually do with that dataset is limited, sure, and it may not fit wikipedia’s sourcing policies, but it’s still valuable. I’m currently working with an 1877 flora of my home county compiled by a single researcher, in which all of the records were backed up by vouchers that mostly no longer exist, so the accuracy of their work can no longer be verified. In spite of that, I’m finding populations of now-uncommon plants still extant exactly where the author described them. Simple occurrence data, even from photocopies of 150-year-old handwritten documents, does have value to science. It’s just a question of appropriate application relative to data quality.

I’m not a wikipedia expert - would a sentence in the article like “a citizen science project on iNaturalist has recorded x number of species at this site” be appropriate, with link in the sources? It would still provide a snapshot of the biodiversity the OP seeks to convey while also qualifying the data for readers to interpret appropriately.


Curious. What difference would it make whether the data originated from a book or not?

I deliberately try to not ID my observations beyond kingdom level, as I appreciate getting an independent assessment of my images. The IDs are an interpretation of the original images, and the IDs are not mine.

Furthermore, I’m only listing the RG observations, which means that at least 2 people have provided an independent interpretation of these images.

Wouldn’t it therefore be reasonable to conclude that the IDs I’m putting on Wikipedia are secondary sources?

To me, the term “survey” is perfectly appropriate for a concerted effort to document the species within a given region. It does not require that you set up standardized transect lines or measure plant densities within a Daubenmire frame. Those are useful to answer other questions. But if your goal was to identify the flora of your study area over multiple seasons through frequent visits and observations, then I call that a survey. I’ve done many such surveys to answer basic questions about the diversity of a particular property and while they might lack scientific rigor in methodology that does not mean they are without value. It all depends on the purpose.


If the data were part of a secondary source, then it is usually expected that everything in that source would have been synthesized and verified to a degree by the author and reviewers. I don’t think having “independent interpretation” on each observation and obtaining research grade helps at all, as the observation, whether verified by other users or not, is still a primary source. Looking through individual observations (or automatically generated aggregations), synthesizing your findings, and then adding that to Wikipedia would still violate the no original research policy.


I should add that a plant list based on RG photo-records on iNat is a step above a bird survey list that is compiled by an observer or two using binoculars (no camera) to document avian diversity. We trust that the observers got their IDs correct but there’s no way to independently verify. Those bird surveys are done all the time and many are published. Your photo-records can be revisited and revised if necessary.

Not in the Wikipedia sense. Wikipedia’s approach to “reliable sources” isn’t intended as an absolute measure of data quality; it’s a heuristic that can be generalized over the encyclopedia’s enormous scope. In the most general case, Wikipedia editors don’t have the expertise or resources to evaluate whether this sort of user-generated/crowdsourced content is correct, even if we could look at yours in particular and go “hey, this is all right”. (Or to put it another way, the policies on reliable sources and original research are there to prevent huge quantities of editor time being sucked up into bickering with cranks; the cost, which is deemed acceptable, is that when new information arises, the encyclopedia will lag in reflecting it.)

It might be more productive to try getting your list published in a local journal of natural history instead, which would both lodge it in the scientific literature, and make it acceptable for use on Wikipedia.


Maybe “sampling”?

I agree with the commentator mentioning publishing standards. iNaturalist observations do not meet the criteria of a peer-reviewed paper; this is just how data are treated in academic science, and it’s a common standard (one I’d stick to for now, not just as a member of the academic community, but for a number of other reasons). That does not diminish the great importance of the resource itself as an invaluable source of information for people who want to know what lives around them here and now.

As for observation quality: I’m not a botanist, but I took a comprehensive botany course at university and would be careful with identifying plants only from photos (as a zoologist, I’d be equally careful with the identification of animals). Some things, like root structure or the fine structure of leaves and flowers, are not visible in photos; we always checked those under the binocular microscope using fresh specimens or herbarium items. I guess if a specialist had been collecting plants for many years in the same region, he or she would correctly recognize the species even without those. From my own experience identifying rotifers as a rotifer taxonomist: after years of observations, I know with high probability what something could or could not be, even if a photo doesn’t show all the details. But still, there is room for error.


with plants, i find that nearly all species can be distinguished by sight once you get to know them. There are exceptions but they are pretty well known. Photos with a good view of the plant and reasonable view of diagnostic features are totally adequate to identify most plant species, or at least as adequate as anything else. There’s always risk of error, keys are sometimes wrong, key characteristics variable, etc too. And yeah, a photo of, say, only the stems of a tree with no leaves or bark is probably not adequate.