Data quality of observations from India. Data Quality/ RG/ GBIF/India

hopeland · June 15, 2020, 8:43am

Hi,

There’s been a discussion among Indian users about the data quality of observations from India. It was suggested that I bring it up here so the iNat team can address it better. Many comments came in. Indian users, please chip in with your thoughts and comments.

Summary:
Primary issue: It seems that many observations have been misidentified and passed off as RG over the years. This may run to hundreds, if not more, of the roughly 2,70,000 (270,000) observations from India.

One factor that seems to contribute to misidentification is mechanical “agreeing” on observations.

Some questions that came up were about the process of transferring RG observations to GBIF. How frequently does it happen? What happens to misidentified observations that are downgraded from RG after review but have already been pushed to GBIF? Does GBIF get updated based on the updated IDs?

It also seems that individuals attempting to curate this data find it overwhelming, given the volume and scale of the issue, particularly across taxa and across India. This seems to require an intervention based on a well-thought-out process.

What can be done about this?

naufalurfi · June 15, 2020, 11:59am

Not from India but Indonesia.
We have the same problem here. What I did was simply brute-force my way through all the misidentified observations, especially the obvious ones (American/European species).
I looked at the species view on the country’s Explore page, looked for those American species, and corrected them.

marina_gorbunova · June 15, 2020, 12:26pm

The only thing that can be done is re-identifying all the mistakes. I believe that when you change an ID, it will affect the GBIF data the next time it is exported.

cmcheatle · June 15, 2020, 12:42pm

This was previously covered on the forum; I will try to find the response and link to it. A GBIF employee confirmed that if an iNat observation is deleted, they remove it there.

If its identity is changed, they receive that change and apply it to their record. If the change returns the observation to Needs ID status, it will be removed from GBIF. If it is then corrected further and returned to Research Grade, it will go back into the GBIF records.

EDIT: here is the reply, written by a GBIF staffer, confirming this: https://forum.inaturalist.org/t/observations-of-cultivated-plants-on-gbif/5296/4
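Read this way, what GBIF holds after each export depends only on the observation’s current state, not on its history. A minimal Python sketch of that reading follows; `Observation`, its fields and `gbif_copy_after_next_export` are hypothetical names, and the model ignores GBIF’s actual ingestion pipeline, export timing, and other export conditions such as observation licensing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    # Hypothetical, simplified stand-in for an iNaturalist observation.
    obs_id: int
    taxon: str
    quality_grade: str  # "research", "needs_id" or "casual"
    deleted: bool = False


def gbif_copy_after_next_export(obs: Observation) -> Optional[tuple[int, str]]:
    """Simplified model: what GBIF holds for `obs` after the next export.

    The GBIF copy mirrors the current iNaturalist state rather than the
    history, so a corrected ID replaces the old one, and an observation
    that is deleted or drops out of Research Grade disappears.
    """
    if obs.deleted or obs.quality_grade != "research":
        return None  # removed from (or never added to) GBIF
    return (obs.obs_id, obs.taxon)  # present, with the latest taxon
```

For example, an RG observation that is downgraded to `needs_id` maps to `None` (removed), and once it is corrected and back at Research Grade it maps to a record carrying the corrected taxon, matching the sequence described above.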

lotteryd · June 15, 2020, 2:01pm

In the absence of an India-specific “volunteer workgroup” to clean up the records, here’s a more general page where the more problematic taxa could be listed:

https://forum.inaturalist.org/t/computer-vision-clean-up-wiki/7281

Some places do seem to give rise to “hotspots” of misIDs that need large-scale cleanup. Sometimes that’s the result of a “bioblitz” that attracted broad participation and enthusiasm but whose participants had little training. If you know of a region in India like this and call attention to it, some volunteers, even from outside India, might be able to help by looking specifically at that region (prior example: Penang, Malaysia).

po-po-pro · June 15, 2020, 2:43pm

The issue is probably bigger for India primarily because of the sheer number of users and the size of the country.

Part of the solution would be getting rid of the ‘leaderboards’ for identifications. If there’s no fancy incentive for IDing, the number of users who make spurious IDs and mechanically agree on IDs is bound to decrease. A couple of us commented about this recently in the comment thread under iNat’s report about reaching a milestone number of observations (I forget the number).

The more difficult part of the solution can only be achieved if more experts volunteer and engage with observations on the site, which will probably take time to happen.

Star3 · June 15, 2020, 9:36pm

Leaderboards are handy for finding an expert in a taxon, though.

marina_gorbunova · June 16, 2020, 1:14am

There’s a long thread about leaderboards. It’s mostly human nature: people who want to be at the top would want that even without a leaderboard (you can always open the Identifiers tab).

hopeland · June 16, 2020, 11:25am

Thank you for your comments. Yes, species from outside India being recorded in India is a frequent problem too.

hopeland · June 16, 2020, 11:26am

Thanks for the comment, this helps!

hopeland · June 16, 2020, 11:29am

Thank you for the comments. Indian bioblitzes normally have organisers who end up curating the data. But I think the sheer number of observations may overwhelm the people attempting to curate them, or so it seems from experience.

jonhakim · June 20, 2020, 12:03pm

I have heard that Thailand also had a problem with some users auto-agreeing with any ID they saw.

I wonder if it could be useful for iNaturalist to include a confidence ranking system for observers when they initially upload their observation.

“Certain”
“Mostly confident”
“Best guess”
“Don’t know”

I’ve used a citizen science database that had this system before. Perhaps under this system, observer identifications that were just “Best guess” or “Don’t know” wouldn’t count towards an RG observation. Maybe “Mostly confident” IDs wouldn’t even count towards Research Grade unless they reached a higher number of agreements, such as 4.
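The proposal can be sketched as an extra gate on Research Grade. This is a minimal Python sketch under stated assumptions: the `Confidence` levels, `Identification` and `meets_confidence_gate` are hypothetical (iNaturalist has no such field), the gate is applied to every identification rather than only the observer’s, “Best guess” and “Don’t know” IDs are simply not counted, and “Mostly confident” IDs only help once the larger threshold of 4 is reached. It covers the agreement count only; the community taxon and iNaturalist’s other Research Grade criteria are assumed to work as they do now.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    # The four levels proposed in the thread; not an existing iNaturalist field.
    CERTAIN = "Certain"
    MOSTLY_CONFIDENT = "Mostly confident"
    BEST_GUESS = "Best guess"
    DONT_KNOW = "Don't know"


@dataclass
class Identification:
    taxon: str
    confidence: Confidence


def meets_confidence_gate(
    ids: list[Identification],
    taxon: str,
    base_agreements: int = 2,
    hedged_agreements: int = 4,
) -> bool:
    """Hypothetical gate: may `taxon` reach Research Grade on these IDs?"""
    matching = [i for i in ids if i.taxon == taxon]
    # "Best guess" and "Don't know" IDs are never counted.
    certain = sum(i.confidence is Confidence.CERTAIN for i in matching)
    mostly = sum(i.confidence is Confidence.MOSTLY_CONFIDENT for i in matching)
    # Enough fully confident agreements on their own...
    if certain >= base_agreements:
        return True
    # ...or a larger pool once "Mostly confident" IDs are included.
    return certain + mostly >= hedged_agreements
```

With the default thresholds, two “Certain” IDs are enough, while one “Certain” plus one “Mostly confident” ID is not. Checking the two thresholds separately means adding a hedged ID can never make Research Grade harder to reach.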

Star3 · June 20, 2020, 1:09pm

I’m intrigued:
How would that work with filters before it gets to RG?
E.g., if I find an observation in Unknown or Plants and add a “best guess” ID of legumes, is it still going to advance to legumes so that someone filtering for those will see it?

pisum · June 20, 2020, 1:25pm

does anyone know how most of the observations in India are created? is it possible that even more users there than in the rest of the world have a mobile device but no desktop? it’s not easy to do a lot of IDs on a mobile device. so maybe tech access contributes to the problem of lack of identification there?

marina_gorbunova · June 20, 2020, 1:36pm

I believe initial IDs should be counted the same as now, regardless of their “grade”, but if later IDs outweigh one, the community ID will change to the new one.

jonhakim · June 20, 2020, 7:34pm

Yes, that’s what I would hope. Have the system work the exact same way it currently does, the only difference being that an observation doesn’t reach “Research Grade” unless the identifiers actually express full confidence in their ID.

DianaStuder · June 20, 2020, 8:53pm

There is also an issue with new users wanting to say ‘thank you for your ID’ when the only option they can see is to agree. Good intentions, but.

po-po-pro · June 26, 2020, 1:39am

That’s a good point. It could certainly be the case that a lot of users here use their phones when uploading observations. But it isn’t really a question of phone vs. desktop (for more accurate IDing) but rather of app vs. website. I use only a phone (I have never used iNat on a desktop), but I use the app to upload my photos after doing a lot of background work on the main website to check all the nuances, look at other observations of the species, look at the maps, and finally get the right ID.

So, you’re right; if a lot of people use only the App to do everything there’s a good chance we could be ending up with a lot of poor quality IDs. Unfortunately, people are rather enamoured with apps these days.

pisum · June 26, 2020, 3:10am

the app is definitely lacking a lot of functionality that would aid in efficient identification, AND i personally think the website and other websites are harder to use on a small screen than on a luxuriously large screen. i just tried to ID a wasp (https://www.inaturalist.org/observations/50946340) by brute force, going through the taxon tree and looking at photos genus by genus and species by species for a good match, and as hard as it was to do on a large screen, i think it would have been many times harder on a small screen. if i were already an expert and had good keys that i was comfortable with, then maybe the size of the screen wouldn’t matter, but if i’m just doing visual comparisons with more than a few possibilities, i would never attempt that on a phone.

the workflow of a lot of the pages on the website also seems optimized for a desktop, not for a mobile device. so although the pages can be used on a phone, i would be willing to bet that the average user could go through, say, 20 identifications on the Identify screen much faster on a desktop than on a phone.

po-po-pro · June 27, 2020, 2:34pm

To each their own, then! I personally find the phone rather good, because it enables me to start working from whatever place I want, even while travelling or while doing something else.

One other thing I find good about phone screens is the ease of zooming in and out at will (for pictures this helps massively), and the use of touch rather than mouse clicks for most functions.

IMO, I find that the iNaturalist website is excellent and works equally well on the computer and on the phone, to the extent that I’ve not found any feature particularly lacking when I use it on the phone. I agree that some other websites are not designed for optimal use on a phone.

But you’re right; for someone who’s used to working on a device for long it becomes hard to shift to a new one without an uncomfortable transition period.
