Aren’t planted trees “not wild” by definition? Maybe I’m misreading your comment, but “not wild” organisms should be marked Casual, shouldn’t they?
What does ‘duplicate’ mean in this case? The exact same image(s) of an organism? Do people do that? That would be irritating. But if it’s just multiple observations of the same organism at the same location, that seems unavoidable on a platform like this.
I agree in principle with this request. But if the info is presented in a pop-up on iNaturalist it will miss researchers who download iNat data via GBIF. That’s what I’ve been doing, and I think is the method recommended by iNaturalist.
GBIF has the advantage of providing a clearinghouse for many different data sources. But that comes with the challenge of each source choosing which fields to use for their records, and how they interpret those fields. With that in mind, it would be useful to have a readily-discoverable list of the fields iNaturalist uses, and how they are interpreted.
Most of them are self-evident, but things that might trip up people unfamiliar with iNat are “information withheld” (i.e., you need to know it exists to use it as a filter), event time, and latitude and longitude (which may be altered if the observation has been obscured). Also, the ‘official’ iNaturalist dataset is uniquely identified only by datasetKey == “50c9509d-22c7-4a22-a47d-8c48425ef4a7”; the name ‘iNaturalist’ is used for multiple datasets, as discussed on the GBIF forum.
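A minimal sketch of the datasetKey filter described above, for anyone working with a GBIF occurrence download in Python. The datasetKey value is the one quoted in this thread; the field names assume the Darwin Core column headers GBIF uses in its tab-delimited “simple” downloads, so check your own export before relying on them.

```python
# Sketch: keep only records from the official iNaturalist dataset in a
# GBIF occurrence download. Matching on datasetKey is safer than
# matching on the dataset name, since several GBIF datasets are
# called 'iNaturalist'.
import csv

INAT_DATASET_KEY = "50c9509d-22c7-4a22-a47d-8c48425ef4a7"

def inat_records(rows):
    """Yield only the rows belonging to the official iNaturalist dataset."""
    for row in rows:
        if row.get("datasetKey") == INAT_DATASET_KEY:
            yield row

def load_occurrences(path):
    """Read a GBIF 'simple' download (tab-delimited UTF-8) as dicts."""
    with open(path, newline="", encoding="utf-8") as fh:
        yield from csv.DictReader(fh, delimiter="\t")
```

Even after this filter, remember the caveats above: obscured observations may carry altered coordinates, and “information withheld” flags are easy to miss if you don’t look for them.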
No, you’re correct: they are not wild, and they are Casual. But that doesn’t mean they’re not useful data.
Users sometimes accidentally upload an image multiple times, or they upload several different images of the same individual as separate observations because they don’t know how to put multiple images in a single observation, or they don’t know that they should do so. These are often new users who may not see comments, so it can be challenging to successfully communicate with them and explain how to edit.
There are open requests for a way to flag such cases or for tools to help users combine observations, but I suspect these are not simple to implement, so the policy at present – such as it is – is to treat them like any other observation. I understand this annoys IDers, and I’d like a better solution too, but I don’t think it’s the big problem some people treat it as.
ha ha ha ha!
To start with, iNat should have a sort of canonical list of recommended fields (as in, if you need a field to record information X, please use field Y, rather than creating a new field for that purpose). Instead, projects create new fields that mirror existing fields. If you’re trying to harvest data, you might have to download multiple fields that could (potentially) contain the information you’re after, and you’ll never know if somebody creates a new field that could contain the same information. It’s a moving target.
There are a number of scenarios that result in what I regard as duplicate observations, but I don’t expect others to agree with me on definitions. I have my own process for detecting and weeding out these duplicates.
One scenario that is particularly irritating is the group who travels together on a regular basis with each person submitting their observations separately (sometimes on multiple platforms), without mentioning their companions in their observations. Each person has their own idiosyncratic method of entering their observations, with the result that the place names and even lat/long frequently diverge (sometimes by a huge margin).
I’ve also seen folks who clearly know how to group multiple photos under a single observation submit multiple photos of a single organism as separate observations. My guess is that they got lazy and couldn’t be bothered to consolidate the photos.
One of the most pernicious scenarios I’ve seen is a popular project that decided they wanted observers to record how butterflies interact with various plants. Instead of using existing fields for this, they created new ones, which get added automatically to the observations of project participants. They encourage observers to document all interactions. So what do participants do? They follow individual butterflies around, taking photos of them perched on different plants (and other substrates), regardless of whether or not there is any significant interaction between the butterfly and the plant. These all end up as separate observations. I exclude this project from my “identify” screen.
But even folks who “follow the rules” can be problematic. I’ve seen folks who regularly go to a field and (apparently) try to photograph every individual butterfly they find there, even if they’re all clearly the same, common species. Frequently, the same individual appears in multiple observations (cause who can keep track?). There will be dozens of very similar observations posted from the same place on the same date. Then they return a day or two later and do the same thing again. Even if I have software that can consolidate these duplicate observations (and I do), there’s a finite overhead involved in identifying the individual observations (adding comments/annotations as needed), downloading them, and then confirming the consolidation. It would be far better to filter these out up front, but short of blacklisting the observer, that can be difficult. As I’ve mentioned in other posts (in various threads), it would help to have better tools for tracking/organizing the observations we identify.
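A first pass at the kind of duplicate-candidate detection described above can be sketched in a few lines. This is one person’s working definition of “duplicate” (same observer, same species, same day, nearly the same spot), not iNat’s; the field names assume Darwin Core columns from a GBIF export, and the coordinate rounding precision is my guess at what absorbs GPS jitter.

```python
# Sketch: flag groups of observations that are likely duplicates under
# a same-observer / same-species / same-day / same-spot definition.
from collections import defaultdict

def duplicate_candidates(rows, places=3):
    """Group rows by observer, species, date, and rounded coordinates;
    return only the groups with more than one member.

    Rounding to 3 decimal places (~100 m) is a tunable assumption,
    not a standard."""
    groups = defaultdict(list)
    for row in rows:
        try:
            lat = round(float(row["decimalLatitude"]), places)
            lon = round(float(row["decimalLongitude"]), places)
        except (KeyError, ValueError):
            continue  # no usable coordinates; handle those separately
        key = (row.get("recordedBy"), row.get("scientificName"),
               row.get("eventDate"), lat, lon)
        groups[key].append(row)
    return {k: v for k, v in groups.items() if len(v) > 1}
```

Anything this flags still needs human review before consolidation; as the thread notes, twenty individual butterflies in one field on one day are not duplicates.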
I think one of the problems causing the proliferation of Observation Fields is that only the person who originally created them can edit them, and unlike with projects, there doesn’t seem to be any way to transfer ownership. This probably isn’t an issue for the taxon-based fields, but I imagine it’s an issue for most of the other formats. And if someone who created an Observation Field for a project is no longer managing the OF, it seems like the only way to be able to actively edit the OF again would be to make a new nearly identical one and copy all the information over. I’ve been thinking about this ever since a student of mine helped me set up a Traditional Project with several new OFs. When she graduated, it was easy for her to turn over project ownership to me, but there was no way to hand over the OFs’ ownership. Fortunately the former student still responds to my emails and is willing to make small edits.
This seems like a problem the staff is going to have to solve eventually, otherwise all OFs will inevitably become unowned and uneditable as users leave the platform/retire/die.
Thanks for that perspective. I hadn’t thought of that. All good points.
What are duplicates? For iNaturalist purposes, a duplicate is an exact copy of a photo (same observer, same organism, same place, same time) posted twice or more. Yes, they happen, usually by accident. I’ve done it occasionally – and removed one copy when told about it.
The following are not duplicates: 20 photos of one tree at one place and time by 20 different observers (annoying though these can be to us identifiers). 20 photos of one species of butterfly by one observer in one small area in one short period of time, as long as they’re all different individual butterflies.
Not quite duplicates but wrong: Multiple photos of various parts of one individual, taken by one observer at one place & time, posted as separate observations. (These should be combined. Tell the observer about this.)
Marginal and I think acceptable, though I can see why some data users would exclude them: Multiple photos of one individual butterfly interacting with different plants, in one place at one time, taken by one observer. We’ve repeatedly said, “Don’t post one individual organism more than once a day unless something changes.” We’re probably thinking things like an egg or pupa hatching, but the butterfly moving from one nectar plant to another is a change. I prefer to post the one butterfly on a series of nectar plants, but for the purpose of the study those students are doing, posting each interaction is useful. So I think it’s allowable, but it would annoy me.
A couple more things.
Here’s another way to make a huge accuracy circle by error: Write in “80” meters, then rethink and put the cursor in front of the 8 to replace it and type in “100”. If you fail to remove the “80” you suddenly have “10080” meters. No one would ever do that, you think? I’ve done it.
Fields – Yes, the way iNaturalist manages fields is poor. It needs to change. I would guess that’s about 275 on the list of iNaturalist priorities, but it should change.
I once found a link to a list of every observation field and what it was for and…….there’s a scat-ton.
The lowest-hanging fruit would be for me/us to make a help page that lists things to keep in mind when using iNaturalist data. That wouldn’t require any coding work. Could also link to a forum Wiki that would allow people to add more niche concerns or advice.
I don’t think it’s possible to cover every known issue, but I think it could cover some basics, common issues, and reinforce the messages that data users should double-check their data and that they can go onto iNaturalist itself and fix some problems as well - that they have some agency.
In chronological order by date added, a lovely window into iNaturalist history.
That could be a good start. However, most of the people who need it won’t find it. Something on download is necessary, too, I think.
I do not think that anyone in that thread argued that uncertainties of hundreds of meters are useless. It was all about hundreds of kilometers. I can see some potential usefulness of even those, but most researchers will want to filter them out.
They may not be duplicates according to your (or iNat’s) definition, but they are according to mine.
Are you suggesting that I am not allowed to ignore them? Are you saying I’m not allowed to suggest that we should have tools that will help us to sort and track the observations we choose to interact with?
As I said, if I have to bend over backwards and use all kinds of hacky workarounds in order to manage my identification queue, I may simply cut my losses and do all my identification/annotation work outside of iNaturalist. In effect, this is what you are suggesting when you say that it’s up to researchers to vet the data they obtain from iNat. I kinda thought it might be nice if the observations in the database I manage jibe with the source observations in iNat, but it’s not the end of the world if they don’t.
Sure. And people can (and do) make similar typing errors when entering lat/long coordinates. (I spend a lot of time fixing those kinds of errors in historical observation data)
One of the big problems with iNat data is that various fields are generated automatically. As a result, the value in one field often can’t be used to confirm the value in another. In my experience, most observers do not enter the location/place name – they just use whatever iNat (or the app) generates based on the lat/long. Therefore, one cannot use the place name to double-check that the lat/long is correct (which is what I do with historical observations).
In cases where I have pointed out to an observer that the place name in an observation is completely out to lunch in relation to the lat/long, I have sometimes received indignant responses along the lines of “I entered the correct lat/long - if iNat generated a bogus place name based on those coordinates, it’s not my problem”.
When we’re talking about tens of thousands of observations per year, one cannot investigate every observation and ponder/query what the observer might have intended to enter. I already do too much of that. At some point, some of us must take the data at face value and handle it accordingly.
If you want to look at observations with huge accuracy/uncertainty values on the lat/long - go for it. I will continue to ignore them (since iNat does give me tools to facilitate that filtering).
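For anyone wanting the same kind of filter on a downloaded dataset, here is a minimal sketch, assuming the Darwin Core `coordinateUncertaintyInMeters` column that GBIF exports. The 500 m default is just a placeholder threshold; the field is often blank, so decide explicitly whether blank values pass or fail.

```python
# Sketch: drop records whose positional uncertainty exceeds a threshold.
# coordinateUncertaintyInMeters is frequently blank or occasionally
# garbled (e.g., the "10080" typo described earlier in this thread),
# so both cases are handled explicitly.
def within_uncertainty(rows, max_meters=500, keep_blank=False):
    """Yield rows whose coordinate uncertainty is at most max_meters.

    keep_blank controls whether records with no stated uncertainty
    are kept; there is no single right answer for that."""
    for row in rows:
        raw = (row.get("coordinateUncertaintyInMeters") or "").strip()
        if not raw:
            if keep_blank:
                yield row
            continue
        try:
            if float(raw) <= max_meters:
                yield row
        except ValueError:
            continue  # unparsable value: treat as unusable
```

Whatever threshold you pick, it is a per-project choice; as the discussion here shows, a cutoff that is right for plants may be wrong for wide-ranging animals.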
Thanks! I use a bunch of fields in my work. They are all fields that already existed. I didn’t have to create any. I found them by a sort of trial/error process. Having a list that I can search will be very handy if I find myself thinking I need any new ones in the future.
Maybe the process of creating new fields should include a prompt for users to look at a list like this to see if there might be an existing field that fulfils their needs.
Literally the first sentence of the first post in the thread reads:
“For plants: If the accuracy is > 500 meters, even better >100 meters remove Research Grade.”
I believe that suggested upper limit of 100m was repeated several times later in the thread. It may be true that those were all old posts closer to the top of the thread - from before the thread was resurrected.
I don’t think anyone has been suggesting this. I understand your problem, and perhaps it would make sense to make a feature request describing what tools you would need in order to be able to do this more effectively.
At the same time, that does not mean this is not also the case:
Because this is a different issue. Having tools on iNat’s end that would allow you to vet the data prior to downloading it would not eliminate the necessity to vet the data at all – it would just change what parts of the vetting are happening where.
It sometimes happens that people criticize iNat because they assume that the data set is meant to be usable without any vetting, or they use it and draw wrong conclusions because they have not understood the particular biases and limitations of the sort of data represented by iNat observations. It may seem self-evident to you that you need to check the data before using it, but this doesn’t always happen.
It is also necessary to recognize that different users will have different needs and it is important that the process of vetting the data for one’s own purposes does not negatively affect the status on iNat of data that might be relevant for other users. E.g., records that count as duplicates for you may not count as duplicates for everyone else. It sometimes happens that users will try to make records that are not relevant for them casual without considering that these records may be relevant for other people, or they may think they are doing others a favor by enforcing criteria that iNat has chosen to define differently (e.g., whether field sketches count as evidence, whether escapees are “wild” etc.)
The feature request is about finding ways to make data users more aware of the biases and potential problems with iNat data and what sort of vetting they may need to do (and secondarily, to perhaps reduce the amount of drama and DQA wars around certain types of observations). It isn’t meant to provide solutions for carrying out that vetting.