Several people on the forum have sought advice on how to upload CSV files and been directed to the Import page here: https://www.inaturalist.org/observations/import#csv_import. But it hasn’t prevented others (including me) seeking further advice.
| Taxon name | Date observed | Description | Place name | Latitude / y coord / northing | Longitude / x coord / easting | Tags | Geoprivacy |
|---|---|---|---|---|---|---|---|
| text | YYYY-MM-DD HH:MM | text | text | dd.dddd | dd.dddd | tag,tag | obscured |
I think this is because the CSV import page could be clearer. Specifically, it’s not obvious to me if these columns with data types are:
- an example of the data types that must be used, shown by these sample columns
- the minimum set of columns that are required for a successful upload (but additional columns are permitted)
- the only eight columns that will upload (but you don’t need all eight)
- the only eight columns that will upload (and all eight are required)
I feel it would only take a small text tweak to make the rules crystal clear for everyone (I’m guessing the fourth option above is the right one, but I’m still not certain, even after much scanning of the forum).
As an aside, it seems odd that when you download a CSV file, the column headings are different from the ones required for uploading…
Does anyone else share my confusion? (Please don’t say it’s just me!)
I did some basic testing, and was able to discover a few things about the current behaviour of the CSV import. The following should only be treated as a rough guide, though, since I haven’t checked every possibility.
- Almost everything is optional - the only exception being the first column, the value of which must resolve to a valid taxon. So a minimal working example file would be:
blah
Vanessa io
This will create a casual observation with only an ID and an auto-generated tag with the upload filename.
Note that the current implementation ignores the column names, but it will always read the header - so if it’s omitted, the first data row would be used as the header row instead (thus swallowing up that observation).
- The order of the eight documented columns is crucial, but their number isn’t. All that matters is that the values in the data rows are correctly aligned with the appropriate header columns. The values must be separated by commas (including empty values) and double-quoted if they contain commas:
Taxon name,Date observed,Description,Place name
Vanessa io,,,"Paris, France"
- The first eight columns are always interpreted as documented, and any additional columns will be silently ignored - except when adding to a traditional project with suggested/required observation fields. For the latter case, each additional column must match the observation field name/value pairs specified by that project:
Taxon,Date,Desc,Place,Latitude,Longitude,Tags,Geoprivacy,HabitatType
Vanessa io,,,"Paris, France",,,,private,Mixed Forest
Note that the observation field columns would normally start at column nine, but strictly speaking, this isn’t necessary, since the current implementation looks them up by name rather than index. So unlike the first eight columns, the order of additional columns isn’t significant.
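For anyone generating these files from a script rather than a spreadsheet, here is a rough sketch of the behaviour described above, using Python's standard `csv` module. It always writes a header row, keeps the eight columns in their documented positions (filling gaps with empty values), and lets the writer quote any value that contains a comma. The function name `build_import_csv` and the shortened header labels are my own inventions for illustration; `HabitatType` is just the observation field from the example above.

```python
import csv
import io

# Positional columns in the documented order. The importer reportedly ignores
# these names, so they only serve as a readable header row.
CORE_COLUMNS = [
    "Taxon name", "Date observed", "Description", "Place name",
    "Latitude", "Longitude", "Tags", "Geoprivacy",
]


def build_import_csv(records, extra_fields=()):
    """Return CSV text with a header row and positionally aligned values.

    `records` is a list of dicts keyed by column name. Missing keys become
    empty values so that every later column stays in its expected position.
    `extra_fields` are observation field names for a traditional project.
    """
    columns = CORE_COLUMNS + list(extra_fields)
    buffer = io.StringIO()
    # The default quoting mode (QUOTE_MINIMAL) double-quotes only the values
    # that need it, such as place names containing commas.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for record in records:
        if not record.get("Taxon name"):
            raise ValueError("the first column (taxon) must not be empty")
        writer.writerow([record.get(column, "") for column in columns])
    return buffer.getvalue()


csv_text = build_import_csv(
    [{"Taxon name": "Vanessa io", "Place name": "Paris, France",
      "Geoprivacy": "private", "HabitatType": "Mixed Forest"}],
    extra_fields=["HabitatType"],
)
print(csv_text)
```

The data row this prints should be identical to the `Vanessa io` row in the project example above. Whether the taxon actually resolves can only be checked by the site itself, so this sketch just refuses an empty first column.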
My own conclusion from this is that the current documentation seems okay. The implementation is very tolerant, so it’s not that easy to make mistakes. The only significant omission is the special treatment of additional columns.
PS: I just noticed that the examples in the current documentation don’t quite follow the stated rules. The first two examples don’t match the formats of the date and lat/long columns, the last two examples use unnecessary quoting, and the second example has a missing final column. Even so, when those examples are uploaded verbatim as a CSV file, the expected observations are created without error - which I suppose is a nice demonstration of the tolerance of the implementation.
Thank you so much, bazwal, for your brilliant detective work! So helpful and exactly the clarity I’ve been looking for.
I’ll follow up on your findings and post again once I have. :)
After the great work done by bazwal, I’ve done some more uploading and testing and come up with some suggested text (below) for modifying the CSV upload page.
It’s very easy to forget the knowledge one already has so I’m trying to make the copy as clear as possible with the minimum of assumed knowledge.
(I’m not sure why duplicate file names are discouraged. It would be great if someone could confirm my finding that duplicate named files get uploaded OK but are added to, rather than overwrite, the original upload.)
Rules & Formatting
- Up to 8 columns of data can be imported (more if uploading to a traditional project*).
- Data must be uploaded in the following format:
| Taxon | Date observed | Description | Place name | Latitude: y coordinate & northing | Longitude: x coordinate & easting | Tags | Geoprivacy |
|---|---|---|---|---|---|---|---|
| text | YYYY-MM-DD HH:MM (12- or 24-hour clock, if included) | text | text | +/-yyy.nnnn | +/-xxx.eeee | text,text | text |
- You must have a header row but you can choose your own names for columns (e.g. you might prefer “Lat” to “Latitude: y coordinate & northing”).
- Only column 1 (taxon) is mandatory; other columns can be left blank but the order of the columns must be as shown (e.g. latitude data must always be in column 5, etc).
- The taxon name must match an existing taxon in our database (e.g. “insects” or “lepidoptera” or “Aglais urticae”).
- Any text with commas must be enclosed with double-quotes (most spreadsheet applications will automatically export CSV in this format).
- Don’t use double quotes anywhere else.
- Use unique file names for separate uploads (files with duplicate names will upload but not overwrite).
- The geoprivacy column must be in English and must be blank or have a value of “obscured” or “private”.
- Only files with fewer than 10,000 rows, please.
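In case it helps anyone test a file before uploading, here is a rough pre-flight check of the rules listed above, written with Python's standard `csv` module. It is only a sketch of the proposed text, not of the site's actual validation. The function name `check_import_text` is hypothetical, counting the header towards the 10,000-row limit is my assumption, and the latitude/longitude range check (±90 and ±180 degrees) is my addition. It cannot tell whether a taxon name exists in the database.

```python
import csv
import io

ALLOWED_GEOPRIVACY = {"", "obscured", "private"}
MAX_ROWS = 10_000  # the proposed text asks for fewer than 10,000 rows


def check_import_text(csv_text, extra_columns=0):
    """Return a list of human-readable problems found in the CSV text.

    `extra_columns` is the number of observation field columns expected
    when importing into a traditional project.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty: a header row is required"]
    problems = []
    # Assumption: the header row counts towards the row limit.
    if len(rows) >= MAX_ROWS:
        problems.append(f"file has {len(rows)} rows; the limit is under {MAX_ROWS}")
    allowed_width = 8 + extra_columns
    # The first row is always read as the header, so data starts at row 2.
    for number, row in enumerate(rows[1:], start=2):
        padded = row + [""] * (8 - len(row))  # trailing columns may be omitted
        taxon, _date, _desc, _place, lat, lon, _tags, geoprivacy = padded[:8]
        if not taxon.strip():
            problems.append(f"row {number}: taxon (column 1) is empty")
        if len(row) > allowed_width:
            problems.append(
                f"row {number}: columns beyond {allowed_width} would be ignored")
        if geoprivacy.strip() not in ALLOWED_GEOPRIVACY:
            problems.append(
                f"row {number}: geoprivacy must be blank, obscured or private")
        for label, value, limit in (("latitude", lat, 90), ("longitude", lon, 180)):
            if not value.strip():
                continue  # coordinates are optional
            try:
                in_range = abs(float(value)) <= limit
            except ValueError:
                in_range = False
            if not in_range:
                problems.append(f"row {number}: {label} {value!r} is not valid")
    return problems


sample = (
    "Taxon,Date,Desc,Place,Lat,Lon,Tags,Geoprivacy\n"
    'Aglais urticae,2008-03-03 14:54,,"Tilden Regional Park, Berkeley, CA, USA",'
    '37.8953,-122.249,"attack, danger",obscured\n'
    ",,,,95.0,,,hidden\n"
)
problems = check_import_text(sample)
for problem in problems:
    print(problem)
```

The first data row in `sample` passes; the second is deliberately wrong in three ways (no taxon, an unknown geoprivacy value, and a latitude above 90), so three problems are reported for row 3.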
Here are 3 examples that will upload successfully:
a) Anna's Hummingbird,2008-03-03 2:54pm,"An aggressive male dive-bombed my head, so I took cover.","Tilden Regional Park, Berkeley, CA, USA",37.8953,-122.249,"attack, danger",obscured
b) Sharp-tailed Snake,2007-08-20,"Beautiful little creature","Leona Canyon Regional Park, Oakland, CA, USA",37.7454,-122.111,"cute, snakes"
c) Golden Eagle,"I'm not really sure when or where this was","mysterious",private
*Importing to Traditional Projects
- If you are importing into a traditional project that uses observation fields (either suggested or required), then extra columns can be added (as columns 9, 10, 11, etc).
- The header of each additional column must match the name of one of the project’s observation fields (names are case sensitive).
- These additional columns can be placed in any order (because, unlike the first eight columns, observation columns are matched by name, not position).
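Because the extra columns are matched by exact, case-sensitive name, a mistyped header is an easy mistake to make. Here is a small sketch in Python for catching it before uploading. It assumes you have typed out the project's observation field names yourself; `unmatched_field_headers` is a hypothetical helper, and `Substrate` is a made-up field name used only for the example.

```python
import csv
import io


def unmatched_field_headers(csv_text, project_field_names):
    """Map each unmatched extra-column header to a likely intended name.

    Headers from column 9 onwards are compared with the project's field
    names exactly (case-sensitively). For each header without an exact
    match, the value is the field name that differs only by case, or None
    if there is no such field.
    """
    header = next(csv.reader(io.StringIO(csv_text)), [])
    extra_headers = header[8:]  # columns 1-8 are positional, not named
    known = set(project_field_names)
    by_lowercase = {name.lower(): name for name in project_field_names}
    return {
        name: by_lowercase.get(name.lower())
        for name in extra_headers
        if name not in known
    }


header_line = (
    "Taxon,Date,Desc,Place,Latitude,Longitude,Tags,Geoprivacy,"
    "Substrate,habitattype\n"
)
result = unmatched_field_headers(header_line, ["HabitatType", "Substrate"])
print(result)
```

Here `Substrate` matches exactly even though it is listed in a different order from the project's fields, while `habitattype` is flagged with `HabitatType` as the probable intended name.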
What do you think, is it an improvement?
Very happy for others to chip in to hone it further!