Using SQL to query iNat's DWCA taxonomy export

it’s not necessarily an easy thing to do, but you could use the data in the the taxonomy DWCA export (https://www.inaturalist.org/taxa/inaturalist-taxonomy.dwca.zip) as the source for searching common names and lexicons.

for me, i think the simplest process would be:

  1. download the file
  2. extract the files
  3. create a new SQLite database
  4. import taxa.csv as a new table
  5. import the VernacularNames-XXX.csv files into a new table (data from all these files goes into a single table).
  6. (optional) index tables
  7. run SQL queries as desired

i haven’t tried (yet), but i wouldn’t expect it to take more than 10 or 15 minutes to accomplish #1-6 above. (and if i saved a few scripts from that process, it would be even faster to do in the future.) it should be possible to do similar things in R or Python, etc.

for reference, VernacularNames files include these fields:

  • taxon_id (numeric ID)
  • vernacularName
  • language (code)
  • locality (name)
  • countryCode
  • source (description)
  • lexicon (name)
  • contributor (name)
  • created (UTC datetime)

and the taxa file contains these fields:

  • id (numeric ID)
  • taxonID (iNat taxon page URL)
  • identifier (iNat taxon page URL)
  • parentNameUsageID (iNat taxon page URL)
  • kingdom
  • phylum
  • class
  • order
  • family
  • genus
  • specificEpithet
  • infraspecificEpithet
  • modified (UTC datetime)
  • scientificName
  • taxonRank
  • references

the main limitation of this approach is that it would not be real-time. also, the export may include only active taxa. and, of course, not everyone may have the technical experience needed to work with these files.

UPDATE: i created a local database using the data from latest iNat taxonomy export file, and here’s an example of query that i ran on it:

(if it would be helpful, i could probably load a snapshot of the latest dataset to data.world so that it could be easily accessed using whatever technology you’re most comfortable with. just let me know.)

4 Likes

Thank you for pointing to the download. I found the culprit.

1 Like

I’d also like to be able to search for species without common names.

2 Likes

this is also possible by querying against the data in the DWCA export file.

for example, this query:

SELECT A.scientificName, A.taxonID
FROM taxa A
LEFT JOIN vnames B
	ON A.id = B.id
WHERE taxonRank = 'species'
	AND genus = 'Asclepias'
	AND B.id is NULL

returns:

scientificName taxonID
Asclepias mcvaughii https://www.inaturalist.org/taxa/207411
Asclepias circinalis https://www.inaturalist.org/taxa/277429
Asclepias jorgeana https://www.inaturalist.org/taxa/277432
Asclepias mirifica https://www.inaturalist.org/taxa/277434
Asclepias pratensis https://www.inaturalist.org/taxa/277436
Asclepias pseudorubricaulis https://www.inaturalist.org/taxa/277437
Asclepias scheryi https://www.inaturalist.org/taxa/277438
Asclepias subaphylla https://www.inaturalist.org/taxa/277439
Asclepias vinosa https://www.inaturalist.org/taxa/277440
Asclepias virletii https://www.inaturalist.org/taxa/277441
Asclepias nummularioides https://www.inaturalist.org/taxa/277619
Asclepias zanthodacryon https://www.inaturalist.org/taxa/277621
Asclepias atroviolacea https://www.inaturalist.org/taxa/284334
Asclepias conzattii https://www.inaturalist.org/taxa/284336
Asclepias coulteri https://www.inaturalist.org/taxa/284337
Asclepias crocea https://www.inaturalist.org/taxa/284338
Asclepias fournieri https://www.inaturalist.org/taxa/284340
Asclepias gentryi https://www.inaturalist.org/taxa/284341
Asclepias masonii https://www.inaturalist.org/taxa/284343
Asclepias pellucida https://www.inaturalist.org/taxa/284345
Asclepias puberula https://www.inaturalist.org/taxa/284346
Asclepias schaffneri https://www.inaturalist.org/taxa/284348
Asclepias standleyi https://www.inaturalist.org/taxa/284349
Asclepias bifida https://www.inaturalist.org/taxa/284616
Asclepias leptopus https://www.inaturalist.org/taxa/284619
Asclepias pringlei https://www.inaturalist.org/taxa/284622
Asclepias kamerunensis https://www.inaturalist.org/taxa/437303
Asclepias elegantula https://www.inaturalist.org/taxa/503712
Asclepias barjoniifolia https://www.inaturalist.org/taxa/542965
Asclepias adscendens https://www.inaturalist.org/taxa/580329
Asclepias bicuspis https://www.inaturalist.org/taxa/580331
Asclepias brevicuspis https://www.inaturalist.org/taxa/580332
Asclepias compressidens https://www.inaturalist.org/taxa/580334
Asclepias concinna https://www.inaturalist.org/taxa/580335
Asclepias cooperi https://www.inaturalist.org/taxa/580336
Asclepias crassinervis https://www.inaturalist.org/taxa/580337
Asclepias disparilis https://www.inaturalist.org/taxa/580341
Asclepias dissona https://www.inaturalist.org/taxa/580342
Asclepias expansa https://www.inaturalist.org/taxa/580344
Asclepias fallax https://www.inaturalist.org/taxa/580345
Asclepias fulva https://www.inaturalist.org/taxa/580347
Asclepias gordon-grayae https://www.inaturalist.org/taxa/580349
Asclepias meliodora https://www.inaturalist.org/taxa/580353
Asclepias meyeriana https://www.inaturalist.org/taxa/580354
Asclepias montevaga https://www.inaturalist.org/taxa/580355
Asclepias monticola https://www.inaturalist.org/taxa/580356
Asclepias nana https://www.inaturalist.org/taxa/580358
Asclepias navicularis https://www.inaturalist.org/taxa/580359
Asclepias oreophila https://www.inaturalist.org/taxa/580360
Asclepias patens https://www.inaturalist.org/taxa/580361
Asclepias peltigera https://www.inaturalist.org/taxa/580362
Asclepias praemorsa https://www.inaturalist.org/taxa/580363
Asclepias randii https://www.inaturalist.org/taxa/580364
Asclepias rara https://www.inaturalist.org/taxa/580365
Asclepias schlechteri https://www.inaturalist.org/taxa/580366
Asclepias ulophylla https://www.inaturalist.org/taxa/580368
Asclepias velutina https://www.inaturalist.org/taxa/580369
Asclepias vicaria https://www.inaturalist.org/taxa/580370
Asclepias woodii https://www.inaturalist.org/taxa/580371
Asclepias dregeana https://www.inaturalist.org/taxa/596584
Asclepias melantha https://www.inaturalist.org/taxa/622354
Asclepias alpestris https://www.inaturalist.org/taxa/625237
Asclepias amabilis https://www.inaturalist.org/taxa/625238
Asclepias ameliae https://www.inaturalist.org/taxa/625239
Asclepias candida https://www.inaturalist.org/taxa/625264
Asclepias dependens https://www.inaturalist.org/taxa/625274
Asclepias flanaganii https://www.inaturalist.org/taxa/625275
Asclepias graminifolia https://www.inaturalist.org/taxa/625276
Asclepias grandirandii https://www.inaturalist.org/taxa/625277
Asclepias foliosa https://www.inaturalist.org/taxa/625526
Asclepias inaequalis https://www.inaturalist.org/taxa/625528
Asclepias langsdorffii https://www.inaturalist.org/taxa/625530
Asclepias longirostra https://www.inaturalist.org/taxa/625539
Asclepias longissima https://www.inaturalist.org/taxa/625542
Asclepias minutiflora https://www.inaturalist.org/taxa/625560
Asclepias mtorwiensis https://www.inaturalist.org/taxa/625563
Asclepias nuttii https://www.inaturalist.org/taxa/625572
Asclepias palustris https://www.inaturalist.org/taxa/625582
Asclepias pilgeriana https://www.inaturalist.org/taxa/625584
Asclepias pseudoamabilis https://www.inaturalist.org/taxa/625586
Asclepias pseudofimbriata https://www.inaturalist.org/taxa/625588
Asclepias pygmaea https://www.inaturalist.org/taxa/625593
Asclepias senecionifolia https://www.inaturalist.org/taxa/860958
Asclepias fimbriata https://www.inaturalist.org/taxa/1104559
Asclepias boliviensis https://www.inaturalist.org/taxa/1159860
Asclepias schumanniana https://www.inaturalist.org/taxa/1188941
1 Like

Species without common names for a specified lexicon and a specific country (=checklist?) and common names that occur for more then one species within a lexicon:

SELECT vnames.vernacularName, vnames.contributor, Taxa.scientificName, vnames.locality, vnames.source, vnames.lexicon, vnames.created, Taxa.taxonID
FROM Taxa INNER JOIN vnames ON Taxa.id = vnames.id
*WHERE *
((
*(vnames.vernacularName) In *

(SELECT vnames.vernacularName
FROM Taxa INNER JOIN vnames ON Taxa.id = vnames.id
WHERE taxonRank= ‘Species’
GROUP BY [vernacularName] *
HAVING (Count()>1 ))

))
ORDER BY vnames.vernacularName;

Interessting file, you can also find out which lexicon has the most common names after English and which user has the first position in the leader board of adding common names…


Postscript:
There is a policy for common names and genus and infraspecies. I was wondering what to do with this so I removed them from the result of this query, but that is not correct…
https://forum.inaturalist.org/t/duplicating-common-name-for-nominate-subspecies/11298/8?u=optilete

1 Like

i think this more accurately reflects the above description:

WITH vnames_multiple_species AS (
	SELECT X.vernacularName, X.lexicon, Y.taxonRank
	FROM vnames X
	JOIN taxa Y
		ON X.id = Y.id
	-- WHERE Y.taxonRank = 'species'
	GROUP BY X.vernacularName, X.lexicon, Y.taxonRank
	HAVING COUNT(DISTINCT X.id) > 1
)
SELECT B.taxonRank, B.scientificName, A.*
FROM vnames A
JOIN taxa B
	ON A.id = B.id
WHERE EXISTS (
    SELECT *
	FROM vnames_multiple_species C
	WHERE A.vernacularName = C.vernacularName
		AND A.lexicon = C.lexicon
		AND B.taxonRank = C.taxonRank
		AND C.taxonRank = 'species' )
ORDER BY B.taxonRank, A.vernacularName, B.scientificName
1 Like

(I moved some posts here since they seemed more in the vein of a tutorial, feel free to edit the title or anything else)

Not for MS Access:
As others have stated in the comments, Access SQL does not support the with keyword the same way that TSQL does. You can accomplish close to the same thing though by writing the first query and saving it. The saved query can then be referenced in your Access SQL as though it was a table (similar to creating a view in TSQL).
https://stackoverflow.com/questions/24970777/how-do-i-use-the-with-statement-in-access

1 Like

ok. i didn’t realize you were working in Access. Access doesn’t support common table expressions nor count(distinct), and it optimizes queries differently. so i think this would be how i would write the query for Access:

SELECT B.scientificName, A.*
FROM (
	SELECT D.*, C.taxonRank
	FROM (
		SELECT vernacularName, lexicon, taxonRank
		FROM (
			SELECT DISTINCT X.vernacularName, X.lexicon, Y.taxonRank, X.id
			FROM vnames AS X
			INNER JOIN taxa AS Y
				ON X.id = Y.id
			WHERE Y.taxonRank = 'species'
		) AS Z
		GROUP BY vernacularName, lexicon, taxonRank
		HAVING COUNT(*) > 1
	) AS C
	INNER JOIN vnames AS D
		ON D.vernacularName = C.vernacularName
		AND D.lexicon = C.lexicon
) AS A
INNER JOIN taxa AS B
	ON A.id = B.id
	AND A.taxonRank = B.taxonRank
ORDER BY A.taxonRank, A.vernacularName, B.scientificName

it’s not as easy to read (in my opinion) than my earlier query, but i think it accomplishes the same thing (though i didn’t really check results). you could try to recreate this query using nested Access queries, but i’m counting 3 nests here. so it’s probably just easier to open up SQL view in the Query Designer, and write the query directly in there (or copy and paste this).

if you compare this query against the one you wrote, i think you’ll see a few differences. just based on your earlier description of your query, i think this query will more accurately match your description.

1 Like

I should add indexes, it took over 2 hours and I still don’t get an answer with this Q.

you could try indexing, but i suspect there’s enough data here that you might be operating at the file size limit for Access (in which case, indexing probably won’t help).

you could try reducing the number of records that you import into your database. (for example, only upload the vernacular names for the lexicons that you’re really interested in.)

otherwise, you probably need to use a different database. unless i need server-client architecture, i tend to use SQLite to make simple databases. if you can write SQL on your own (without using a visual designer), then i think DB Browser is a really good application to manage SQLite databases. it has a very simple interface and includes good functionality for importing data.

1 Like

:-) It’s like going to China and learning Quechua.

I imported only one (the most important) lexicon but all taxa of iNaturalist. If I should import all lexicons I think I should script something in VBA or Python to do it. I did not find another database my company let me install on my laptop but I will take a look at SQLLite.

1 Like

hmmm… if you imported only one lexicon into your Access database and the query still didn’t return anything for you (in a reasonable amount of time), then it’s probably that your computer doesn’t have the resources to handle a big data set like that in Access. i just tried importing taxa and English vnames, and it ran in just few seconds in Access 365 (even without any indexes).

that said, if your query is working for you, it should get you close enough to your intent probably, since your set of vnames includes only one lexicon. so if that’s working for you, then it’s probably good enough in this case.

i did a quick web search to refresh my memory about how to import multiple files into an Access table, and it looks like it probably does require some VBA, which is unfortunate. (in DB Browser, i just have to select all the vname csv files at the same time, and the import wizard will load them all into the same table. it takes just a minute or two.)

since you mentioned you can do Python, and since it sounds like you can’t install new apps on your machine, you still might have the ability to run SQL using the sqlite3 module in Python. SQL queries should run faster / more efficiently in SQLite than in Access, but going through Python, it’s more coding to accomplish some of the steps vs going through a DB app. (you can also do SQL-like queries in Python itself without going through SQL, but it’s just ugly to read compared to SQL, in my opinion.)

1 Like

I was afraid for it so I tried an ms dos line that does the trick: copy VernacularNames-*.csv VernacularNames_all\VernacularNames_all.txt /on

1 Like

smart

1 Like

After all this work I guess a GUI is implemented to browse the history for each taxon…


Probably another history, accoording to the error message
‘‘History was introduced in February 2022, so no history is available prior to that date.’’

Results of DWCA can be found:

The state of common names feb 2023
Finding out common names’ numbers per language 2020 & jan 2023
Clean up currently available lexicons


and for the post above, I cannot edit??
taxon history is live now