Using SQL to query iNat's DWCA taxonomy export

it’s not necessarily an easy thing to do, but you could use the data in the the taxonomy DWCA export ( as the source for searching common names and lexicons.

for me, i think the simplest process would be:

  1. download the file
  2. extract the files
  3. create a new SQLite database
  4. import taxa.csv as a new table
  5. import the VernacularNames-XXX.csv files into a new table (data from all these files goes into a single table).
  6. (optional) index tables
  7. run SQL queries as desired

i haven’t tried (yet), but i wouldn’t expect it to take more than 10 or 15 minutes to accomplish #1-6 above. (and if i saved a few scripts from that process, it would be even faster to do in the future.) it should be possible to do similar things in R or Python, etc.

for reference, VernacularNames files include these fields:

  • taxon_id (numeric ID)
  • vernacularName
  • language (code)
  • locality (name)
  • countryCode
  • source (description)
  • lexicon (name)
  • contributor (name)
  • created (UTC datetime)

and the taxa file contains these fields:

  • id (numeric ID)
  • taxonID (iNat taxon page URL)
  • identifier (iNat taxon page URL)
  • parentNameUsageID (iNat taxon page URL)
  • kingdom
  • phylum
  • class
  • order
  • family
  • genus
  • specificEpithet
  • infraspecificEpithet
  • modified (UTC datetime)
  • scientificName
  • taxonRank
  • references

the main limitation of this approach is that it would not be real-time. also, the export may include only active taxa. and, of course, not everyone may have the technical experience needed to work with these files.

UPDATE: i created a local database using the data from latest iNat taxonomy export file, and here’s an example of query that i ran on it:

(if it would be helpful, i could probably load a snapshot of the latest dataset to so that it could be easily accessed using whatever technology you’re most comfortable with. just let me know.)


Thank you for pointing to the download. I found the culprit.

1 Like

I’d also like to be able to search for species without common names.


this is also possible by querying against the data in the DWCA export file.

for example, this query:

SELECT A.scientificName, A.taxonID
FROM taxa A
LEFT JOIN vnames B
	ON =
WHERE taxonRank = 'species'
	AND genus = 'Asclepias'


scientificName taxonID
Asclepias mcvaughii
Asclepias circinalis
Asclepias jorgeana
Asclepias mirifica
Asclepias pratensis
Asclepias pseudorubricaulis
Asclepias scheryi
Asclepias subaphylla
Asclepias vinosa
Asclepias virletii
Asclepias nummularioides
Asclepias zanthodacryon
Asclepias atroviolacea
Asclepias conzattii
Asclepias coulteri
Asclepias crocea
Asclepias fournieri
Asclepias gentryi
Asclepias masonii
Asclepias pellucida
Asclepias puberula
Asclepias schaffneri
Asclepias standleyi
Asclepias bifida
Asclepias leptopus
Asclepias pringlei
Asclepias kamerunensis
Asclepias elegantula
Asclepias barjoniifolia
Asclepias adscendens
Asclepias bicuspis
Asclepias brevicuspis
Asclepias compressidens
Asclepias concinna
Asclepias cooperi
Asclepias crassinervis
Asclepias disparilis
Asclepias dissona
Asclepias expansa
Asclepias fallax
Asclepias fulva
Asclepias gordon-grayae
Asclepias meliodora
Asclepias meyeriana
Asclepias montevaga
Asclepias monticola
Asclepias nana
Asclepias navicularis
Asclepias oreophila
Asclepias patens
Asclepias peltigera
Asclepias praemorsa
Asclepias randii
Asclepias rara
Asclepias schlechteri
Asclepias ulophylla
Asclepias velutina
Asclepias vicaria
Asclepias woodii
Asclepias dregeana
Asclepias melantha
Asclepias alpestris
Asclepias amabilis
Asclepias ameliae
Asclepias candida
Asclepias dependens
Asclepias flanaganii
Asclepias graminifolia
Asclepias grandirandii
Asclepias foliosa
Asclepias inaequalis
Asclepias langsdorffii
Asclepias longirostra
Asclepias longissima
Asclepias minutiflora
Asclepias mtorwiensis
Asclepias nuttii
Asclepias palustris
Asclepias pilgeriana
Asclepias pseudoamabilis
Asclepias pseudofimbriata
Asclepias pygmaea
Asclepias senecionifolia
Asclepias fimbriata
Asclepias boliviensis
Asclepias schumanniana
1 Like

Species without common names for a specified lexicon and a specific country (=checklist?) and common names that occur for more then one species within a lexicon:

SELECT vnames.vernacularName, vnames.contributor, Taxa.scientificName, vnames.locality, vnames.source, vnames.lexicon, vnames.created, Taxa.taxonID
FROM Taxa INNER JOIN vnames ON =
*(vnames.vernacularName) In *

(SELECT vnames.vernacularName
FROM Taxa INNER JOIN vnames ON =
WHERE taxonRank= ‘Species’
GROUP BY [vernacularName] *
HAVING (Count()>1 ))

ORDER BY vnames.vernacularName;

Interessting file, you can also find out which lexicon has the most common names after English and which user has the first position in the leader board of adding common names…

There is a policy for common names and genus and infraspecies. I was wondering what to do with this so I removed them from the result of this query, but that is not correct…

1 Like

i think this more accurately reflects the above description:

WITH vnames_multiple_species AS (
	SELECT X.vernacularName, X.lexicon, Y.taxonRank
	FROM vnames X
	JOIN taxa Y
		ON =
	-- WHERE Y.taxonRank = 'species'
	GROUP BY X.vernacularName, X.lexicon, Y.taxonRank
SELECT B.taxonRank, B.scientificName, A.*
FROM vnames A
JOIN taxa B
	ON =
    SELECT *
	FROM vnames_multiple_species C
	WHERE A.vernacularName = C.vernacularName
		AND A.lexicon = C.lexicon
		AND B.taxonRank = C.taxonRank
		AND C.taxonRank = 'species' )
ORDER BY B.taxonRank, A.vernacularName, B.scientificName
1 Like

(I moved some posts here since they seemed more in the vein of a tutorial, feel free to edit the title or anything else)

Not for MS Access:
As others have stated in the comments, Access SQL does not support the with keyword the same way that TSQL does. You can accomplish close to the same thing though by writing the first query and saving it. The saved query can then be referenced in your Access SQL as though it was a table (similar to creating a view in TSQL).

1 Like

ok. i didn’t realize you were working in Access. Access doesn’t support common table expressions nor count(distinct), and it optimizes queries differently. so i think this would be how i would write the query for Access:

SELECT B.scientificName, A.*
	SELECT D.*, C.taxonRank
		SELECT vernacularName, lexicon, taxonRank
		FROM (
			SELECT DISTINCT X.vernacularName, X.lexicon, Y.taxonRank,
			FROM vnames AS X
				ON =
			WHERE Y.taxonRank = 'species'
		) AS Z
		GROUP BY vernacularName, lexicon, taxonRank
	) AS C
		ON D.vernacularName = C.vernacularName
		AND D.lexicon = C.lexicon
) AS A
	ON =
	AND A.taxonRank = B.taxonRank
ORDER BY A.taxonRank, A.vernacularName, B.scientificName

it’s not as easy to read (in my opinion) than my earlier query, but i think it accomplishes the same thing (though i didn’t really check results). you could try to recreate this query using nested Access queries, but i’m counting 3 nests here. so it’s probably just easier to open up SQL view in the Query Designer, and write the query directly in there (or copy and paste this).

if you compare this query against the one you wrote, i think you’ll see a few differences. just based on your earlier description of your query, i think this query will more accurately match your description.

1 Like

I should add indexes, it took over 2 hours and I still don’t get an answer with this Q.

you could try indexing, but i suspect there’s enough data here that you might be operating at the file size limit for Access (in which case, indexing probably won’t help).

you could try reducing the number of records that you import into your database. (for example, only upload the vernacular names for the lexicons that you’re really interested in.)

otherwise, you probably need to use a different database. unless i need server-client architecture, i tend to use SQLite to make simple databases. if you can write SQL on your own (without using a visual designer), then i think DB Browser is a really good application to manage SQLite databases. it has a very simple interface and includes good functionality for importing data.

1 Like

:-) It’s like going to China and learning Quechua.

I imported only one (the most important) lexicon but all taxa of iNaturalist. If I should import all lexicons I think I should script something in VBA or Python to do it. I did not find another database my company let me install on my laptop but I will take a look at SQLLite.

1 Like

hmmm… if you imported only one lexicon into your Access database and the query still didn’t return anything for you (in a reasonable amount of time), then it’s probably that your computer doesn’t have the resources to handle a big data set like that in Access. i just tried importing taxa and English vnames, and it ran in just few seconds in Access 365 (even without any indexes).

that said, if your query is working for you, it should get you close enough to your intent probably, since your set of vnames includes only one lexicon. so if that’s working for you, then it’s probably good enough in this case.

i did a quick web search to refresh my memory about how to import multiple files into an Access table, and it looks like it probably does require some VBA, which is unfortunate. (in DB Browser, i just have to select all the vname csv files at the same time, and the import wizard will load them all into the same table. it takes just a minute or two.)

since you mentioned you can do Python, and since it sounds like you can’t install new apps on your machine, you still might have the ability to run SQL using the sqlite3 module in Python. SQL queries should run faster / more efficiently in SQLite than in Access, but going through Python, it’s more coding to accomplish some of the steps vs going through a DB app. (you can also do SQL-like queries in Python itself without going through SQL, but it’s just ugly to read compared to SQL, in my opinion.)

1 Like

I was afraid for it so I tried an ms dos line that does the trick: copy VernacularNames-*.csv VernacularNames_all\VernacularNames_all.txt /on

1 Like


After all this work I guess a GUI is implemented to browse the history for each taxon…

Probably another history, accoording to the error message
‘‘History was introduced in February 2022, so no history is available prior to that date.’’