import the VernacularNames-XXX.csv files into a new table (data from all these files goes into a single table).
(optional) index tables
run SQL queries as desired
i haven’t tried (yet), but i wouldn’t expect it to take more than 10 or 15 minutes to accomplish #1-6 above. (and if i saved a few scripts from that process, it would be even faster to do in the future.) it should be possible to do similar things in R or Python, etc.
for reference, VernacularNames files include these fields:
taxon_id (numeric ID)
vernacularName
language (code)
locality (name)
countryCode
source (description)
lexicon (name)
contributor (name)
created (UTC datetime)
and the taxa file contains these fields:
id (numeric ID)
taxonID (iNat taxon page URL)
identifier (iNat taxon page URL)
parentNameUsageID (iNat taxon page URL)
kingdom
phylum
class
order
family
genus
specificEpithet
infraspecificEpithet
modified (UTC datetime)
scientificName
taxonRank
references
the main limitation of this approach is that it would not be real-time. also, the export may include only active taxa. and, of course, not everyone may have the technical experience needed to work with these files.
UPDATE: i created a local database using the data from latest iNat taxonomy export file, and here’s an example of query that i ran on it:
(if it would be helpful, i could probably load a snapshot of the latest dataset to data.world so that it could be easily accessed using whatever technology you’re most comfortable with. just let me know.)
Species without common names for a specified lexicon and a specific country (=checklist?) and common names that occur for more then one species within a lexicon:
SELECT vnames.vernacularName, vnames.contributor, Taxa.scientificName, vnames.locality, vnames.source, vnames.lexicon, vnames.created, Taxa.taxonID FROM Taxa INNER JOIN vnames ON Taxa.id = vnames.id
*WHERE * ((
*(vnames.vernacularName) In *
(SELECT vnames.vernacularName FROM Taxa INNER JOIN vnames ON Taxa.id = vnames.id WHERE taxonRank= ‘Species’ GROUP BY [vernacularName] * HAVING (Count()>1 ))
)) ORDER BY vnames.vernacularName;
Interessting file, you can also find out which lexicon has the most common names after English and which user has the first position in the leader board of adding common names…
i think this more accurately reflects the above description:
WITH vnames_multiple_species AS (
SELECT X.vernacularName, X.lexicon, Y.taxonRank
FROM vnames X
JOIN taxa Y
ON X.id = Y.id
-- WHERE Y.taxonRank = 'species'
GROUP BY X.vernacularName, X.lexicon, Y.taxonRank
HAVING COUNT(DISTINCT X.id) > 1
)
SELECT B.taxonRank, B.scientificName, A.*
FROM vnames A
JOIN taxa B
ON A.id = B.id
WHERE EXISTS (
SELECT *
FROM vnames_multiple_species C
WHERE A.vernacularName = C.vernacularName
AND A.lexicon = C.lexicon
AND B.taxonRank = C.taxonRank
AND C.taxonRank = 'species' )
ORDER BY B.taxonRank, A.vernacularName, B.scientificName
Not for MS Access:
As others have stated in the comments, Access SQL does not support the with keyword the same way that TSQL does. You can accomplish close to the same thing though by writing the first query and saving it. The saved query can then be referenced in your Access SQL as though it was a table (similar to creating a view in TSQL). https://stackoverflow.com/questions/24970777/how-do-i-use-the-with-statement-in-access
ok. i didn’t realize you were working in Access. Access doesn’t support common table expressions nor count(distinct), and it optimizes queries differently. so i think this would be how i would write the query for Access:
SELECT B.scientificName, A.*
FROM (
SELECT D.*, C.taxonRank
FROM (
SELECT vernacularName, lexicon, taxonRank
FROM (
SELECT DISTINCT X.vernacularName, X.lexicon, Y.taxonRank, X.id
FROM vnames AS X
INNER JOIN taxa AS Y
ON X.id = Y.id
WHERE Y.taxonRank = 'species'
) AS Z
GROUP BY vernacularName, lexicon, taxonRank
HAVING COUNT(*) > 1
) AS C
INNER JOIN vnames AS D
ON D.vernacularName = C.vernacularName
AND D.lexicon = C.lexicon
) AS A
INNER JOIN taxa AS B
ON A.id = B.id
AND A.taxonRank = B.taxonRank
ORDER BY A.taxonRank, A.vernacularName, B.scientificName
it’s not as easy to read (in my opinion) than my earlier query, but i think it accomplishes the same thing (though i didn’t really check results). you could try to recreate this query using nested Access queries, but i’m counting 3 nests here. so it’s probably just easier to open up SQL view in the Query Designer, and write the query directly in there (or copy and paste this).
if you compare this query against the one you wrote, i think you’ll see a few differences. just based on your earlier description of your query, i think this query will more accurately match your description.
you could try indexing, but i suspect there’s enough data here that you might be operating at the file size limit for Access (in which case, indexing probably won’t help).
you could try reducing the number of records that you import into your database. (for example, only upload the vernacular names for the lexicons that you’re really interested in.)
otherwise, you probably need to use a different database. unless i need server-client architecture, i tend to use SQLite to make simple databases. if you can write SQL on your own (without using a visual designer), then i think DB Browser is a really good application to manage SQLite databases. it has a very simple interface and includes good functionality for importing data.
:-) It’s like going to China and learning Quechua.
I imported only one (the most important) lexicon but all taxa of iNaturalist. If I should import all lexicons I think I should script something in VBA or Python to do it. I did not find another database my company let me install on my laptop but I will take a look at SQLLite.
hmmm… if you imported only one lexicon into your Access database and the query still didn’t return anything for you (in a reasonable amount of time), then it’s probably that your computer doesn’t have the resources to handle a big data set like that in Access. i just tried importing taxa and English vnames, and it ran in just few seconds in Access 365 (even without any indexes).
that said, if your query is working for you, it should get you close enough to your intent probably, since your set of vnames includes only one lexicon. so if that’s working for you, then it’s probably good enough in this case.
…
i did a quick web search to refresh my memory about how to import multiple files into an Access table, and it looks like it probably does require some VBA, which is unfortunate. (in DB Browser, i just have to select all the vname csv files at the same time, and the import wizard will load them all into the same table. it takes just a minute or two.)
since you mentioned you can do Python, and since it sounds like you can’t install new apps on your machine, you still might have the ability to run SQL using the sqlite3 module in Python. SQL queries should run faster / more efficiently in SQLite than in Access, but going through Python, it’s more coding to accomplish some of the steps vs going through a DB app. (you can also do SQL-like queries in Python itself without going through SQL, but it’s just ugly to read compared to SQL, in my opinion.)