Missing 'references' column values for ~300k species from "iNaturalist Taxonomy DarwinCore Archive" taxonomy tree download

I am probably just not understanding something, but I wanted to flag this in case it is actually a data processing issue.

For roughly 300k species in the iNat taxonomic tree (see the taxa.csv file from the “iNaturalist Taxonomy DarwinCore Archive” download link here: inaturalist . org/pages/developers), the reference links to other databases (GBIF, etc.) are missing, which makes it hard to understand where the data is coming from. Spot-checking some of these values on the site shows that the reference links often exist on the individual iNaturalist species pages but somehow don’t make it into the downloaded zip file.

Example - Pereute charops (id: 258185)

Looking at the schemes page for that species, I see two references, one to GBIF and one to CONABIO: https://www.inaturalist.org/taxa/258185/schemes

If I use the provided GBIF ID and look up Pereute charops, I get this reference: https://www.gbif.org/species/1919235

So the reference link exists and is listed on the iNaturalist page, but it does not show up in the download (see the screenshot below and the code further down to reproduce).

Example - Psylliodes brettinghami
Sorry, it won't let me post more than 4 full links, so IDs only below
iNaturalist ID: 395957
GBIF ID: 4731403
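
Since only the IDs are given for this example, the matching pages can be rebuilt from the same URL patterns as the Pereute charops links above. A small sketch; the two helper names are made up for illustration:

```python
def inat_schemes_url(taxon_id: int) -> str:
    """iNaturalist schemes page, same pattern as the Pereute charops link."""
    return f"https://www.inaturalist.org/taxa/{taxon_id}/schemes"


def gbif_species_url(gbif_id: int) -> str:
    """GBIF species page, same pattern as the GBIF link above."""
    return f"https://www.gbif.org/species/{gbif_id}"


# IDs for Psylliodes brettinghami, as listed above
print(inat_schemes_url(395957))
print(gbif_species_url(4731403))
```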

Additional examples
[862702, 1150910, 1509734, 258185, 842627, 382650, 1146996, 921832, 395957, 140082]
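
To check these specific IDs against the download, something like the following could be used. It is only a sketch: it assumes taxa.csv has been loaded into a pandas DataFrame with the 'id' and 'references' columns used in the reproduction code, the function name is hypothetical, and the demo frame is made-up data rather than real archive rows.

```python
import pandas as pd

EXAMPLE_IDS = [862702, 1150910, 1509734, 258185, 842627,
               382650, 1146996, 921832, 395957, 140082]


def ids_missing_references(df: pd.DataFrame, ids: list[int]) -> list[int]:
    """Return the ids from `ids` whose 'references' value is empty in df."""
    subset = df[df["id"].isin(ids)]
    return sorted(subset.loc[subset["references"].isna(), "id"].tolist())


# Tiny made-up frame for illustration only (not real archive rows).
demo = pd.DataFrame({
    "id": [258185, 395957, 1],
    "references": [None, None, "https://example.org/ref"],
})
print(ids_missing_references(demo, EXAMPLE_IDS))
```

With the real DataFrame, passing `df` instead of `demo` would list which of the example IDs have an empty references value in the archive.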

Code to reproduce the full list (I'm on Windows)

from urllib.request import urlopen
from io import BytesIO
from zipfile import ZipFile
import pandas as pd
import os

# Download the archive and extract it into ./download
current_directory = os.getcwd()
extract_to = os.path.join(current_directory, 'download')
os.makedirs(extract_to, exist_ok=True)
zip_file_url = 'https://www.inaturalist.org/taxa/inaturalist-taxonomy.dwca.zip'

http_response = urlopen(zip_file_url)
zipfile = ZipFile(BytesIO(http_response.read()))
zipfile.extractall(path=extract_to)

file_path = os.path.join(extract_to, 'taxa.csv')
df = pd.read_csv(file_path)

# Species-rank rows with no value in the 'references' column
# (in a notebook, the next line displays them)
df[df['references'].isna() & (df['taxonRank'] == 'species')].sort_values(by=['kingdom', 'scientificName'])
cnt = len(df[df['references'].isna() & (df['taxonRank'] == 'species')])
print(f'Number of rows missing links in the reference column: {cnt}')

# Random sample of 10 affected ids, then show their full rows
random_list = df[df['references'].isna() & (df['taxonRank'] == 'species')]['id'].sample(n=10, random_state=1).tolist()
df[df['id'].isin(random_list)].sort_values(by='kingdom')
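
It might also help to see whether the missing values are concentrated in particular kingdoms. Here is a minimal sketch under the same assumptions as the code above (a DataFrame with 'references', 'taxonRank' and 'kingdom' columns); the function name is hypothetical and the demo frame is made-up data, not real archive rows.

```python
import pandas as pd


def missing_references_by_kingdom(df: pd.DataFrame) -> pd.Series:
    """Count species-rank rows with an empty 'references' value, per kingdom."""
    mask = df["references"].isna() & (df["taxonRank"] == "species")
    return df.loc[mask, "kingdom"].value_counts()


# Tiny made-up frame for illustration only.
demo = pd.DataFrame({
    "taxonRank": ["species", "species", "genus", "species"],
    "kingdom": ["Animalia", "Animalia", "Plantae", "Plantae"],
    "references": [None, None, None, "https://example.org/ref"],
})
print(missing_references_by_kingdom(demo))
```

Calling `missing_references_by_kingdom(df)` on the real data would give one count per kingdom, which could hint at whether the gap comes from a particular source or import.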