When uploading an image whose description metadata contains non-English text to the iNaturalist website, the text imported into the “Note” field appears misencoded.
Steps to Reproduce
Take a photo and set the EXIF “rdf:Description” metadata (named Title in Windows) to a non-English string (e.g., Cyrillic, Chinese, or another non-Latin character set).
Upload the photo to the iNaturalist website via the standard upload interface.
Check the “Note” field associated with the uploaded image.
Expected Behavior
The non-English “rdf:Description” should be imported into the “Note” field correctly, preserving the original characters.
Observed Behavior
The text in the “Note” field is garbled, misencoded, or replaced with unrelated characters.
Additional Notes
The “rdf:Description” is UTF-8 by default, but it appears to be handled as if it were ANSI.
Have any steps been taken to address this bug? It’s quite surprising that in 2025, the site still does not handle UTF-8 correctly - something that has been standard for over two decades. Would you recommend that I open an issue on GitHub regarding this?
I just tested this, and it works fine for me. Are you sure the tool you’re using to edit the image metadata isn’t the problem? I used exiftool to add the relevant tags. To be clear, these are the XMP dc:description and dc:title tags, as defined here: XMP - Dublin Core Schema.
It would help if you could post the complete XMP data from an example image that demonstrates the problem (i.e. as produced by the tool you use for editing the title/description). In particular, it would be interesting to see the values of the xml:lang attributes (if any).
The exiftool command I used for setting the tags was of the following form (placeholder values shown):
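exiftool -XMP-dc:Title="…" -XMP-dc:Description="…" test.jpg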
PS: one thing I forgot to ask is what your system locale settings are. I don’t use Windows myself, but my understanding is that its default encoding isn’t UTF-8. Have you explicitly changed the settings yourself? The garbled text shown in your image looks very similar to what would be produced by decoding UTF-8 encoded text using the Windows cp1252 encoding. The locale on my Linux system is LANG=en_GB.UTF-8, which perhaps explains why I don’t have the problems you have.
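To illustrate, the effect is easy to reproduce with a short TypeScript/JavaScript sketch (it assumes your runtime’s TextDecoder supports windows-1252, which browsers and typical Node.js builds do):

// Encode a Cyrillic string as UTF-8, then (mis)decode the raw bytes as Windows-1252.
const utf8Bytes = new TextEncoder().encode("Привет");
const mojibake = new TextDecoder("windows-1252").decode(utf8Bytes);
console.log(mojibake); // prints "ÐŸÑ€Ð¸Ð²ÐµÑ‚", the typical garbled look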
I attempted to set the description metadata using both Adobe Photoshop’s File Info tool and the standard Windows file property editor, both of which, to my understanding, generate de facto standard XMP metadata. My locale is en-US. However, neither method was successful.
In production, I use Adobe Lightroom along with Jeffrey’s Metadata Wrangler plugin, which reliably produces metadata that displays correctly in both Photoshop and Windows. Despite this, the metadata still fails to import properly on iNaturalist.
– Here I clean all metadata in the reference image
exiftool.exe -all= -overwrite_original test-noexif.jpg
copy test-noexif.jpg test-photoshop.jpg
copy test-noexif.jpg test-windows.jpg
– Then I edit test-photoshop.jpg with Adobe Photoshop → File Info and Save
exiftool.exe -xmp -w txt -b test-photoshop.jpg
– Then I edit test-windows.jpg with RightClick/Properties and Apply changes
exiftool.exe -xmp -w txt -b test-windows.jpg
The relevant tags are all encoded as UTF-8, but having now looked at the iNaturalist source code, it seems the raw bytes of some tags are just treated as ASCII text - regardless of the encoding - and all the code does is strip out any null bytes. This approach is strictly correct, since the affected tags are defined by the EXIF standard as only accepting ASCII, and should never contain data that uses any other encoding (including UTF-8).
I had a look at the metadata of your image files, and found that the title/description/subject isn’t only written to the XMP section - there are also three other tags added in the EXIF section. The one that’s causing the problem is the ImageDescription tag. Once that’s removed, the uploader reads everything else correctly. The two other EXIF tags are XPTitle and XPSubject. These are Windows-specific, but according to the spec they can be encoded using either UTF-8 or ANSI, and have the data type int8u. By contrast, the ImageDescription tag has a data type of string, which only supports ASCII (see: Standard Exif Tags).
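(To list those three tags directly, something like exiftool -EXIF:ImageDescription -EXIF:XPTitle -EXIF:XPSubject test-windows.jpg will print what’s in the EXIF section.)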
So the bug here is that UTF-8 encoded data is being written to a tag which does not really support it. Given that, it’s not unreasonable for the uploader to refuse to guess the encoding and just treat the data as ASCII text. It uses a third-party library (piexifjs) to load the EXIF data, so it would be understandable if the current behaviour wasn’t changed. In the meantime, I suppose the ideal workaround from your end would be to configure your image metadata editing software to just never write to the ImageDescription tag (assuming that’s possible).
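For a file that’s already been exported, the tag can also be stripped afterwards with exiftool: exiftool -overwrite_original -EXIF:ImageDescription= {FILE}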
or, for directories: exiftool -overwrite_original -EXIF:ImageDescription= {DIR}
However, it would be helpful if the iNaturalist team could ensure that the uploader is compatible with modern commercial software that supports international encodings (such as Unicode and UTF-8) by default. Windows-1252 hasn’t been the norm for over a decade.
All of that software is buggy. It should never write anything other than ASCII to the ImageDescription tag. Here’s how it’s defined by the EXIF Standard:
ImageDescription
A character string giving the title of the image. It is possible to be added a comment such as “1988 company picnic” or the like. Two-byte character codes cannot be used. When a 2-byte code is necessary, the Exif Private tag UserComment is to be used.
Tag = 270 (10E.H)
Type = ASCII
Count = Any
Default = None
Many image metadata editors simply ignore the spec and write data using whatever encoding they like. This is probably to avoid breaking other legacy/obsolete software that expects the tag to be there.
I suppose iNaturalist could try to work around these third-party bugs by attempting to guess the non-ASCII encoding used in the ImageDescription tag. There are various tools available for this task, but none of them are guaranteed to guess correctly - especially when there’s only a small input sample to work with (which is almost always the case with metadata tags). If the iNaturalist devs choose not to go down this path, I wouldn’t blame them.
An alternative way to avoid all these problems is to simply never use “modern” software that writes buggy metadata without giving you the option to control the output.
With the release of EXIF 3.0, the standard was updated to allow UTF-8 encoding in certain tags, including ImageDescription. This change reflects the modern need for broader character set support, particularly for non-English languages, and aligns better with Unicode-based systems.
A character string giving the content description of the image. It is possible to add a description of the content or comment such as “1988 company picnic” or the like. 2-byte or larger character codes can not be used when ASCII is set as Type. When a 2-byte or larger code is necessary, UTF-8 shall be set as Type. The count is the value including NULL terminations.
Nobody said that using UTF-8 in EXIF per se is a bug - there are many other tags that already explicitly support it. It’s fair to point out that a new standard was recently published, but the fact remains that a lot of software violated the earlier standard by using literally any non-ASCII encoding (not just UTF-8) in the ImageDescription tag. This means there will continue to be many photos around with metadata that is encoded as neither ASCII nor UTF-8 (for example, UTF-16 and many other two-byte encodings). It’s also very likely that there will be a lot of software around that continues to read and write EXIF metadata using the older standards. Just because a new standard becomes available doesn’t guarantee that everyone will immediately start complying with it.
It wouldn’t be difficult for iNaturalist to add support for UTF-8 only, and avoid guessing any other encodings. Decoding invalid UTF-8 fails quite reliably, so it could simply fall back to ASCII if an error occurs. I still wouldn’t blame the devs if they chose to just leave things as they are, though.
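A rough sketch of that idea (plain TypeScript; this is not iNaturalist’s or piexifjs’s actual code, and the function name is just for illustration):

// Decode a tag's raw bytes as UTF-8; if that fails, fall back to plain ASCII.
function decodeTagBytes(bytes: Uint8Array): string {
  try {
    // fatal: true makes the decoder throw on invalid UTF-8 sequences.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    // Fallback: keep printable 7-bit ASCII and drop everything else (including NULs).
    return Array.from(bytes)
      .filter((b) => b >= 0x20 && b < 0x7f)
      .map((b) => String.fromCharCode(b))
      .join("");
  }
}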
PS: I just checked your example photos, and sure enough, the metadata is written using version 2.31 (i.e. Exif Version = 0231). So your Photoshop software does write buggy metadata, since it doesn’t fully comply with the actual standard it uses.
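(For anyone who wants to check their own files, exiftool -EXIF:ExifVersion photo.jpg prints that tag.)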
It might be a good idea to prioritize the use of embedded XMP metadata when available, as it is internationalized by design.
If XMP data is not present, fall back to EXIF metadata, which is more device-specific (maintained by camera manufacturers such as Sony, Nikon, Canon, Ricoh, Fuji, Apple, etc.) and often includes legacy encodings.
Personally, I’m not a fan of guessing the encoding, as it introduces unpredictability and potential inconsistencies in metadata interpretation.
But we already have unpredictability and inconsistency, so if you want universal UTF-8 support, some form of guessing is inescapable. A large proportion of photos don’t include XMP metadata, so the EXIF tags will have to be interpreted somehow. Attempting to decode as UTF-8 and falling back to ASCII on failure is a simple and effective way of supporting both the current and new EXIF standards. It’s not perfect, but it would be a definite improvement.
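To make that concrete, the reading order could look roughly like this (reusing the decodeTagBytes sketch from earlier; all names are illustrative, not iNaturalist’s actual code):

// Prefer the XMP description when present; otherwise decode the EXIF ImageDescription bytes.
function pickNoteText(
  xmpDescription: string | undefined,
  exifImageDescription: Uint8Array | undefined
): string {
  if (xmpDescription) return xmpDescription;
  if (exifImageDescription) return decodeTagBytes(exifImageDescription);
  return "";
}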