When uploading an image whose description metadata contains non-English text to the iNaturalist website, the text imported into the “Note” field appears misencoded.
Steps to Reproduce
Take a photo and set the EXIF “rdf:Description” metadata (named Title in Windows) to a non-English string (e.g., Cyrillic, Chinese, or another non-Latin character set).
Upload the photo to the iNaturalist website via the standard upload interface.
Check the “Note” field associated with the uploaded image.
Expected Behavior
The non-English “rdf:Description” should be imported into the “Note” field correctly, preserving the original characters.
Observed Behavior
The text in the “Note” field is garbled, misencoded, or replaced with unrelated characters.
Additional Notes
The “rdf:Description” is UTF-8 by default, but it appears to be handled as if it were ANSI.
Have any steps been taken to address this bug? It’s quite surprising that in 2025, the site still does not handle UTF-8 correctly - something that has been standard for over two decades. Would you recommend that I open an issue on GitHub regarding this?
I just tested this, and it works fine for me. Are you sure the tool you’re using to edit the image metadata isn’t the problem? I used exiftool to add the relevant tags. To be clear, these are the XMP dc:description and dc:title tags, as defined here: XMP - Dublin Core Schema.
It would help if you could post the complete XMP data from an example image that demonstrates the problem (i.e. as produced by the tool you use for editing the title/description). In particular, it would be interesting to see the values of the xml:lang attributes (if any).
The exiftool command I used for setting the tags was of the following form (placeholder values shown):
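exiftool -XMP-dc:Title="…" -XMP-dc:Description="…" test.jpg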
PS: one thing I forgot to ask is what your system locale settings are. I don’t use Windows myself, but my understanding is that its default encoding isn’t UTF-8. Have you explicitly changed the settings yourself? The garbled text shown in your image looks very similar to what would be produced by decoding UTF-8 encoded text using the Windows cp1252 encoding. The locale on my Linux system is LANG=en_GB.UTF-8, which perhaps explains why I don’t have the problems you have.
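To illustrate, the effect is easy to reproduce with a short TypeScript/JavaScript sketch (it assumes your runtime’s TextDecoder supports windows-1252, which browsers and typical Node.js builds do):

// Encode a Cyrillic string as UTF-8, then (mis)decode the raw bytes as Windows-1252.
const utf8Bytes = new TextEncoder().encode("Привет");
const mojibake = new TextDecoder("windows-1252").decode(utf8Bytes);
console.log(mojibake); // prints "ÐŸÑ€Ð¸Ð²ÐµÑ‚", the typical garbled look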
I attempted to set the description metadata using both Adobe Photoshop’s File Info tool and the standard Windows file property editor, both of which, to my understanding, generate de facto standard XMP metadata. My locale is en-US. However, neither method was successful.
In production, I use Adobe Lightroom along with Jeffrey’s Metadata Wrangler plugin, which reliably produces metadata that displays correctly in both Photoshop and Windows. Despite this, the metadata still fails to import properly on iNaturalist.
– Here I clean all metadata in the reference image
exiftool.exe -all= -overwrite_original test-noexif.jpg
copy test-noexif.jpg test-photoshop.jpg
copy test-noexif.jpg test-windows.jpg
– Then I edit test-photoshop.jpg with Adobe Photoshop → File Info and Save
exiftool.exe -xmp -w txt -b test-photoshop.jpg
– Then I edit test-windows.jpg with RightClick/Properties and Apply changes
exiftool.exe -xmp -w txt -b test-windows.jpg
The relevant tags are all encoded as UTF-8, but having now looked at the iNaturalist source code, it seems the raw bytes of some tags are just treated as ASCII text - regardless of the encoding - and all the code does is strip out any null bytes. This approach is strictly correct, since the affected tags are defined by the EXIF standard as only accepting ASCII, and should never contain data that uses any other encoding (including UTF-8).
I had a look at the metadata of your image files, and found that the title/description/subject isn’t only written to the XMP section - there are also three other tags added in the EXIF section. The one that’s causing the problem is the ImageDescription tag. Once that’s removed, the uploader reads everything else correctly. The two other EXIF tags are XPTitle and XPSubject. These are Windows-specific, but according to the spec they can be encoded using either UTF-8 or ANSI, and have the data type int8u. By contrast, the ImageDescription tag has a data type of string, which only supports ASCII (see: Standard Exif Tags).
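(To list those three tags directly, something like exiftool -EXIF:ImageDescription -EXIF:XPTitle -EXIF:XPSubject test-windows.jpg will print what’s in the EXIF section.)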
So the bug here is that UTF-8 encoded data is being written to a tag which does not really support it. Given that, it’s not unreasonable for the uploader to refuse to guess the encoding and just treat the data as ASCII text. It uses a third-party library (piexifjs) to load the EXIF data, so it would be understandable if the current behaviour wasn’t changed. In the meantime, I suppose the ideal workaround from your end would be to configure your image metadata editing software to just never write to the ImageDescription tag (assuming that’s possible).
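For a file that’s already been exported, the tag can also be stripped afterwards with exiftool: exiftool -overwrite_original -EXIF:ImageDescription= {FILE}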
or, for directories: exiftool -overwrite_original -EXIF:ImageDescription= {DIR}
However, it would be helpful if the iNaturalist team could ensure that the uploader is compatible with modern commercial software that supports international encodings (such as Unicode and UTF-8) by default. Windows-1252 hasn’t been the norm for over a decade.
All of that software is buggy. It should never write anything other than ASCII to the ImageDescription tag. Here’s how it’s defined by the EXIF Standard:
ImageDescription
A character string giving the title of the image. It is possible to be added a comment such as “1988 company picnic” or the like. Two-byte character codes cannot be used. When a 2-byte code is necessary, the Exif Private tag UserComment is to be used.
Tag = 270 (10E.H)
Type = ASCII
Count = Any
Default = None
Many image metadata editors simply ignore the spec and write data using whatever encoding they like. This is probably to avoid breaking other legacy/obsolete software that expects the tag to be there.
I suppose iNaturalist could try to work around these third-party bugs by attempting to guess the non-ASCII encoding used in the ImageDescription tag. There are various tools available for this task, but none of them are guaranteed to guess correctly - especially when there’s only a small input sample to work with (which is almost always the case with metadata tags). If the iNaturalist devs choose not to go down this path, I wouldn’t blame them.
An alternative way to avoid all these problems is to simply never use “modern” software that writes buggy metadata without giving you the option to control the output.
With the release of EXIF 3.0, the standard was updated to allow UTF-8 encoding in certain tags, including ImageDescription. This change reflects the modern need for broader character set support, particularly for non-English languages, and aligns better with Unicode-based systems.
A character string giving the content description of the image. It is possible to add a description of the content or comment such as “1988 company picnic” or the like. 2-byte or larger character codes can not be used when ASCII is set as Type. When a 2-byte or larger code is necessary, UTF-8 shall be set as Type. The count is the value including NULL terminations.
Nobody said that using UTF-8 in EXIF per se is a bug - there are many other tags that already explicitly support it. It’s fair to point out that a new standard was recently published, but the fact remains that a lot of software violated the earlier standard by using literally any non-ASCII encoding (not just UTF-8) in the ImageDescription tag. This means there will continue to be many photos around with metadata that is encoded as neither ASCII nor UTF-8 (for example, UTF-16 and many other two-byte encodings). It’s also very likely that there will be a lot of software around that continues to read and write EXIF metadata using the older standards. Just because a new standard becomes available doesn’t guarantee that everyone will immediately start complying with it.
It wouldn’t be difficult for iNaturalist to add support for UTF-8 only, and avoid guessing any other encodings. Decoding invalid UTF-8 fails quite reliably, so it could simply fall back to ASCII if an error occurs. I still wouldn’t blame the devs if they chose to just leave things as they are, though.
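A rough sketch of that idea (plain TypeScript; this is not iNaturalist’s or piexifjs’s actual code, and the function name is just for illustration):

// Decode a tag's raw bytes as UTF-8; if that fails, fall back to plain ASCII.
function decodeTagBytes(bytes: Uint8Array): string {
  try {
    // fatal: true makes the decoder throw on invalid UTF-8 sequences.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    // Fallback: keep printable 7-bit ASCII and drop everything else (including NULs).
    return Array.from(bytes)
      .filter((b) => b >= 0x20 && b < 0x7f)
      .map((b) => String.fromCharCode(b))
      .join("");
  }
}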
PS: I just checked your example photos, and sure enough, the metadata is written using version 2.31 (i.e. Exif Version = 0231). So your Photoshop software does write buggy metadata, since it doesn’t fully comply with the actual standard it uses.
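(For anyone who wants to check their own files, exiftool -EXIF:ExifVersion photo.jpg prints that tag.)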
It might be a good idea to prioritize the use of embedded XMP metadata when available, as it is internationalized by design.
If XMP data is not present, fall back to EXIF metadata, which is more device-specific (maintained by camera manufacturers such as Sony, Nikon, Canon, Ricoh, Fuji, Apple, etc.) and often includes legacy encodings.
Personally, I’m not a fan of guessing the encoding, as it introduces unpredictability and potential inconsistencies in metadata interpretation.
But we already have unpredictability and inconsistency, so if you want universal UTF-8 support, some form of guessing is inescapable. A large proportion of photos don’t include XMP metadata, so the EXIF tags will have to be interpreted somehow. Attempting to decode as UTF-8 and falling back to ASCII on failure is a simple and effective way of supporting both the current and new EXIF standards. It’s not perfect, but it would be a definite improvement.
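To make that concrete, the reading order could look roughly like this (reusing the decodeTagBytes sketch from earlier; all names are illustrative, not iNaturalist’s actual code):

// Prefer the XMP description when present; otherwise decode the EXIF ImageDescription bytes.
function pickNoteText(
  xmpDescription: string | undefined,
  exifImageDescription: Uint8Array | undefined
): string {
  if (xmpDescription) return xmpDescription;
  if (exifImageDescription) return decodeTagBytes(exifImageDescription);
  return "";
}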