Broken links to Wikipedia when × is in the URL (presumably other special characters are affected too)

Platform: Website

Browser: Firefox, though I doubt it's browser-dependent

URLs:

Description of problem (please provide a set of steps we can use to replicate the issue, and make as many as you need.):

Step 1: Go to the page for a taxon which has × in the name - see list above
Step 2: Click on the link to Wikipedia
Step 3: Observe that the link is incorrect. E.g. it links to https://en.wikipedia.org/wiki/Crocosmia_%C3%83%C2%97_crocosmiiflora instead of https://en.wikipedia.org/wiki/Crocosmia_%C3%97_crocosmiiflora


For clarity, iNat's taxon page automatically finds and displays the Wikipedia content just fine; it is only the little arrow that links directly to the Wikipedia page that is broken. The problem seems to be that it uses the cross character directly in the URL, when maybe it needs to replace the cross character with a regular ‘x’ to match the formatting of the Wikipedia article URLs. I guess the developers must already know how to get the right link, because the embedding works.


Can confirm this is happening for me on Chrome as well.

Likely the special character is getting altered by some sanitization function in iNaturalist. It may need to be URL-encoded before being passed to the sanitizer. (Alternatively, it could be getting double-encoded.)
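To illustrate the difference between encoding once and double-encoding, here is a minimal Python sketch. The wikipedia_url helper is hypothetical and not iNaturalist's actual code; it just converts spaces to underscores and percent-encodes the title as UTF-8.

```python
from urllib.parse import quote

# Hypothetical helper: builds an English Wikipedia URL from a page title.
# Spaces become underscores, as in Wikipedia's own article URLs.
def wikipedia_url(title: str) -> str:
    path = quote(title.replace(" ", "_"))  # quote() encodes str as UTF-8
    return "https://en.wikipedia.org/wiki/" + path

title = "Crocosmia × crocosmiiflora"

# Encoding once: × (U+00D7) becomes its UTF-8 bytes C3 97, percent-encoded.
print(wikipedia_url(title))
# https://en.wikipedia.org/wiki/Crocosmia_%C3%97_crocosmiiflora

# Percent-encoding an already-encoded string escapes the "%" signs themselves.
print(quote(quote("×")))
# %25C3%2597
```

Plain double percent-encoding shows up as %25 sequences, which is not the %C3%83%C2%97 pattern reported later in the thread, so that particular kind of double-encoding seems less likely here.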

Wikipedia do use the actual “×” character in the URLs - but they also usually have redirects from the version with the letter “x” for easy searching.

The following works as well:

https://en.wikipedia.org/wiki/Crocosmia_%C3%97_crocosmiiflora

Made a GitHub issue for this: https://github.com/inaturalist/inaturalist/issues/3813

Note that the character × does not actually appear in the correct URL, since it is not an allowed character in URLs. It is just decoded when displayed in the post and in your browser's address bar. If you right-click the link in my post and select "Copy link", or visit the page and copy the URL from the address bar and paste it elsewhere, you will see that the URL actually ends with Crocosmia_%C3%97_crocosmiiflora (%C3%97 being the URL-encoded version of ×).
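A quick way to see this in Python: the two escapes %C3 and %97 are the two UTF-8 bytes of the single character ×, and decoding the whole path gives what the address bar displays. This is only a sketch using the standard library's urllib.parse.unquote.

```python
from urllib.parse import unquote

encoded = "Crocosmia_%C3%97_crocosmiiflora"

# %C3%97 is two bytes, which together are the UTF-8 encoding of U+00D7 (×).
raw = bytes.fromhex("C397")
print(raw.decode("utf-8") == "\u00d7")  # True

# Browsers decode the escapes for display, so the address bar shows ×.
print(unquote(encoded))  # Crocosmia_×_crocosmiiflora
```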

Similarly, the actual problematic URL used by iNat ends with Crocosmia_%C3%83%C2%97_crocosmiiflora (%C3%83%C2%97 being the URL encoding for “Ã\u0097”, where \u0097 is this invisible character: https://www.fileformat.info/info/unicode/char/0097/index.htm).
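The observed sequence can be reproduced if the UTF-8 bytes of × are mistakenly decoded as Latin-1 and then encoded again. This is a sketch of one plausible cause, assuming a Latin-1 mis-decoding somewhere in the link-building path; it is not confirmed to be what iNat's code does.

```python
from urllib.parse import quote

cross = "\u00d7"  # ×, MULTIPLICATION SIGN

# Step 1: the character is correctly encoded to UTF-8 bytes C3 97.
utf8_bytes = cross.encode("utf-8")

# Step 2 (assumed fault): those bytes are decoded as Latin-1, yielding
# two characters, "Ã" (U+00C3) and the invisible control char U+0097.
mojibake = utf8_bytes.decode("latin-1")
print([hex(ord(c)) for c in mojibake])  # ['0xc3', '0x97']

# Step 3: percent-encoding that string as UTF-8 encodes each character again.
print(quote(mojibake))  # %C3%83%C2%97
```

If the bytes had been decoded as Windows-1252 instead, 0x97 would have become an em dash (U+2014) rather than U+0097, so the invisible character in the broken URL is consistent with a Latin-1 (ISO-8859-1) mis-decoding specifically.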

I would guess that it is a character-encoding issue in the database, were it not for the fact that the Wikipedia content is displayed correctly on iNat.
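Since the embedded content looks fine, the stored title is probably intact and the damage likely happens only where the link is built. As a stopgap, a mis-decoded title can be detected and reversed. The repair_mojibake function below is hypothetical and only a sketch; the real fix would presumably be to stop the mis-decoding at its source.

```python
from urllib.parse import quote, unquote

def repair_mojibake(text: str) -> str:
    """Hypothetical fix-up: undo a UTF-8 -> Latin-1 mis-decoding.

    If the string cannot be re-encoded as Latin-1, or the resulting bytes
    are not valid UTF-8, it was not mis-decoded this way; return it unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

broken_path = "Crocosmia_%C3%83%C2%97_crocosmiiflora"
title = repair_mojibake(unquote(broken_path))
print(title)         # Crocosmia_×_crocosmiiflora
print(quote(title))  # Crocosmia_%C3%97_crocosmiiflora

# A correctly decoded title passes through unchanged.
print(repair_mojibake("Crocosmia_×_crocosmiiflora"))  # Crocosmia_×_crocosmiiflora
```

This heuristic can misfire on genuine Latin-1 text that happens to form valid UTF-8 bytes, which is rare but possible, so it is no substitute for fixing the encoding step itself.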


This should be fixed.

This topic was automatically closed after 18 hours. New replies are no longer allowed.