
Oh, those messy character encodings..

I recently wrote about some experiments to improve Feed2JS (see the updates fed to the site, at the bottom of the main page).

Specifically, at the request of a user in Germany, I tried changing the output to encode content as UTF-8 using the new features in Magpie RSS 0.7. Since then, I have gotten an email and a comment from people with apparently French-language sites who say it has broken their French accents and characters.

However, when I preview the feeds in question from our site using the Build a Feed tool, they look okay.

For one comment to the site, I was suspicious since the URL provided had its own encoding set in the HEAD meta tags as ISO-8859-1… does that mean French-language sites break under UTF-8? I am really ignorant of this stuff. But if it breaks more sites than it helps, I will have to revert the encoding to what it was before (Magpie does not allow a per-feed encoding setting; it is all or nothing).
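To picture what those readers were probably seeing, here is a minimal sketch in Python (chosen only for illustration; Feed2JS and Magpie are PHP). It encodes a French word as UTF-8 and then reads the same bytes back as ISO-8859-1, which is roughly what a browser does when a page declared as ISO-8859-1 receives UTF-8 output. The function name and the sample word are assumptions, not anything from Feed2JS.

```python
def show_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then misread those bytes as ISO-8859-1."""
    utf8_bytes = text.encode("utf-8")        # what the feed script would send
    return utf8_bytes.decode("iso-8859-1")   # what an ISO-8859-1 page would display


if __name__ == "__main__":
    # The accented letter takes two bytes in UTF-8, so it shows up as two characters.
    print(show_as_latin1("caf\u00e9"))
    # Plain ASCII text is byte-for-byte identical in both encodings.
    print(show_as_latin1("cafe"))
```

If this is what is happening, it would also explain why the feeds look fine in the Build a Feed preview: the preview page and the feed output would agree on the encoding, while the French sites declare a different one.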

What’s a character to do?

Update: Until I can sort this out, I am reverting Feed2JS so it uses the default ISO-8859-1 encoding. Feeds may need an hour to refresh from our cache.

Another Update: Another attempt. A new parameter, utf=8, sent to the script on our server should fork it to a different Magpie for the UTF encoding (see the examples on the Feed2JS log site).
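The idea of an opt-in parameter can be sketched like this, again in Python rather than the PHP that Feed2JS actually uses. The function name, the `src` parameter in the usage lines, and the rule that any non-empty `utf` value switches to UTF-8 are all assumptions for illustration; this is not the real Feed2JS code.

```python
from urllib.parse import parse_qs

# Assumed default, matching the encoding the post reverts to.
DEFAULT_ENCODING = "iso-8859-1"


def pick_encoding(query_string: str) -> str:
    """Return the output encoding requested by a hypothetical utf flag."""
    params = parse_qs(query_string)  # blank values are dropped by default
    # Any non-empty utf value opts in to UTF-8; everyone else keeps the old default.
    if params.get("utf"):
        return "utf-8"
    return DEFAULT_ENCODING


if __name__ == "__main__":
    print(pick_encoding("src=feed.xml&utf=8"))
    print(pick_encoding("src=feed.xml"))
```

The point of the design is that existing users who never send the parameter see no change, while sites that serve their pages as UTF-8 can ask for it.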

An early 90s builder of web stuff and blogging Alan Levine barks at CogDogBlog.com on web storytelling (#ds106 #4life), photography, bending WordPress, and serendipity in the infinite internet river. He thinks it's weird to write about himself in the third person. And he is 100% into the Fediverse (or tells himself so) Tooting as @cogdog@cosocial.ca

Comments

  1. If you have English text as UTF-8 and display it on a page with ISO-8859-1 or US-ASCII, it works fine because UTF-8 is backwards compatible with US-ASCII (which in turn is the base for ISO-8859-1).

    Unfortunately, Finnish, Swedish, French, German and many other European languages that use ISO-8859-? have accented characters that ISO-8859-? stores as a single byte but UTF-8 stores as two or more bytes, so UTF-8 text shown on an ISO-8859-? page comes out garbled.

    You have to:

    a) Explicitly ask your users to use UTF-8 as the default encoding for their web pages. As Feed2JS is targeted at less tech-savvy people, this is not a good idea.

    b) Implement an option in your Feed2JS script to either encode the content in UTF-8 or convert the UTF-8 output to some target encoding, for example ISO-8859-1. Obviously you can't easily convert UTF-8 to every encoding in existence, so you will have to limit your target audience (or sort the problem with a really kick-ass UTF-8 to ??? conversion library).
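Option b) can be sketched in a few lines of Python, purely as an illustration (Feed2JS is PHP, and the function name here is hypothetical). The sketch decodes UTF-8 bytes and re-encodes them for a target page encoding. Characters the target cannot represent are replaced with numeric character references, which assumes the output ends up being parsed as HTML.

```python
def convert_utf8(utf8_bytes: bytes, target: str = "iso-8859-1") -> bytes:
    """Re-encode UTF-8 bytes for a page that uses the target encoding."""
    text = utf8_bytes.decode("utf-8")
    # Characters missing from the target become references such as &#8364;
    # instead of raising an error or being silently dropped.
    return text.encode(target, errors="xmlcharrefreplace")


if __name__ == "__main__":
    # Representable in ISO-8859-1: the accented letter becomes a single byte.
    print(convert_utf8("caf\u00e9".encode("utf-8")))
    # Not representable in ISO-8859-1: the euro sign falls back to a reference.
    print(convert_utf8("\u20ac5".encode("utf-8")))
```

The fallback sidesteps the "all encodings in existence" problem only partly: the script still has to know which target encoding each page wants.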
