I recently wrote about some experiments to improve Feed2JS (see the updates fed to the site, bottom of the main page).

Specifically, at the request of a user in Germany, I tried changing the output to encode content as UTF-8 using the new features in Magpie RSS 0.7. Since then, I have received an email and a comment from people with apparently French-language sites who say it has broken their French accents and characters.

However, when I preview the feeds in question from our site using the Build a Feed tool, they look okay.

For one comment to the site, I was suspicious since the URL provided had its own encoding set in the HEAD meta tags as ISO-8859-1… does that mean French language sites break under UTF-8?? I am really ignorant of this stuff. But if it breaks more sites than it helps, I will have to revert the encoding to what it was before (Magpie does not allow a per-feed encoding setting; it is all or nothing).
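To make the suspected failure concrete, here is a minimal Python sketch (not Feed2JS code, which runs on PHP and Magpie). It assumes the host page is declared as ISO-8859-1 while the script writes UTF-8 bytes into it; the sample text is made up for illustration.

```python
# Hypothetical sample text; any string with accented characters behaves the same.
text = "café français"

# What a UTF-8 feed script would send over the wire.
utf8_bytes = text.encode("utf-8")

# A page declared as ISO-8859-1 reads every byte as one character,
# so each two-byte UTF-8 sequence turns into two wrong characters.
seen_by_reader = utf8_bytes.decode("iso-8859-1")
print(seen_by_reader)  # cafÃ© franÃ§ais

# Unaccented text uses the same bytes in both encodings and survives intact.
plain = "Feed2JS"
plain_seen = plain.encode("utf-8").decode("iso-8859-1")
print(plain_seen)  # Feed2JS
```

If that assumption holds, it would explain why the same feed looks fine in a preview page served as UTF-8 and broken on a site whose pages declare ISO-8859-1.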

What’s a character to do?

Update: Until I can sort this out, I am reverting Feed2JS so it uses the default ISO-8859-1 encoding. Feeds may need an hour to refresh from our cache.

Another Update: Another attempt. A new parameter, utf=8, sent to the script on our server should fork it to a different Magpie for the UTF encoding (see the examples on the Feed2JS log site).

The post "Oh, those messy character encodings.." was originally pulled from under moldy cheese at the back of the fridge at CogDogBlog (http://cogdogblog.com/2005/01/oh-those/) on January 13, 2005.

1 Comment

  • If you have English text as UTF-8 and display it on a page with ISO-8859-1 or US ASCII, it works fine because UTF-8 is backwards compatible with US ASCII (which in turn is the base for ISO-8859-1).

    Unfortunately, Finnish, Swedish, French, German and many other European languages that use ISO-8859-? have accented characters whose bytes in those encodings differ from their UTF-8 bytes, so the two are not compatible.

    You have to:

    a) Explicitly ask your users to use UTF-8 as the default encoding for their web pages. As feed2js is targeted to less tech-savvy people, this is not a good idea.

    b) Implement an option in your feed2js script to encode the content in UTF-8 or convert the UTF-8 output to some target encoding, for example ISO-8859-1. Obviously you can’t easily convert UTF-8 to all encoding formats in existence so you will have to limit your target audience (or sort the problem with a really kick-ass UTF-8 to ??? conversion library).
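
Option (b) from the comment can be sketched roughly as follows. This is an illustration in Python, not how Feed2JS or Magpie actually do it; the function name `convert_feed_text` and its `target` parameter are hypothetical. It assumes the output ends up in HTML, so characters the target encoding cannot represent are written as numeric character references instead of being dropped.

```python
def convert_feed_text(utf8_bytes: bytes, target: str = "iso-8859-1") -> bytes:
    """Re-encode UTF-8 feed text into the page's encoding (hypothetical helper)."""
    text = utf8_bytes.decode("utf-8")
    # Characters missing from the target encoding become &#NNNN; references,
    # which a browser renders correctly whatever the page encoding is.
    return text.encode(target, errors="xmlcharrefreplace")


# Made-up sample: the accented letters exist in ISO-8859-1, the euro sign does not.
sample = "Ça coûte 5 €".encode("utf-8")

latin1_out = convert_feed_text(sample)
print(latin1_out)  # b'\xc7a co\xfbte 5 &#8364;'

# Asking for UTF-8 leaves the bytes unchanged.
utf8_out = convert_feed_text(sample, target="utf-8")
print(utf8_out == sample)  # True
```

Falling back to character references sidesteps part of the "limit your target audience" problem, since any character missing from the target encoding still displays, at the cost of slightly larger output.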