I recently wrote of some experiments to improve Feed2JS (see the updates fed to the site, bottom of the main page).
Specifically, based on a request from a user in Germany, I attempted to change the output to encode content as UTF-8 using the new features in Magpie RSS 0.7. Since then, I have gotten an email and a comment from people with apparently French-language sites who say it has broken their French accents and characters.
However, when I preview the feeds in question from our site using the Build a Feed tool, they look okay.
For one comment to the site, I was suspicious since the URL provided had its own encoding set in the HEAD meta tags as ISO-8859-1… does that mean French-language sites break under UTF-8?? I am really ignorant of this stuff. But if it breaks more sites than it helps, I will have to revert the encoding to what it was before (Magpie does not allow a per-feed encoding setting; it is all or nothing).
What’s a character to do?
Update: Until I can sort this out, I am reverting Feed2JS so it uses the default ISO-8859-1 encoding. Feeds may need an hour to refresh from our cache.
Another Update: Another attempt. A new parameter, utf=8, sent to the script on our server, should fork it to a different Magpie for the UTF encoding (see the examples on the Feed2JS log site).
If you have English text as UTF-8 and display it on a page with ISO-8859-1 or US-ASCII, it works fine because UTF-8 is backwards compatible with US-ASCII (which in turn is the base for ISO-8859-1).
Unfortunately, Finnish, Swedish, French, German and many other European languages that use ISO-8859-? have characters that are not compatible with UTF-8: an accented letter such as é or ä is a single byte in ISO-8859-1 but two bytes in UTF-8, so it shows up as garbage when the page and the feed disagree on the encoding.
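To make the mismatch concrete, here is a small Python sketch (Python is only used for illustration; Feed2JS itself is not written in it). The sample strings are made up. It shows that plain English text has the same bytes in all three encodings, while an accented word does not, and what a browser would display if it read UTF-8 bytes as ISO-8859-1.

```python
# Pure ASCII text: the bytes are identical in US-ASCII, ISO-8859-1 and UTF-8.
english = "Hello feed"
assert english.encode("ascii") == english.encode("iso-8859-1") == english.encode("utf-8")

# Accented text: one byte per accented letter in ISO-8859-1, two in UTF-8.
french = "café"
latin1_bytes = french.encode("iso-8859-1")  # b'caf\xe9'
utf8_bytes = french.encode("utf-8")         # b'caf\xc3\xa9'

# What a page declared as ISO-8859-1 shows when it is handed UTF-8 bytes:
garbled = utf8_bytes.decode("iso-8859-1")
print(garbled)  # cafÃ©

# The reverse mismatch: ISO-8859-1 bytes are not valid UTF-8 here, so a
# lenient decoder substitutes the replacement character U+FFFD.
reverse = latin1_bytes.decode("utf-8", errors="replace")
print(reverse)
```

The "Ã©" pattern in the first printout is the typical symptom of a UTF-8 feed landing on an ISO-8859-1 page.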
You have to do one of the following:
a) Explicitly ask your users to use UTF-8 as the default encoding for their web pages. As feed2js is targeted at less tech-savvy people, this is not a good idea.
b) Add an option to your feed2js script to either output the content as UTF-8 or convert the UTF-8 output to some target encoding, for example ISO-8859-1. Obviously you can’t easily convert UTF-8 to every encoding in existence, so you will have to limit your target audience (or sort the problem with a really kick ass UTF-8 to ??? conversion library).
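Option b) could look roughly like the following Python sketch. It is not how Feed2JS or Magpie does it; the function name `convert_feed_text` and its `target` parameter are hypothetical, and it assumes the converted text ends up inside HTML, where numeric character references are understood. Characters the target encoding cannot represent are turned into such references instead of being dropped.

```python
def convert_feed_text(utf8_bytes: bytes, target: str = "iso-8859-1") -> bytes:
    """Hypothetical helper: re-encode UTF-8 feed text into a target encoding.

    Characters missing from the target encoding become HTML numeric
    character references (for example &#8364; for the euro sign).
    """
    text = utf8_bytes.decode("utf-8")
    return text.encode(target, errors="xmlcharrefreplace")


# Accented letters that exist in ISO-8859-1 become single bytes.
print(convert_feed_text("café".encode("utf-8")))  # b'caf\xe9'

# The euro sign is not in ISO-8859-1, so it falls back to a reference.
print(convert_feed_text("5 €".encode("utf-8")))   # b'5 &#8364;'

# Asking for UTF-8 leaves the bytes unchanged.
print(convert_feed_text("café".encode("utf-8"), target="utf-8"))  # b'caf\xc3\xa9'
```

The fallback to character references is one way around the "can’t convert to everything" problem for HTML output, at the cost of slightly larger output for text outside the target character set.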