Good News: Google News and custom searches are available as RSS/Atom Feeds.
Bad News: Has anyone at Google actually googled the RSS 2.0 format? They have taken a weird approach to it, double listing the title and publication date items inside the description! Okay, technically it meets RSS 2.0 rules, but functionally, it does things differently from what we expect of feeds.
Let’s say I am keeping tabs on news about squirrels, and I get these results. Good enough. But if you look at the actual RSS 2.0 feed content, you see:
Greedy squirrel trapped by nuts - CBBC newsround (audio)
<pubDate>Fri, 12 Aug 2005 15:08:00 GMT</pubDate>
<br><table border=0 width= valign=top cellpadding=2 cellspacing=7>
newsid_4146200/4146228.stm">Greedy <b>squirrel</b> trapped
by nuts</a><br><font size=-1><font color=#6f6f6f>CBBC newsround
(audio), UK -</font> <nobr>
10 hours ago</nobr></font><br><font size=-1>A <b>squirrel</b
after scoffing too many nuts. The bushy-tailed thief had
managed <b>...</b> </font><br>
It makes no XML sense to me that information stored in one element (the <title>) is repeated inside the <description>!
And what is with that clunky non-web-standard HTML inside the description? They are syndicating formatted content, not content. I only got wind of this when someone using Feed2JS emailed and asked why Google News RSS feeds are rendered via our site with the title written twice… If they want to render it, apply some XSLT to the feed, but do not stuff crufty old HTML inside the feed.
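For anyone consuming these feeds in the meantime, one workaround is to strip the markup from the description and drop the repeated title before display. Here is a rough Python sketch of that idea. The function name `plain_description`, the list of tags treated as line breaks, and the rule that the title is removed only when it leads the text are all my own assumptions for illustration, not how Feed2JS or Google does anything.

```python
from html.parser import HTMLParser

# Assumed set of tags that should act as word separators once removed.
BREAKING_TAGS = {"br", "p", "div", "table", "tr", "td"}


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Keep words apart where a block-level tag used to separate them.
        if tag in BREAKING_TAGS:
            self.chunks.append(" ")

    def handle_data(self, data):
        self.chunks.append(data)


def plain_description(title, description_html):
    """Strip markup from a description and drop a leading copy of the title."""
    parser = _TextExtractor()
    parser.feed(description_html)
    parser.close()
    # Collapse the runs of whitespace left behind by the removed tags.
    text = " ".join("".join(parser.chunks).split())
    # Assumption: the duplicate title, if present, comes first in the text.
    if title and text.startswith(title):
        text = text[len(title):].lstrip(" -")
    return text
```

Called with an item's title and description, this returns whatever text follows the duplicated title. The source line and the "10 hours ago" stamp would still be in there, so a real cleanup would need more rules than this.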
Something is squirrelly at Google, indeed.
And while I am ranting, why did Google search switch the output of search results so that matching web site titles are now linked not to the site itself, but run through a Google script that redirects? So if I run a web search on squirrel, the top link is for the ever popular scary squirrel world, and a mouse hover on the link suggests the link goes to http://www.scarysqurrel.org/, but if you view source, or try to control-click or right-click to copy that URL, what you get is:
which is in no way suggested by the output, nor is it really the URL I want associated with my link to the home of scary squirrels.
So I am guessing they are gathering yet more data on people’s tracks away from Google.
It’s a pain because when I blog and build web sites, I rely heavily on Google to find the correct hyperlinks for people, places and things, so I can provide hyperlinked references in my writing… but now, the links do not copy easily. I have to either follow the redirect links, or reach in and copy the green URL text (and remember to add an “http://” in front). Maybe this sounds picky, but someone is tinkering with the output, and making my web work less efficient.
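A third option would be to pull the destination back out of the redirect link itself. Here is a minimal Python sketch, assuming the destination travels percent-encoded in a query-string parameter. The parameter name `q`, the function name `unwrap_redirect`, and the `redirect.example` address in the usage line are placeholders of mine, not Google's actual link format, so check a real link before relying on any of them.

```python
from urllib.parse import urlparse, parse_qs


def unwrap_redirect(link, param="q"):
    """Return the destination carried in a redirect link's query string.

    `param` is an assumed parameter name. If it is missing, the link
    is returned unchanged.
    """
    # parse_qs also decodes percent-encoded values such as %3A and %2F.
    query = parse_qs(urlparse(link).query)
    targets = query.get(param)
    return targets[0] if targets else link


# Usage with a made-up redirect address:
print(unwrap_redirect("http://redirect.example/url?sa=U&q=http%3A%2F%2Fwww.example.org%2F"))
```

A link that carries no such parameter comes back untouched, so it should be safe to run on ordinary URLs too.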