Yes, fighting blog spam has been a huge distraction. I would rather be creating things than roach stomping. But I refuse to close off comments completely; it runs dead against what blogs should do to foster community building.
About 36 hours ago, I took the approach of renaming my mt-comments.cgi script. The new name was discovered by spam bots in less than 24 hours, so I doubt they are scarfing it from Google (regardless, for now, I am excluding all robots in the server robots.txt file, though I doubt the majority of bots even pay attention to that anymore).
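For reference, a blanket exclusion like the one mentioned above is only two lines of robots.txt. Below is a minimal sketch that uses Python's standard urllib.robotparser to check how a well-behaved crawler would read such a file. The example.com host and the "ExampleBot" name are placeholders, not anything from this site, and none of this stops a bot that simply ignores the file.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks every compliant robot to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if a crawler honoring ROBOTS_TXT may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder bot name and host, for illustration only.
print(is_allowed("ExampleBot", "http://example.com/cgi-bin/mt-comments.cgi"))
```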
The problem with my previous approach of funneling all comments to the individual entry form is that the HTML source clearly reveals the URL of the script that spammers call to post spam. I had done it completely backwards by removing links to the pop-up comment form. So the goal is to remove from all pages the full URL http://………/cgi-bin/mt-comments.cgi?entry_id=X, which is the direct address the spammers need, as well as to remove the comment form from the bottom of the individual entry.
I’ve altered the links that generate the comment window and the JavaScript function that handles it to munge things so the full URL for the comment script call is broken up into hopefully unrecognizable chunks. I’d be more detailed, but I’ll make the spammers work for their crumbs.
Also, long overdue, I updated the MT software to version 2.661, which introduces some code that prevents multiple script calls in short spans of time, or what is known as “throttling”. And Yes, D’Arcy, I have my MT-Blacklist up to like 1100 entries (likely slowing the comment posting way down), but I find it only catches a handful of regular attempts; the 3 waves I got this week were all missing from the blacklist (but they are on it now).
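To make the idea of throttling concrete, here is a rough sketch of one common way to do it: remember when each address last posted and refuse another post inside a minimum interval. This is not the code in MT 2.661, which is not shown here. The class name, the 20-second interval, and the per-IP keying are all assumptions made for illustration.

```python
import time
from typing import Dict, Optional


class CommentThrottle:
    """Refuse a comment when the same address posted too recently.

    Hypothetical helper: the name, the default interval, and keying on
    the IP address are illustrative assumptions, not Movable Type's code.
    """

    def __init__(self, min_interval_seconds: float = 20.0) -> None:
        self.min_interval = min_interval_seconds
        self._last_post: Dict[str, float] = {}

    def allow(self, ip_address: str, now: Optional[float] = None) -> bool:
        """Return True and record the post, or False if it came too soon."""
        current = time.monotonic() if now is None else now
        last = self._last_post.get(ip_address)
        if last is not None and current - last < self.min_interval:
            # Too soon: reject without resetting the recorded time.
            return False
        self._last_post[ip_address] = current
        return True


throttle = CommentThrottle(min_interval_seconds=20.0)
print(throttle.allow("192.0.2.10", now=0.0))   # first post from this address
print(throttle.allow("192.0.2.10", now=5.0))   # second post 5 seconds later
print(throttle.allow("192.0.2.10", now=30.0))  # third post 30 seconds after the first
```

A throttle like this only slows a flood from one address; a wave spread across many addresses passes straight through, which is why it sits alongside a blacklist rather than replacing it.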
The War goes On.
One word: TypeKey (http://www.typekey.com).
TypeKey is a weblog comment registration system from Six Apart, compatible with Movable Type and TypePad.
The recently relaunched Blogger also has the option to require registration (with Blogger.com) before a comment can be submitted.
Open comments will never be perfectly safe from spam; you’ll always be in a blacklist race with the spammers. This is explained in beautiful detail here: http://diveintomark.org/archives/2003/11/15/more-spam
In the long run, you will have to choose between the inconvenience of having spam or the inconvenience of having your readers authenticate themselves before they comment. Centralized services like TypeKey make authentication less painful, at least compared to having a different log-in for every blog.
Fine, but TypeKey is still vapor and appears it will demand MT 3, which is due out in “late April”. I am looking forward to it while dreading what it might take to migrate to 3.0; maybe it is not a big deal.
In the long run, Greg, my amounts of spam are rather tiny compared to stories of “crapflooding” and DoS-level stunts. Nothing on the scale that Mark Pilgrim attracts; his description of the mod_perl approaches to defenses made me dizzy. If I can keep ’em at bay with some trivial approaches, I will be content.
So the jury might be out on what will happen to comments. I cannot recall ever getting more than a handful of legit comments, and I hear many more people incidentally saying they read the CDB than actually bother to post something here.
It is only a matter of weeks, I bet, until Trackback starts getting equal abuse.
Wow, when you say that only humans may contact, I suppose you meant only humans using a certain Windows-based browser or those who are savvy enough to view the source and construct a URL to your dynamic comments page.
My goodness that was circuitous. Anyhow, I’ve already sent you an email about my long trail towards commenting on your site…
I wanted to say two things:
“And Yes, D’Arcy, I have my MT-Blacklist up to like 1100 entries (likely slowing the comment posting way down)”
1100 entries should NOT slow down comment posting appreciably. I’m not sure if you’re having a particular problem that you’re attributing to MT-Blacklist or if you just read some bad information. In any case, it doesn’t.
However, if you are concerned about your blacklist being too large, you can always use the regular expression feature of MT-Blacklist to cut out a wide swath of sites you don’t want posted. I’ve done this on my own blacklist which you can see at http://www.jayallen.org/blacklist.
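As a rough illustration of how one regular expression can stand in for a whole family of literal entries, here is a small sketch. The domains are made up (they use the reserved .example suffix), and the pattern is written in Python's regex syntax, so treat it as the general idea rather than something to paste into MT-Blacklist, whose entry format is not shown here.

```python
import re

# Hypothetical literal blacklist entries that differ only in one segment.
LITERAL_ENTRIES = [
    "cheap-pills-online.example",
    "cheap-pills-store.example",
    "cheap-pills-4u.example",
]

# One pattern covering the whole family, including variants not yet seen.
FAMILY_PATTERN = re.compile(r"cheap-pills-[a-z0-9]+\.example", re.IGNORECASE)


def is_spam(comment_text: str) -> bool:
    """Return True if the comment mentions any domain in the family."""
    return FAMILY_PATTERN.search(comment_text) is not None


for entry in LITERAL_ENTRIES:
    print(entry, is_spam(f"Visit http://{entry}/ today"))
print(is_spam("Nice post, thanks for the throttling tip."))
```

The trade-off is the usual one: a broader pattern means fewer entries to maintain, but a pattern that is too loose can start catching legitimate links.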
And a question for Greg: Have you ever USED MT-Blacklist? Just wondering, because if you had, you probably wouldn’t be latching onto Mark’s six-month old rant that he posted when MT-Blacklist first came out. Mark has a thing against blacklists, and for the most part, I completely agree with him, but MT-Blacklist is not only different, but it has proven its worth on thousands of websites around the internet. I’d say that explains the issue far more beautifully than a rant from a well-intentioned albeit, in my opinion, slightly misguided techie who did not have the knowledge of MT-Blacklist’s effectiveness today.
In any case, if you actually HAVE tried MT-Blacklist, I would love to know what problems you found with it.
Jay Allen emailed (discovering a bug in my button script) and noted that 1100 entries would NOT slow down MT due to the MT-Blacklist plugin, and that if one is concerned about the size of the Blacklist file, one can use some regex (regular expressions) to cut down similar addresses.
Also he wrote: “one of the comments uses Mark Pilgrim’s rant back in October as gospel to why blacklists don’t work. My best argument to that is ‘and yet they have and are’… Mark’s rant is fairly off base when it comes to MT-Blacklist. Email or other types of blacklists (especially IP based), sure. But this one is different.”
The good dog award goes to Jay for persisting through my broken button, my messed up feedback form, a missing email address in my RSS feed…. I am munching humble pie for breakfast.
Alan: “Fine, but TypeKey is still vapor and appears it will demand MT 3.”
True enough. So it solves your problem in a month or two instead of today. 🙂 However, I think the bottom line still holds true — bloggers & commenters will be inconvenienced in some fashion. You can’t have open, unauthenticated commenting AND have rock-solid anti-spam measures. You either require authentication or deal with some spam around the edges.
Jay: “Have you ever USED MT-Blacklist? Just wondering, because if you had, you probably wouldn’t be latching onto Mark’s six-month old rant.”
Sure, I use MT-Blacklist on my own weblog (http://www.tenreasonswhy.com/weblog/). Works fairly well, but then I don’t seem to have the spam problem Alan has (or if I do I’m not as frustrated by it).
But, I do agree with Mark and not you, Jay. At least for email, blacklists have not proven to be an effective, scalable anti-spam solution. It’s *part* of a solution, but it’s not the solution by itself. I suppose time will tell whether the scalability (or lack thereof) holds true for comment spam.
One solution might be what Google/Blogger has implemented. Comments on Blogger are redirected through a URL that doesn’t pass on PageRank (see http://help.blogger.com/bin/answer.py?answer=808). Remove PageRank and you remove at least one primary motivation for comment spammers. It would be cool if Google opened that PageRank-free redirect for non-Blogger weblogs to use.
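Here is a rough sketch of the link-rewriting half of that idea: every link in a comment is pointed at a redirect script instead of directly at the target. How Blogger actually does this is not described here, so the redirect address (http://example.com/redirect?url=) and the function name are purely hypothetical. The sketch also assumes the redirect script itself is kept away from crawlers, since the rewriting alone does not remove PageRank.

```python
import re
from urllib.parse import quote

# Hypothetical redirect endpoint; a real one would need to be off-limits
# to crawlers so that links routed through it carry no PageRank.
REDIRECT_PREFIX = "http://example.com/redirect?url="

# Matches double-quoted http/https href attributes in comment HTML.
HREF_PATTERN = re.compile(r'href="(https?://[^"]+)"', re.IGNORECASE)


def route_links_through_redirect(comment_html: str) -> str:
    """Rewrite each absolute link so it goes through the redirect script."""

    def rewrite(match):
        # Percent-encode the whole target so it survives as one parameter.
        target = quote(match.group(1), safe="")
        return f'href="{REDIRECT_PREFIX}{target}"'

    return HREF_PATTERN.sub(rewrite, comment_html)


print(route_links_through_redirect('<a href="http://spam.example/pills">x</a>'))
```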
By the way, Jay, you can apparently bypass Alan’s button issue by just clicking on the comments link on the main page, instead of on the Post a Comment button on the archive page. The comment pop-up doesn’t have all the hoops to jump through. 🙂
Greg: “At least for email, blacklists have not proven to be an effective, scalable anti-spam solution.”
I COMPLETELY agree. But need I outline the differences between email spam and comment spam which are crucial to understanding the differences between email spam blacklists (like MAPS RBL) and comment spam blacklists?
Please don’t make me. I’ve done it about a billion times all over the net… Think by whom and where the blacklists are implemented, the content of the blacklists and the aims of the spammers.
Different beasts.