Editor's Note
Tilting at Windmills


March 7, 2006
By: Kevin Drum

COMMENT SPAM....I'm curious about something. Dan Drezner writes that he's upgraded to a new version of Movable Type that required him to get rid of MT-Blacklist, and as a result he's now deluged with comment spam. But it strikes me that an easy way to eliminate spam would be a plugin that prevents posting of any comment that contains a URL. If you want to be a little less draconian, perhaps the plugin would allow a maximum of one URL per post. Since comment spam all contains multiple URLs, this would get rid of the spam.

Has anyone written a plugin like this? Or am I missing something and this wouldn't work?

Kevin Drum 11:10 AM Permalink | Trackbacks | Comments (49)

 
Comments


I hate George Bush. It is all his fault.

Posted by: yowza on March 7, 2006 at 11:16 AM | PERMALINK

Not allowing URLs sort of defeats the purpose of the Internet, so maybe that ain't such a great solution?

The "type this code from a graphic" system seems to eliminate about 95% of comment spam, so personally I don't know why more don't use that.

Cranky

Posted by: Cranky Observer on March 7, 2006 at 11:17 AM | PERMALINK

Well, unless it was coupled with one of the systems that require some sort of human entry (like the ones that require you to type in a number embedded in a picture file), such a plugin would likely result in 100 1-URL posts instead of 1 100-URL post.

Posted by: MJ Memphis on March 7, 2006 at 11:18 AM | PERMALINK

I like registration and real names. That eliminates spam.

And trolls.

Posted by: Jeffrey Davis on March 7, 2006 at 11:20 AM | PERMALINK

If you disallow URLs, it greatly decreases the usefulness of comments. If you allow one URL, then the comment spammers will hit anyway. Hell, some of the comment spammers are stupid and annoying enough that they'd hit even without URLs.

Akismet works pretty well for me (though I've had a few spam comments slip by recently), but it may still be only for WordPress.

Posted by: KCinDC on March 7, 2006 at 11:20 AM | PERMALINK

Comment spam all contains multiple URLs?

Really?

And how long would it take the spammers to switch to single URLs?

Hey, why not require every American to carry a national I.D. card? That would stop terrorism in its tracks.

Honestly, Kevin, you need to drop the % of posts made without thinking to something below the current 40%.

Posted by: Ottnott on March 7, 2006 at 11:21 AM | PERMALINK

Some of the best comments have multiple URLs. How about one of those 'type the phrase you see in the picture' things when a comment contains URLs? Stops spammers, but only bothers non-spammers when they want to post one or more URLs.

Posted by: exgop on March 7, 2006 at 11:26 AM | PERMALINK

Why would you give Kevin a hard time for wanting to provide a certain kind of discussion space? Kevin, don't give up trying to improve.

Posted by: jf on March 7, 2006 at 11:30 AM | PERMALINK

MJ Memphis hit the nail on the head -- MT blocks comments with too many URLs, but I don't want to block comments that contain a relevant URL, so I allow one. The result? Massive amounts of one-url comment spam.

Posted by: Dan Drezner on March 7, 2006 at 11:40 AM | PERMALINK

As others have mentioned, in this thread and near-infinitely many times before, registration and/or "enter text from graphics" methods to guarantee a live human are the best methods of avoiding comment spam; though a blacklist isn't bad, so of course one should carefully consider an "upgrade" that forces you to lose the latter without providing one of the former (losing critical functionality isn't an "upgrade", no matter what the version number says.)

Banning multiple URLs (or, presumably, <A> tags, as text URLs probably aren't what you are talking about) will probably change the kind of spam you get. Banning hyperlinks entirely will still leave you with spam text URLs (you can see this lots of places that prohibit links or links to certain sites). If you are going to ban content to get rid of spam, it will only work if you just go whole hog and ban comments entirely.

Posted by: cmdicely on March 7, 2006 at 11:44 AM | PERMALINK

Hah! You're not fooling anyone. This anti-URL idea is obviously aimed at me. A conspiracy, I tell you!

Seriously, a lot of comment places need registration and/or typing a series of letters to comment. I don't mind any of that, except for one or two places that don't give you a choice about your real e-mail address showing up in the post. Don't mind using my real address for registration.

Makes it a little less convenient to post, but the bright side is that less posting convenience might cut down a bit on some of the monkey feces-flinging parties around here.

Posted by: tbrosz on March 7, 2006 at 11:45 AM | PERMALINK

I like registration and real names. That eliminates spam. And trolls. Posted by: Jeffrey Davis

I agree.

Posted by: Jeff II on March 7, 2006 at 11:47 AM | PERMALINK

Why would you give Kevin a hard time for wanting to provide a certain kind of discussion space?

I think people are giving Kevin a hard time for asking a question (with a proposed solution) that has been discussed to death, rather frequently, on the comments threads (sometimes solicited, sometimes in response to spam explosions in the same thread), where he's been told over and over the best solution, which doesn't involve regulating the content of posts, but he's still suggesting a brain-dead, ineffective, approach of content regulation that is guaranteed not to work.

Posted by: cmdicely on March 7, 2006 at 11:47 AM | PERMALINK

Cranky's right. The number thing is the best. I use James Seng's wonderful SCode MT plugin, which does exactly that:

http://james.seng.cc/archives/000145.html

It works great, but it's a bit technical to install. You have to dig pretty deep into the MT code to do it. You can see it in action here:

http://www.brunoandtheprofessor.com/2006/03/episode_235.php

-Frank

Posted by: Frankbruno on March 7, 2006 at 11:49 AM | PERMALINK

Hah! You're not fooling anyone. This anti-URL idea is obviously aimed at me. A conspiracy, I tell you!

It's only a conspiracy if we don't tell you about it. Now, it's just a policy debate...

Posted by: craigie on March 7, 2006 at 11:52 AM | PERMALINK

I like registration and real names. That eliminates spam. And trolls. Posted by: Jeffrey Davis

I agree.
Posted by: Jeff II

Me too.

Posted by: Ace Franze on March 7, 2006 at 11:53 AM | PERMALINK

Gosh, I thought adding the nofollow tag was going to get rid of spam. At least, that's what we were told, and that's the rationale that non-technical people who use it provide.

In fact, Drezner added that tag a few months ago. Shouldn't his spam problems have been solved? I guess we were lied to about the efficacy - and perhaps real reason - for this tag, eh?

You can see my comment on it here:

danieldrezner.com/archives/002420.html

(Warning: that comment contains links, so watch out. Some of those links link to other pages on that WWW thing, so click with caution.)

Posted by: TLB on March 7, 2006 at 11:56 AM | PERMALINK

Registration and login would undoubtedly help.

Posted by: Dr. Morpheus on March 7, 2006 at 11:58 AM | PERMALINK

craigie:

It's only a conspiracy if we don't tell you about it. Now, it's just a policy debate...

Yeah, I guess "conspiracy" isn't the right word. It's more like "targeted legislation." Kind of like those punitive regulations that states pass that only apply to retailers with more than ten thousand employees and whose names start with a "W." ;)

Posted by: tbrosz on March 7, 2006 at 11:59 AM | PERMALINK

The old antispam plugin for MovableType is now a standard feature, and the old plugin's author is now an employee of 6Apart. You can configure MT SpamLookup to scan for URLs in comments and hold them for approval. The default configuration is to hold any comment with 3 URLs or more, but you can easily set it down to 1 URL.
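The hold-for-approval threshold described above can be sketched in a few lines. This is only an illustration in Python, not SpamLookup's actual code: the `URL_PATTERN` regex, the function names, and the "publish"/"hold" labels are all assumptions made for the example.

```python
import re

# Hypothetical pattern: matches http(s) URLs and bare "www." hosts.
# Plain-text addresses such as "example.com/page" are NOT counted.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def count_urls(comment: str) -> int:
    """Count the URL-like strings in a comment body."""
    return len(URL_PATTERN.findall(comment))


def triage_comment(comment: str, hold_threshold: int = 3) -> str:
    """Return "hold" when the comment has hold_threshold or more URLs,
    otherwise "publish". The default of 3 mirrors the default described
    in the comment above; pass 1 to hold anything containing a link."""
    if count_urls(comment) >= hold_threshold:
        return "hold"
    return "publish"
```

Calling `triage_comment(text, hold_threshold=1)` would hold every comment with a link for approval. As others in the thread point out, a filter like this says nothing about one-URL spam once the limit is raised above one.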

Here's a useful tip on MT spamblocking, from the developer.

http://jayallen.org/journey/2006/03/comment_spam_tips_and_new_tool

Posted by: charlie don't surf on March 7, 2006 at 12:00 PM | PERMALINK

Here's my idea.

If a comment contains a Chinese character, delete the hell out of it.

Posted by: frankly0 on March 7, 2006 at 12:02 PM | PERMALINK


KEVIN DRUM: . . am I missing something . . ?

What you're missing is the spam itself. Since I've never seen you remove any comments containing spam, I assume that, as is the case with most comments beyond the first few, you simply don't see them. Therefore, your knowledge of them was likely obtained via complaints in email -- which become, for you, a form of spam themselves.

But I think that secondhand exposure may be causing you to overestimate the problem. Surely, it's annoying when one must scroll through about a dozen screens to get to the next comment; but it's really quite minor and the frequency of such occurrence seems low to me.

You could implement a code entry posting system as suggested by others, but I think that would be at least as annoying as intermittent spam. As for disallowing URLs, such a limitation would seriously degrade the authenticity and verifiability of many comments or put them on a par with those of trolls who specialize in false assertions, where their URLs either contradict their points or expose their bias by revelation of source.

Unless the problem becomes significantly greater, perhaps the best solution involves your setting up an email filter that would screen out the complaints you receive about spam. Meanwhile, we commenters will do just fine carrying on with our filter: the scroll bar.


Posted by: jayarbee on March 7, 2006 at 12:04 PM | PERMALINK

I may be outnumbered, but I don't favor registration and login, and they would almost certainly prevent me from posting.

You will of course decide on your own whether that is a good or a bad thing.

Posted by: S Ra on March 7, 2006 at 12:05 PM | PERMALINK

The "type this code from a graphic" system seems to eliminate about 95% of comment spam, so personally I don't know why more don't use that.

I HATE doing that. Half the time, I can't read the code out of the graphic. And I detest typing random characters, as do my fingers.

Posted by: frankly0 on March 7, 2006 at 12:06 PM | PERMALINK

Funny. I just got hit a few minutes ago by three spam that each had one URL. Most spam have more than one, but some have one and even some have no URLs at all--they seem to be spams sent to test the system or something.

I'd also be interested in hearing exactly what type of spam he's getting deluged with. In the past week, there's been a flood of spam (dozens each day) hitting my own blog, of a fairly unusual type: the spams contain 3-4 URLs each, but all the URLs are non-spam addresses, usually news, movie, or educational sites. These URLs don't trigger the usual spam filters because they're not spam sites. Furthermore, these spam each have different (fake) IP addresses and different (fake) email addresses, making it impossible to filter them via a blacklist. Spams like this come in one- to two-week floods every four or five months, in my experience. Lord knows why spammers send them--maybe to probe sites to see how their defenses hold up, or maybe just to annoy people. Spammers are scum and would do anything for any reason, so who knows.

But if this is the spam your friend is getting, maybe it's not his new MT version which is the problem, but rather just the temporary type of spam getting through.

If not, then I am surprised that MT version 3.2 lets spam deluge back in--MT-Blacklist's author, Jay Allen, now works for MT and claims that 3.2 with the SpamLookup plugin activated will block spam better than it did with MT-Blacklist. Does your friend have SpamLookup activated? In this current flood, I've reduced the allowable URLs to one per post, and will keep it that way till the flood subsides--when I check the spam logs and see they have stopped, I'll be raising the limit to three again.

If SpamLookup is active in your friend's blog, and the spam are not the unusual type that get past any filters, then I'd simply recommend reverting back to 3.1x, if possible, and waiting for a newer version to come out with better spam protection.

Posted by: Luis on March 7, 2006 at 12:08 PM | PERMALINK

frankly0 >"...If a comment contains a Chinese character, delete the hell out of it."

That should be "contains a non-Roman character"

works for my Postfix server
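For anyone curious what such a check might look like outside a mail server, here is a rough Python sketch. It is not the Postfix rule mentioned above; the function name is made up, and it simply treats any letter whose Unicode name lacks the word "LATIN" as non-Roman. Note that a filter like this would also reject legitimate comments written in other scripts.

```python
import unicodedata


def has_non_latin_letters(text: str) -> bool:
    """Return True if any alphabetic character in text is not a Latin letter.

    Accented Latin letters such as "é" still count as Latin because their
    Unicode names contain "LATIN". Digits and punctuation are ignored.
    """
    for ch in text:
        if ch.isalpha() and "LATIN" not in unicodedata.name(ch, ""):
            return True
    return False
```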

"Let everyone sweep in front of his own door, and the whole world will be clean." - Johann Wolfgang von Goethe

Posted by: daCascadian on March 7, 2006 at 12:11 PM | PERMALINK

Having to give up MT-Blacklist is one of the main reasons why I have not migrated my blog up to MT 3.x.

Call me old-fashioned, but I don't like forcing the user to take lots of steps to post comments. I am annoyed by required-registration blogs and by having to type those damn strings of letters in order to post. I understand why people need to do it, but I still don't like it.

Posted by: fiat lux on March 7, 2006 at 12:13 PM | PERMALINK

WordPress has built-in features which catch 90%+ of comment spam, and make deleting them very simple. You add a bunch of common spam words and it snags them before they show up on your site. One reason I will never use MT again.

Posted by: DCB on March 7, 2006 at 12:14 PM | PERMALINK

With my upgrading of my Movable Type install and the upgrading of all my WordPress installs (I only use MT for my personal blog), I have been able to remove my comment moderation. Before that, I had a backlog of 3,700 comments "to moderate," the majority of which were Comment Spam. And I was deluged with Trackback Spam, too.

Now, I have removed moderation from all my personal blog installs and I am so happy.

Go to plugins -- Nofollow Version 2.0, SpamLookup - Lookups Version 2.0, SpamLookup - Link Version 2.0, and SpamLookup - Keyword Filter Version 2.0, were all built into my MT install!

And they work like a charm!

Posted by: Chris Abraham on March 7, 2006 at 12:14 PM | PERMALINK

A truly effective program to minimize spam would have to be tailored specifically to this site, and would need to automatically delete the following phrases:

[CLICK THE LINK! ALWAYS CLICK THE LINK!]
[What you lefties don't get is]
[Good luck explaining that to voters in November]
[lefty moonbats]
[Clinton did it too]
[McA]
[Kyoto is garbage]*

-- and any posts in all caps.

*The phrase "kyoto is garbage" spams the threads a lot more frequently than the Chinese characters, which at least have redeeming value in that they are kinda pretty to look at.

Posted by: trex on March 7, 2006 at 12:14 PM | PERMALINK

If you use a coded entry (or quiz question) please make the message change periodically (i.e. stretch, bigtime, pablo, etc.). You could even make Al happy by asking "who wrote the article I'm linking to?".

BTW Steve Benen, the color of an orange depends on its ripeness.

Posted by: B on March 7, 2006 at 12:17 PM | PERMALINK

I don't like the "nofollow" idea, because (unless I completely misunderstand it) it is based upon the idea that if you tell Google not to value your comment links, spammers will stop trying to bomb your site with spam. This reasoning doesn't work. Referral spam, for example, is based upon the idea that bloggers have automated top-referrer lists or public stats pages. Most bloggers have axed these features, but referral spam is booming rather than declining. Same with the nofollow feature: even if most bloggers enact it, enough bloggers will not use it to make comment spam still profitable for spammers. And since it is dirt cheap for spammers to litter endless comment spam, they don't care if the returns are minimal.

I use MT 3.17, with MT-Blacklist 2.04b and SpamLookup, and though I have to moderate maybe 2-3 spam per day on average, most (usually hundreds per day) get stopped by the filters. I wish I had a better way to filter referral spam, but hey, it's all manageable as it is. No problem--and my visitors don't have to register or captcha or any crap like that.

Posted by: Luis on March 7, 2006 at 12:25 PM | PERMALINK

I'd like to suggest an intermediate solution to registration that I've seen on a few sites, especially wikis, and seems to be very effective: an open password.

When you try to leave a comment, the site asks for a password, or login and password. The proper response is prominently displayed somewhere on the site, perhaps even in the password message itself.

An example:
"To leave a comment on washingtonmonthly.com, you must first enter today's password, `calpundit': __________"

This allows anyone to comment, but will stop a robot.
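A minimal sketch of the check, in Python, might look like the following. Everything here is hypothetical: the form field name `password`, the function names, and the idea of rotating the word by calendar day from a fixed list are assumptions for illustration, not how any particular site does it.

```python
from datetime import date


def password_for(day: date, words: list[str]) -> str:
    """Pick the day's open password from a list, rotating once per day."""
    return words[day.toordinal() % len(words)]


def accept_comment(form: dict, todays_password: str) -> bool:
    """Accept the comment only if the submitted password matches.

    The comparison ignores case and surrounding whitespace, since the
    password is public and the goal is only to stop unattended scripts.
    """
    submitted = form.get("password", "")
    return submitted.strip().lower() == todays_password.strip().lower()
```

Since the password is shown on the page, a script written specifically for one site could read it too; the scheme only raises the cost for generic robots.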

Posted by: Boronx on March 7, 2006 at 12:34 PM | PERMALINK

Melanie at node707.com (Just a Bump in the Beltway) requires that the commenter preview each comment. It doesn't entirely get rid of comment spam, but on the whole it's pretty effective.

Posted by: RT on March 7, 2006 at 12:37 PM | PERMALINK

Black listing is a much better way to go. Heuristic filtering is always beaten by the determined human. Black lists involve the input of a large collection of people. Group mind beats single mind. Single mind beats algorithm/heuristic.

Solution: Get the blacklist component working in the latest MT. Pay someone to do it. Fewer headaches all around.

Posted by: patience on March 7, 2006 at 12:42 PM | PERMALINK

Kevin, such a plug-in to prevent more than 1 or 2 URLs per comment would be easy to write, and would also make it harder to abuse because the offending spam would have to wait a minute between each post before coming back to do another 1 or 2 URLs, and if it has to keep coming back, then it's a lot easier to track it.
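The one-minute wait could be sketched as a small per-source throttle. This is an assumption-laden Python illustration: the class name, the use of an IP address string as the key, and the 60-second default are all made up for the example, and a spammer who rotates IP addresses would slip past it.

```python
class CommentThrottle:
    """Reject a comment if the same source posted too recently."""

    def __init__(self, min_interval_seconds: float = 60.0):
        self.min_interval = min_interval_seconds
        # Maps a source key (for example an IP address) to the time,
        # in seconds, of its last accepted comment.
        self._last_accepted = {}

    def allow(self, source: str, now: float) -> bool:
        """Return True and record the post time if the source may post."""
        last = self._last_accepted.get(source)
        if last is not None and now - last < self.min_interval:
            return False  # too soon; rejected attempts do not reset the clock
        self._last_accepted[source] = now
        return True
```

In use, `now` would typically come from `time.time()`; it is passed in here so the behavior is easy to check.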

This would be a very good idea, but you'd want to put the limit of URLs at around 3 perhaps, so that people could still make substantive and supported posts.

Posted by: Jimm on March 7, 2006 at 1:27 PM | PERMALINK

There is one major problem with Captchas (the little number/letter boxes full of random gibberish [in the form of a graphic] that the poster must enter to post): they are a slap in the face to disabled users. It essentially disallows the visually impaired from posting.

For instance, I know my grandpa could not post on Digg (assuming he'd want to) because their Captcha is so damn annoying that his eyesight would not let him figure out what it said. And he's only got mildly bad eyesight due to aging.

As far as taking out links: let's not throw out the baby with the bath water. Links are the world wide web.

Posted by: teece on March 7, 2006 at 1:29 PM | PERMALINK

Jimm:

SpamLookup for MT already does that. You can set the configuration to block comments with greater than n URLs in it. See:

http://bradchoate.com/projects/spamlookup/wiki/SpamIdentification

Posted by: Luis on March 7, 2006 at 1:30 PM | PERMALINK

There is one major problem with Captchas (the little number/letter boxes full of random gibberish [in the form of a graphic] that the poster must enter to post): they are a slap in the face to disabled users. It essentially disallows the visually impaired from posting.

Good ones, like the one Microsoft (e.g.) uses, have a button you can press to have the text read; I suppose if you are both visually and hearing impaired it could still be a problem. Then again, you can cycle through them to get one you can read, if it's just particular distortion features that make it hard.

In the long-term, such techniques are probably doomed, and a verifiable ID, perhaps anonymized through a trusted intermediary, is the only thing that will likely really work.

Posted by: cmdicely on March 7, 2006 at 1:35 PM | PERMALINK

Jeffrey Davis: I like registration and real names. That eliminates spam. And trolls.

And it would eliminate me.

And it would have eliminated the authors of the Federalist Papers.

Don't be so anti-American.

Anonymous political comment has a long and distinguished history.


Posted by: Advocate for God on March 7, 2006 at 3:16 PM | PERMALINK

Make comment-spamming a capital offense punishable by public stoning. I doubt that it would really cut down on it that much since so much of it originates overseas, but I sure would enjoy watching the ones that get caught first. I'd take a week's vacation just to go hunting for the perfect rocks.

Does that make me a bad person?

Posted by: apostropher on March 7, 2006 at 4:03 PM | PERMALINK

I like the graphic words or numerals you have to type in (although I'm not disabled). I strongly dislike having to register or get a typekey identity or whatever.

But the "password of the day" idea seems ideal to me. Anyone can look and plug it in but a robot won't. Except I would want a duck to come down and give me $50 if I typed it accidentally in my comment (thanks, Groucho). Or maybe a pony.

Posted by: David in NY on March 7, 2006 at 4:43 PM | PERMALINK

And it would have eliminated the authors of the Federalist Papers.

No, it wouldn't. Because they would have been published as a (group) blog, not on some other blog's comment thread.

Posted by: cmdicely on March 7, 2006 at 4:46 PM | PERMALINK

Anonymous political comment has a long and distinguished history.

Sez who?


Posted by: Jeffrey Davis on March 7, 2006 at 4:54 PM | PERMALINK

kevin,

Do you dislike it when people post links in their comments?

Posted by: cld on March 7, 2006 at 5:07 PM | PERMALINK

tbrosz wrote: Makes it a little less convenient to post, but the bright side is that less posting convenience might cut down a bit on some of the monkey feces-flinging parties around here.

What can we do about all these black kettles hanging around?

Posted by: Hamilton Lovecraft on March 7, 2006 at 6:03 PM | PERMALINK

H. Lovecraft That's "potty" talk.

Posted by: opit on March 7, 2006 at 10:56 PM | PERMALINK

First of all, apologies for posting from an often political but obviously not-work-safe address. I did try posting anonymously the first time.

---

Spam filtering and comment moderation in the latest Movable Type (ver. 3.2) is great!

The old MT-Blacklist plug-in is now built in. There's a lot of other cool comment moderation stuff in there as well.

Using mostly out-of-the-box settings I rarely have to manually junk more than one bogus comment a day. Every now and then I get little spates of spam when the spammers figure out a new trick but MT seems to use a pooled black list so new exploits get caught pretty quickly.

Another nifty trick is it leaves questionable comments in your list but marks them as "unpublished" till you approve them. I get maybe one of those fence-sitters a week. It also automatically dumps obvious spam into a junk comments area and I can get fifty to one hundred of those a day. So far I've had only one false positive.

Oh yeah, finally, about registration: it keeps track of some combination of the commenter's email/url/IP/cookie information and does forward weighting with it, so if you haven't deleted a commenter before, it's more forgiving of the commenter in the future. That seems like a nice compromise between forced registration and leaving everything wide open.
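To make the idea of forward weighting concrete, here is a rough Python sketch. It is not Movable Type's actual algorithm, which isn't spelled out here: the class, the scoring scale, the bonus values, and the publish/hold/junk cutoffs are all invented for illustration.

```python
class CommenterHistory:
    """Track approved comments per commenter and soften future spam scores."""

    def __init__(self, bonus_per_approval: float = 0.5, max_bonus: float = 2.0):
        self.bonus_per_approval = bonus_per_approval
        self.max_bonus = max_bonus
        # Maps a commenter key (email, URL, IP, or cookie) to approvals so far.
        self._approved = {}

    def record_approved(self, commenter_key: str) -> None:
        """Note that one more comment from this commenter was kept."""
        self._approved[commenter_key] = self._approved.get(commenter_key, 0) + 1

    def adjusted_score(self, commenter_key: str, spam_score: float) -> float:
        """Lower the raw spam score by a capped bonus for past approvals."""
        approvals = self._approved.get(commenter_key, 0)
        bonus = min(approvals * self.bonus_per_approval, self.max_bonus)
        return spam_score - bonus


def classify(score: float, hold_at: float = 1.0, junk_at: float = 3.0) -> str:
    """Map a score to "publish", "hold" (unpublished until approved), or "junk"."""
    if score >= junk_at:
        return "junk"
    if score >= hold_at:
        return "hold"
    return "publish"
```

With these made-up numbers, a borderline comment from a new commenter would be held, while the same comment from someone with two approved comments would be published.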

I began this comment planning to clarify the new features and it's sort of turned into an endorsement of the product. Apologies to those who prefer other tools.

figleaf

Posted by: figleaf on March 8, 2006 at 4:40 PM | PERMALINK




 

 
