Forum spam is a big problem and very difficult to combat. There are some solutions, none perfect, but this one might help reduce the burden of spam on your forum or mailing list.
The big mystery about forum spam is the idea that it is worth generating. The opportunity to place a comment in the forum that lists a few links to usually doubtful websites seems to be worth a lot to some people. Alternatively, the cost of placing such advertising might be so small that the fact that it has low or even negative productivity might not matter. Whatever the situation, it is very difficult to figure out the workings of the forum/email spam market.
What is the payoff?
For example, I Programmer has a mailing list. If you sign up all you get is access to the CodeBin and a weekly email listing the latest news, book reviews and articles. We don't collect demographics or anything useful for advertising so there is very little value in the mailing list. What is more if you sign up you don't get any greater opportunity to make comments or contribute to a forum. In short, signing up to the I Programmer email list should be something that has no value for a spammer. As a result we didn't bother with a Captcha and relied instead on an email validation system. You sign up, the system sends you an email, you respond to it and you are registered to use the CodeBin and get a weekly news email.
The burden of spam
At first everything was fine but slowly the number of email registrations increased to around 50 or more per day. The majority of the sign ups were unbelievable in that the name and user name fields were generally identical and something like "xsdferdf". The email addresses were sometimes obviously spam and along the lines of "firstname.lastname@example.org" and so on but they generally all worked.
Occasionally blocks of these signups would be invalidated by Google, or some other ISP, when they noticed that they were bogus. The result was that, increasingly frequently, the mailing list was swamped with bounced email. Most of the time, however, the email addresses continued to work until their mailboxes filled up and went over quota, and then they bounced as well.
Notice that there can be no possible reward for these sign-ups as the user doesn't get any opportunity to post comments with links or even get their email address or user name visible to the public. Whoever is paying for these sign ups certainly isn't getting value for money - but still they keep coming.
If a pointless sign up to a mailing list gets 50 per day you can imagine the problem that you might run into if you are supporting a real forum with the opportunity to make comments and hence get a reward.
So what to do about it?
Captcha to detect bots
Our first response was to put in a Captcha but no one likes the traditional word Captchas that are impossible for humans and reportedly easy for AI. So we used an off-the-beaten-track image-based Captcha which is almost fun to play with. Even so we have received a high proportion of complaints about it not working or being difficult to use.
The good news is that the Captcha made a big difference and reduced the spam signup rate by 90% - but there were still signups that looked like spam. Given that the Captcha was a new system and unlikely to be cracked in the time allowed, a reasonable conclusion is that the remaining spammers are not bots but people and hence Captcha-style systems are not going to be effective.
So our conclusion is that human agents are signing up using working email addresses, which makes both Captcha and email validation less effective.
Match the spammers
About the only defense against the flood of spam sign-ups is to use a database of known spam email addresses. There are a few of these, but the best is probably Stop Forum Spam. This site collects spam email addresses and other information provided by their supporters. If you run a forum and can identify spammers, then I would suggest that contributing to the database is a worthwhile thing to do.
You can check an email address and user name in the database using a search facility. There is also an API that you can use to automate the process. However, the service is offered on a free basis and overloading it is not a very friendly thing to do. It is suggested that you keep your API calls down to fewer than 20,000 per day - which seems very generous. If you need to use more, then one possibility is to download the database of email addresses and do the lookup locally or set up a mirror site.
Using this API you can filter out probably 50% of potential spammers that sign up to your mailing list/forum. All you have to do is check their details.
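As a rough illustration of checking sign-up details against the API, here is a minimal Python sketch. The endpoint URL, the query parameters (email, username, json) and the response fields (success, appears, frequency) are assumptions based on the service's public API as we understand it, so check the current Stop Forum Spam documentation before relying on them. The function names and the min_frequency threshold are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; verify against the Stop Forum Spam API documentation.
API_URL = "https://api.stopforumspam.org/api"


def build_query(email, username=None):
    """Build a lookup URL for an email address and optional user name."""
    params = {"email": email}
    if username:
        params["username"] = username
    # The bare "json" flag asks for a JSON response (assumed parameter name).
    return API_URL + "?" + urllib.parse.urlencode(params) + "&json"


def looks_like_spammer(response_text, min_frequency=1):
    """Return True if any checked field appears in the spam database.

    Assumes a response shaped like:
    {"success": 1, "email": {"appears": 1, "frequency": 12}}
    """
    data = json.loads(response_text)
    if not data.get("success"):
        return False  # lookup failed, so treat the sign-up as unknown
    for field in ("email", "username"):
        entry = data.get(field) or {}
        if entry.get("appears") and entry.get("frequency", 0) >= min_frequency:
            return True
    return False


def check_signup(email, username=None, timeout=10):
    """Query the service once for a new sign-up (one API call per sign-up)."""
    url = build_query(email, username)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return looks_like_spammer(resp.read().decode("utf-8"))
```

Calling check_signup once at registration time keeps the number of requests far below the suggested daily limit for a list that sees around 50 sign-ups per day.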
So what should you do if you detect a spam email address?
How not to respond to spam
The temptation is to zap them as quickly as possible and delete them from the database. Satisfying, but not a good idea. The reason is that some spammers run the following algorithm:
register email address x
wait a few minutes and try to register address x again
IF x is rejected as duplicate THEN exit
ELSE register address x again
You can see that deleting the email address too quickly only results in unnecessary traffic. You will also find that the same email addresses are registered again after a few days if you delete them.
The best scheme seems to be to create a batch delete program that you can run on a regular basis or just before doing a mailing or similar list-wide operation.
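One way to arrange this is to flag suspect addresses rather than delete them, so that a repeat registration is still rejected as a duplicate, and then purge the flagged rows in a single batch. The following Python sketch uses SQLite purely for illustration. The subscribers table, its is_spam column and all the function names are hypothetical and are not taken from any real mailing list software.

```python
import sqlite3


def create_table(conn):
    """Create a hypothetical subscriber table with a spam flag."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS subscribers ("
        "email TEXT PRIMARY KEY, "
        "is_spam INTEGER NOT NULL DEFAULT 0)"
    )


def register(conn, email):
    """Add a subscriber; return False if the address is already present.

    Flagged spam addresses stay in the table, so a spammer who retries
    sees a duplicate rejection and stops.
    """
    try:
        conn.execute("INSERT INTO subscribers (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        return False


def flag_spam(conn, email):
    """Mark an address as spam without deleting it."""
    conn.execute("UPDATE subscribers SET is_spam = 1 WHERE email = ?", (email,))


def purge_flagged(conn):
    """Batch delete every flagged address; return how many were removed."""
    cur = conn.execute("DELETE FROM subscribers WHERE is_spam = 1")
    conn.commit()
    return cur.rowcount
```

Running purge_flagged just before a mailing keeps the bogus addresses out of the send while leaving them in place long enough to block immediate re-registration.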