Anti-spam - How it works
This document gives some background on how the Mailtraq intelligent anti-spam system works, and explains why it is so powerful when compared to alternative approaches.
We all recognise a message as spam when we see it, but spammers have been working hard to make it difficult for automated systems to distinguish spam from normal messages. There are lots of systems available which can correctly tell whether a message is spam or not most of the time, but getting it right all of the time is very difficult indeed.
There are several reasons why it's so difficult to spot spam:
- Spammers change their techniques to avoid detection all the time.
- Normal mail can often look like spam.
- There are lots of borderline messages which some users consider spam, and other users consider desirable marketing.
One of the most important things is to avoid rejecting wanted messages - so-called 'false positives'. These can be extremely expensive mistakes: you may, for example, be rejecting an order from a customer.
Some anti-spam systems work by being told a list of key words and phrases to look for. This worked well in the early days, but spammers quickly learnt to place deliberate typos in their messages to avoid detection, so "Viagra" might become "V1agra" or "Vi agra". Spammers use many other techniques to avoid simple rule-based detection. Administrators could easily spend more time inventing new rules than those rules save their users.
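The weakness of keyword matching can be seen in a minimal sketch. The phrase list and the function name `keyword_filter` are made up for illustration and are not Mailtraq's rules or API.

```python
# Hypothetical blocklist for illustration only; not Mailtraq's actual rules.
BLOCKED_PHRASES = ["viagra", "free money"]


def keyword_filter(message: str) -> bool:
    """Return True if the message contains any blocked phrase."""
    text = message.lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)


samples = [
    "Cheap Viagra available now",   # exact keyword: caught
    "Cheap V1agra available now",   # digit substituted for a letter
    "Cheap Vi agra available now",  # space inserted inside the word
]
results = [keyword_filter(s) for s in samples]
```

Only the first sample is flagged. Each new disguise needs a new rule, which is the maintenance burden described above.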
A new, more intelligent system was clearly needed. At the heart of the Mailtraq intelligent anti-spam system is a Bayesian classification engine. This doesn't rely on the administrator to feed in key phrases to look for. Instead the administrator simply feeds the system a series of messages, and says whether each one is spam or not spam. The system then works out, statistically, which features of the messages are symptomatic of good and bad messages. When it is then faced with a new message it has never seen before, it makes an intelligent decision, in the same way that a human reader can recognise a message as being spam.
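The train-then-classify idea can be sketched with a small naive Bayes classifier. This is a simplified textbook version, not Mailtraq's engine: the class name `NaiveBayesSpamFilter`, the labels "spam" and "ham", the tokenizer, and the training messages are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a message into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Textbook naive Bayes sketch; assumes both classes have been trained."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one example message labelled 'spam' or 'ham'."""
        self.message_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        """Estimate the probability that an unseen message is spam."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        # Words never seen in training carry no evidence, so skip them.
        words = [w for w in tokenize(text) if w in vocab]
        log_scores = {}
        for label in ("spam", "ham"):
            # Prior: fraction of training messages with this label.
            score = math.log(self.message_counts[label] / total_messages)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                # Add-one (Laplace) smoothing avoids zero probabilities.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
            log_scores[label] = score
        # Convert log scores to a normalised probability.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


spam_filter = NaiveBayesSpamFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("win money now cheap offer", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("please find the order attached", "ham")

p_spam_like = spam_filter.spam_probability("buy cheap pills")
p_ham_like = spam_filter.spam_probability("order for monday meeting")
```

No key phrases are supplied anywhere: the word statistics come entirely from the labelled examples, so retraining on new messages adapts the filter without anyone writing new rules.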
The Bayesian anti-spam engine is based on mathematical theorems derived from the work of Thomas Bayes. This pioneering work is now being used to solve cutting-edge problems, from spam detection to physics and biology.
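For reference, the central result, Bayes' theorem, can be written for a single word as below. The symbols are illustrative: w stands for the event that a given word appears in a message, and "ham" is a common shorthand for non-spam. How Mailtraq combines evidence from many features is not described here, so this shows only the general principle.

```latex
P(\text{spam} \mid w) =
  \frac{P(w \mid \text{spam})\, P(\text{spam})}
       {P(w \mid \text{spam})\, P(\text{spam}) + P(w \mid \text{ham})\, P(\text{ham})}
```

In words: the probability that a message is spam, given that it contains the word, depends on how often the word appears in known spam compared with how often it appears in known good mail.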
Avoiding the risks of rejection
With even the best anti-spam system there is a risk that a wanted message could get classified as spam. The key here is to plan what happens to the spam messages. There are several options:
- The messages can be silently deleted.
- The messages can be filed in a 'spam' folder for later inspection.
- The messages can be bounced back to the sender.
Mailtraq supports all these options, along with some extra features to help legitimate senders.
Silently deleting messages is probably the worst option to use, as neither the recipient nor the sender will know that their message was deleted.
You can move spam messages into a separate folder, so the recipient can periodically check through to see if any wanted messages are in there. This may be appropriate in some cases, but many users will quickly learn to trust the spam system, and not bother looking, so again, neither the sender nor the recipient gets to know that the message wasn't delivered.
Bouncing the spam messages back to the sender is a good option in most cases, as this lets the sender know the message didn't get through. The Mailtraq system has a special trick here: in the rejection message the sender can be given a special bypass email address that they can use to ensure the message gets through. Of course there's a chance spammers will get hold of this bypass address, but that doesn't matter as the bypass address is set to only work for a limited time.
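One way a time-limited bypass address could work is sketched below. This is not Mailtraq's implementation: the address format, the names `make_bypass_address` and `is_valid_bypass`, the secret key, and the seven-day lifetime are all assumptions made for illustration.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"          # hypothetical server-side secret
VALID_FOR_SECONDS = 7 * 24 * 3600  # assumed lifetime; the real value is not stated


def _sign(mailbox, expires):
    """Return a short signature binding the mailbox to its expiry time."""
    msg = f"{mailbox}:{expires}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()[:16]


def make_bypass_address(mailbox, domain, now=None):
    """Build a bypass address to quote in the rejection message."""
    issued = int(now if now is not None else time.time())
    expires = issued + VALID_FOR_SECONDS
    return f"{mailbox}+bypass-{expires}-{_sign(mailbox, expires)}@{domain}"


def is_valid_bypass(address, now=None):
    """Accept the address only if it is correctly signed and not expired."""
    now = int(now if now is not None else time.time())
    local, _, _domain = address.partition("@")
    mailbox, sep, token = local.partition("+bypass-")
    if not sep:
        return False  # an ordinary address, not a bypass address
    expires_text, sep, tag = token.partition("-")
    if not sep or not expires_text.isdigit():
        return False
    expires = int(expires_text)
    expected = _sign(mailbox, expires)
    # Constant-time comparison; the expiry cannot be altered without the key.
    signature_ok = hmac.compare_digest(tag.encode(), expected.encode())
    return signature_ok and now <= expires


address = make_bypass_address("sales", "example.com", now=1_000)
```

Because the expiry time is part of the signed address, a spammer who harvests it gains only a short window of use, and cannot extend that window by editing the address.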
Tutorials: setting up Mailtraq's anti-spam system