Obfuscating message content
Many spam-filtering techniques work by searching for patterns in the headers or bodies of messages. For instance, a user may decide that all e-mail he or she receives with the word "Viagra" in the subject line is spam, and instruct her mail program to automatically delete all such messages. To defeat such filters, the spammer may intentionally misspell commonly-filtered words or insert other characters, as in the following examples:
- V1agra
- Via'gra
- V I A G R A
- Vaigra
- \ /iagra
- Vi@graa
The principle of this method is to leave the word readable to humans (whose pattern-recognition skills make them adept at picking out the true meaning of misspelled words), but not recognizable to a literally-minded computer program. This is effective up to a point. Eventually, filter patterns become generic enough to recognize the word "Viagra" no matter how misspelled -- or else they target the obfuscation methods themselves, such as insertion of punctuation into unusual places in a word.
(Note: Using most common variations, it is possible to spell "Viagra" in over 1,300,000,000,000,000,000,000 ways.)
HTML-based e-mail gives the spammer more tools to obfuscate text. Inserting HTML comments between letters can foil some filters, as can including text made invisible by setting the font color to white on a white background, or shrinking the font size to the smallest fine print.
Another common ploy involves presenting the text as an image, which is either sent along or loaded from a remote server. This can be foiled by not permitting an e-mail-program to load images.
As Bayesian filtering has become popular as a spam-filtering technique, spammers have started using methods to weaken it. To a rough approximation, Bayesian filters rely on word probabilities. If a message contains many words which are only used in spam, and few which are never used in spam, it is likely to be spam. To weaken Bayesian filters, some spammers, alongside the sales pitch, now include lines of irrelevant, random words, in a technique known as Bayesian poisoning. A variant on this tactic may be borrowed from the Usenet abuser known as "Hipcrime" -- to include passages from books taken from Project Gutenberg, or nonsense sentences generated with "dissociated press" algorithms. Randomly generated phrases can create spamoetry (spam poetry) or spam art.
After these nonsense subject lines were recognized as spam, the next trend in spam subjects started: Biblical passages. A program much like Mark V Shaney is fed Bible passages and chops them up into segments. The reasoning is that this text, very different from the writing style of today, will confuse both humans and spam filters.
Another method used to masquerade spam as legitimate messages is the use of autogenerated sender names in the From: field, ranging from realistic ones such as "Jackie F. Bird" to (either by mistake or intentionally) bizarre attention-grabbing names such as "Sloppiest U. Epiglottis" or "Attentively E. Behavioral". Return addresses are also routinely auto-generated
From Wikipedia, the free encyclopedia |