No matter how you slice it, the spam problem is getting worse. In 2004, simple scoring mechanisms were sufficient to determine whether an email was spam because spam was primarily text-based. Techniques such as heuristics (weighted scoring), Bayesian filters (probability analysis), and reputation lists (RBLs) were widely adopted and incorporated into solutions from the leading vendors of the time. They also became the core techniques of open-source solutions like SpamAssassin, to which spammers have full access.

More recently, spam volume has more than doubled, and most of that growth is attributed to more sophisticated methods such as image-based spam. While image-based spam has been around for years, it became much more sophisticated in 2006. Image-based spam messages initially consisted of images only, with no text, URL hyperlinks, or other identifying characteristics. Because these messages contained no text, they rendered text-centric anti-spam technology virtually useless. Making matters worse, spammers often surrounded the image with random "innocent" text so that the message could not be blocked by a simple "image-only" filter rule. The extra text was also used to corrupt or confuse Bayesian filters.

Most spam fighters, including the open-source community, responded with two basic methods for blocking image-based spam: fingerprinting and OCR (Optical Character Recognition). Fingerprinting identifies a specific graphic through a set of characteristics such as an MD5 checksum. However, the counter-measure to this technique was quite simple: modifying just a few pixels in the graphic (Figure 1) changes the fingerprint. When the "noise" in each image is randomized, every image has a unique fingerprint, and a simple fingerprint filter becomes ineffective against the spam.
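The weakness of checksum fingerprinting can be illustrated with a minimal Python sketch. It assumes a stand-in image made of raw greyscale bytes rather than a real graphic file, and the helper names (`fingerprint`, `add_noise`) are hypothetical; they are not taken from any vendor or open-source filter.

```python
import hashlib
import random


def fingerprint(pixels: bytes) -> str:
    """Return the MD5 hex digest of the raw image bytes."""
    return hashlib.md5(pixels).hexdigest()


def add_noise(pixels: bytes, count: int, rng: random.Random) -> bytes:
    """Return a copy in which `count` randomly chosen bytes differ by one bit."""
    noisy = bytearray(pixels)
    for index in rng.sample(range(len(noisy)), count):
        noisy[index] ^= 1  # flip the lowest bit: a visually negligible change
    return bytes(noisy)


# Assumption: a 64x64 greyscale gradient stands in for the spam graphic.
base_image = bytes((x * y) % 256 for y in range(64) for x in range(64))

# The filter has seen the base image and stored its fingerprint.
blocklist = {fingerprint(base_image)}

# The spammer sends five copies, each with three randomly altered pixels.
rng = random.Random(2006)
variants = [add_noise(base_image, 3, rng) for _ in range(5)]

caught = [fingerprint(variant) in blocklist for variant in variants]
unique_fingerprints = {fingerprint(variant) for variant in variants}

print("variants caught by the blocklist:", sum(caught))
print("distinct fingerprints among variants:", len(unique_fingerprints))
```

Each variant differs from the base image in only three of 4,096 bytes, yet none of them matches the stored checksum, so an exact-match blocklist would need a separate entry for every message.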
Another quick counter-measure to image fingerprinting is to break the single image into multiple images that are pieced together to appear as one. This technique is effective because spammers can send out the same baseline image but slice it randomly, creating unique messages with a variable number of jigsaw-puzzle pieces of varying size.

The second method of blocking image-based spam is OCR. OCR attempts to convert the text within the image to characters, which are then filtered using the traditional Bayesian and heuristic methods. OCR works well under stable conditions, such as black text on a white background, but an OCR algorithm is easily confused by variability in the image. As illustrated in Figure 2, background colors, patterns, font size, font color, text layout, and superscripted or subscripted text are all used to randomize the images and cause the OCR algorithm to fail. If the OCR algorithm doesn't find any recognizable text, the traditional scoring filters are unable to block the spam.

At the end of 2006, spammers adopted techniques that significantly crippled fingerprinting, OCR, and the earlier scoring-based technologies. In addition to the almost 100% randomization of images described above, spammers also adopted animation techniques that hide the image's call to action. One animated GIF technique places the "money image" within a series of frames. Each frame is randomized, and the number, animation timing, and sequence of frames are randomized within the series. Figure 3 illustrates a typical animation sequence. This technique is effective because many simple filter technologies examine only the first image in the animation sequence. The animation sequence is also variable, making it difficult to determine which frame contains the call to action.

Another animation technique involves slicing the image into layers. When animated, these layers appear to the user as a single flat image.
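The layering idea can be made concrete with a toy Python model. This is only a sketch under stated assumptions: the image is a small grid of integers rather than a real GIF, pixels are assigned to layers at random (actual spam may slice differently), and the function names are hypothetical. It shows why inspecting the first frame alone reveals little, and why a filter would plausibly need to composite all frames before applying fingerprinting or OCR; that compositing step is an illustration, not a method described in this article.

```python
import random

TRANSPARENT = None  # marker for a pixel that a layer leaves untouched


def slice_into_layers(image, layer_count, rng):
    """Assign every pixel to one randomly chosen layer; other layers stay transparent there."""
    height, width = len(image), len(image[0])
    layers = [
        [[TRANSPARENT] * width for _ in range(height)]
        for _ in range(layer_count)
    ]
    for y in range(height):
        for x in range(width):
            layers[rng.randrange(layer_count)][y][x] = image[y][x]
    return layers


def flatten(layers):
    """Composite the layers in order, as a viewer would while the animation plays."""
    height, width = len(layers[0]), len(layers[0][0])
    canvas = [[TRANSPARENT] * width for _ in range(height)]
    for layer in layers:
        for y in range(height):
            for x in range(width):
                if layer[y][x] is not TRANSPARENT:
                    canvas[y][x] = layer[y][x]
    return canvas


def coverage(frame):
    """Return the fraction of pixels in a frame that are not transparent."""
    total = sum(len(row) for row in frame)
    opaque = sum(1 for row in frame for pixel in row if pixel is not TRANSPARENT)
    return opaque / total


# Assumption: a 16x16 grid of integers stands in for the "money image".
image = [[(x + 3 * y) % 256 for x in range(16)] for y in range(16)]
layers = slice_into_layers(image, 4, random.Random(7))

print("first-frame coverage:", coverage(layers[0]))
print("composite equals original:", flatten(layers) == image)
```

In this model the first layer carries only a fraction of the picture, while the composite of all layers reproduces it exactly, which is what the recipient sees.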
As with the other techniques, the base image is randomized and the slices are unique to each message, often cutting through lines of text and making the image impossible to analyze using OCR.

New obfuscation methods are implemented relentlessly and in real time, and early-generation technologies cannot keep up with the shape-shifting nature of the attacks. Spammers do not rely on any one technique for long, so spam-fighting research continues to develop new techniques to identify the types of spam message that are on the horizon. Red Condor anticipates that the number of new iterations will continue to grow in 2007.