Proofpoint’s MLX-based solutions provide the most effective spam detection available today:

o Accurate: Proofpoint’s machine learning technology, based on techniques such as logistic regression, provides the foundation for a powerful, adaptive anti-spam solution capable of analyzing over 20 layers and more than 200,000 attributes to accurately differentiate between spam and valid messages.
o Decisive: Traditional anti-spam solutions evaluate a limited number of attributes and are unable to decisively classify spam, which leads to a low rate of effectiveness and a high rate of false positives.

MLX ensures that Proofpoint’s solutions will remain effective against the tactics spammers try to employ tomorrow:

o Predictive: Continuously evolving spamming techniques can only be countered by a predictive solution capable of learning and self-adjusting. Traditional reactive approaches cannot keep pace.
o Adaptive: Proofpoint’s MLX-based solutions automatically adapt to counter new threats. As more data from both valid email and spam is added to the machine learning model, the system identifies and weights relevant attributes to automatically tune the classification process. The result is a system that is just as effective at identifying tomorrow’s spam as it is at identifying spam today.

Proofpoint is the only vendor that has successfully combined machine learning techniques with traditional approaches to achieve near-perfect spam detection. Ongoing efforts by Proofpoint’s Attack Response Center scientists and Technical Advisory Board secure Proofpoint’s position as a technology pioneer and industry leader in the fight against spam.

This whitepaper explains the key concepts, technologies and benefits associated with Proofpoint MLX technology.

The Need for Machine Learning

Defending messaging systems against today’s spammers requires an intelligent system that can automatically adapt as the attackers’ techniques evolve. Unlike yesterday’s anti-spam technologies, Proofpoint’s MLX technology enables Proofpoint solutions to counter new spam techniques as they emerge, defending messaging systems against tomorrow’s threats as well as today’s.

A Brief History of Anti-spam Technologies: First Generation Solutions

In the early days of the spam epidemic, before the introduction of enterprise anti-spam solutions, spammers used simple, straightforward techniques to deliver spam. Spam messages were typically plain text or HTML messages that were mass mailed over sustained periods of time. Given the “static” nature of this spam, first-generation technologies such as signatures and real-time blackhole lists (RBLs) were able to detect and stop attacks on a reactive basis.

Vendors such as Symantec originally used signature-based techniques, very similar to the way anti-virus products work. But spammers quickly developed techniques for randomizing multiple parts of their messages (maintaining the core message while changing its signature) to thwart detection by signature-based systems.

Similarly, RBL techniques rely on understanding the quality and volume of messages associated with a given sender’s IP address by gathering information over substantial periods of time. In the early days of sustained spam campaigns, RBLs were reasonably effective once the initial attack was recognized. The problem today is that spammers rotate IP addresses frequently and often use hijacked machines (so-called “zombie” or “botnet” machines) to send small bursts of spam from an ever-changing array of locations.

Overall, first-generation approaches have a low rate of effectiveness against spam because they are easily defeated by randomization and obfuscation strategies. On the positive side, first-generation solutions introduce very few false positives (valid messages incorrectly marked as spam).
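
To illustrate why randomization defeats signature matching, the following is a minimal sketch that assumes a signature is simply a hash of the normalized message body. The function names, the sample messages and the choice of SHA-256 are illustrative assumptions, not a description of any vendor's product.

```python
import hashlib


def signature(message: str) -> str:
    """Hypothetical exact-match fingerprint: SHA-256 of the lowercased,
    whitespace-normalized message body."""
    normalized = " ".join(message.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# A signature is recorded only after the campaign has been seen (reactive).
known_spam = "Buy cheap watches now! Visit example.com today"
blocklist = {signature(known_spam)}


def is_known_spam(message: str) -> bool:
    """True when the message's fingerprint is already on the blocklist."""
    return signature(message) in blocklist


# Same campaign with only case and spacing changed: still caught.
same_campaign = "Buy  cheap watches NOW! Visit example.com today"

# Same core message with a few random characters appended: the hash
# changes completely, so the signature no longer matches.
randomized = "Buy cheap watches now! Visit example.com today xk7q2"

print(is_known_spam(same_campaign))  # caught
print(is_known_spam(randomized))     # missed
```

Because any change to the hashed content produces an unrelated fingerprint, each randomized variant needs its own signature. That keeps false positives rare but leaves the filter one step behind the spammer.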

Second Generation Anti-spam Solutions: Heuristic and Bayesian Approaches

To address the increasing frequency and sophistication of spam attacks, second-generation anti-spam vendors (such as CipherTrust, Sophos and Postini) employed heuristics and Bayesian techniques, often in combination with certain first-generation technologies, in an attempt to create systems that deliver more proactive, resilient defenses against spam.

Heuristics are “rules of thumb” that attempt to judge whether an email is spam based on a small number of “spammy” attributes. The problem is that rules of thumb are not always accurate and can be easily fooled by spammers, especially since most products taking this approach were based on open-source technologies that are readily available to spammers.

The introduction of Bayesian techniques began the trend toward using more sophisticated analytic techniques, rather than general rules of thumb, to identify spam. Bayesian solutions use statistical analysis to look for individual attributes that might indicate whether an email is spam or valid. But this relatively basic statistical approach falls short because it treats each attribute in isolation and cannot capture the relationships between attributes. Bayesian systems can often be fooled simply by adding unrelated valid-looking text to a spam message.
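
The padding weakness can be sketched with a textbook naive Bayes word model. Everything below is an illustrative assumption: the six-message training corpus, the equal class priors, the add-one smoothing and the function names are made up for the example and do not represent any particular product.

```python
import math
from collections import Counter

# Tiny, made-up training corpus (assumption for illustration only).
spam_docs = ["cheap pills online", "cheap watches online", "win cheap pills"]
ham_docs = ["meeting agenda attached", "quarterly report attached",
            "project meeting schedule"]


def train(docs):
    """Count word occurrences across one class of documents."""
    counts = Counter(word for doc in docs for word in doc.split())
    return counts, sum(counts.values())


spam_counts, spam_total = train(spam_docs)
ham_counts, ham_total = train(ham_docs)
vocab = set(spam_counts) | set(ham_counts)


def log_odds(message: str) -> float:
    """Sum of per-word log likelihood ratios with add-one smoothing.
    Each word contributes independently; a score above 0 means 'spam'."""
    score = 0.0  # equal priors assumed, so the prior log-odds is 0
    for word in message.split():
        p_spam = (spam_counts[word] + 1) / (spam_total + len(vocab))
        p_ham = (ham_counts[word] + 1) / (ham_total + len(vocab))
        score += math.log(p_spam / p_ham)
    return score


spam_message = "cheap pills online"
# The same spam with unrelated, valid-looking business words appended.
padded_message = spam_message + " meeting agenda attached quarterly report"

print(log_odds(spam_message))    # positive: classified as spam
print(log_odds(padded_message))  # negative: the padding outweighs the spam words
```

Because the score is a plain sum over words, each appended "valid" word subtracts from the total regardless of what else the message says. The model has no way to notice that the business vocabulary is unrelated to the sales pitch it accompanies.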