E-mail spam is a major problem facing users today. In recent years, many schemes have been developed to detect spam e-mails. The primary idea of similarity-matching schemes for spam detection is to maintain a database of known spam, built from user feedback, and to use it to block subsequent near-duplicate spam. We propose a novel e-mail abstraction scheme that represents e-mails by their layout structure. We present a procedure for generating the e-mail abstraction from the HTML content of an e-mail, and this newly devised abstraction captures the near-duplicate phenomenon of spam more effectively. Moreover, we design a complete spam detection system, Cosdes (Collaborative Spam Detection System), which possesses an efficient near-duplicate matching scheme and a progressive update scheme. To detect duplicate and near-duplicate spam quickly in Cosdes, we propose a new approach, SimHash.
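The two ideas in the abstract, a layout-based e-mail abstraction and SimHash matching, can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes the abstraction is simply the ordered sequence of HTML opening tags, that features are overlapping tag triples hashed with MD5, and that fingerprints within a small Hamming distance count as near-duplicates. The names `email_abstraction`, `simhash`, `is_near_duplicate` and the threshold `max_distance=3` are hypothetical choices made for illustration.

```python
import hashlib
from html.parser import HTMLParser


class TagSequenceParser(HTMLParser):
    """Collects opening-tag names in document order, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def email_abstraction(html_body):
    """Assumed abstraction: the ordered list of HTML tag names in the body."""
    parser = TagSequenceParser()
    parser.feed(html_body)
    parser.close()
    return parser.tags


def simhash(tokens, bits=64, shingle=3):
    """Compute a SimHash fingerprint over overlapping runs of `shingle` tokens."""
    if not tokens:
        return 0
    if len(tokens) < shingle:
        features = [" ".join(tokens)]
    else:
        features = [" ".join(tokens[i:i + shingle])
                    for i in range(len(tokens) - shingle + 1)]

    weights = [0] * bits
    for feature in features:
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        value = int.from_bytes(digest[:bits // 8], "big")
        # Each feature votes +1 or -1 on every bit position.
        for i in range(bits):
            weights[i] += 1 if (value >> i) & 1 else -1

    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(html_a, html_b, max_distance=3):
    """Hypothetical decision rule: fingerprints differ in at most max_distance bits."""
    fp_a = simhash(email_abstraction(html_a))
    fp_b = simhash(email_abstraction(html_b))
    return hamming_distance(fp_a, fp_b) <= max_distance


spam_a = "<html><body><p>Buy cheap watches</p><a href='http://a.example'>Click</a></body></html>"
spam_b = "<html><body><p>Best prices on pills</p><a href='http://b.example'>Order</a></body></html>"
print(email_abstraction(spam_a))
print(is_near_duplicate(spam_a, spam_b))
```

Because the assumed abstraction discards the text and keeps only the layout, two messages that share a template but differ in wording produce the same fingerprint. In a collaborative setting, the fingerprints of user-reported spam would be stored and each incoming message compared against them.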
REDDY, M. SIVA KUMAR and SAGAR, B. KRISHNA
"IMPROVED NEAR DUPLICATE MATCHING SCHEME FOR E-MAIL SPAM DETECTION,"
International Journal of Computer and Communication Technology: Vol. 7, Article 9.
Available at: https://www.interscience.in/ijcct/vol7/iss4/9