Few Internet security issues have attained the universal public
recognition or contempt of unsolicited bulk email -- spam. The engine that
drives this enormous activity is not spam itself -- which is simply a means to
an end -- but the various money-making "scams" (legal or illegal) that
extract value from Internet users. In this paper, we focus on the Internet
infrastructure used to host and support such scams. Unlike mail-relays or
bots, scam infrastructure is directly implicated in the spam profit cycle and
thus considerably rarer and more valuable. Our goal is to measure and analyze
this scam infrastructure to better understand the dynamics and business
pressures exerted on spammers. To identify scam infrastructure, we employ an
opportunistic technique called spamscatter. The underlying principle is that
each scam is, by necessity, identified in the link structure of associated
spams. To this end, we have built a system that mines email, identifies URLs
in real time, and follows such links to their eventual destination server. We
further identify individual scams by clustering scam servers whose rendered Web
pages are graphically similar using a technique called image shingling. Using
the spamscatter technique on a large real-time spam feed (roughly 150,000
messages per day), we identify and analyze over 2,000 distinct scams hosted
across more than
7,000 distinct servers.
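
The two mechanical steps named above, pulling URLs out of spam and grouping rendered pages by image shingling, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the URL pattern, the tile size, the use of a Jaccard ratio as the similarity score, and the greedy clustering rule are all assumptions made for illustration. A rendered page is modeled as a plain 2D grid of pixel values; fetching and rendering pages is out of scope here.

```python
import hashlib
import re

# Assumed pattern: a deliberately simple matcher for http(s) links in message text.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_urls(message_text):
    """Return the distinct URLs in a message body, in order of first appearance."""
    urls = []
    for url in URL_RE.findall(message_text):
        url = url.rstrip(".,;:!?)")  # drop trailing sentence punctuation
        if url not in urls:
            urls.append(url)
    return urls


def image_shingles(pixels, block=2):
    """Hash every non-overlapping block x block tile of a 2D pixel grid.

    The result is a set, so repeated identical tiles count once.
    """
    rows, cols = len(pixels), len(pixels[0])
    shingles = set()
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            tile = tuple(tuple(pixels[r + i][c:c + block]) for i in range(block))
            shingles.add(hashlib.sha1(repr(tile).encode()).hexdigest())
    return shingles


def similarity(a, b):
    """Fraction of shingles shared by two pages (Jaccard ratio, an assumption)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(pages, threshold=0.7):
    """Greedy single-pass clustering of {name: shingle_set}.

    Each page joins the first cluster whose first member is similar enough;
    otherwise it starts a new cluster.
    """
    clusters = []  # list of (representative_shingles, [member names])
    for name, shingles in pages.items():
        for representative, members in clusters:
            if similarity(representative, shingles) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((shingles, [name]))
    return [members for _, members in clusters]
```

Hashing fixed tiles means two screenshots that differ only in a small region (a rotated banner, a changed price) still share most of their shingles, which is why a threshold on the shared fraction, rather than an exact image match, is used to decide that two servers host the same scam.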
Pre-2018 CSE ID: CS2007-0887