Unicode Obfuscation & Detecting Spam Over IRC

The Problem:

A botnet is spamming your server.  The spam filter isn’t working.  Time to get crafty.

"ΑTTⲚ: This ⅽhɑnᥒеl һɑs moᴠеd to irc.frеeᥒоdᥱ.net #⁄jоiᥒ"
"AΤΤⲚ: Thⅰѕ chɑnnel haѕ moᴠed tഠ irc.frᥱeᥒoⅾе.nᥱt #/јoіn"
"ATTΝ፡ Тhiѕ cһaᥒnel һas mоᴠed tഠ irc.frееnode.net ﹟∕jⲟin"
"AТΤN: Thiѕ chaᥒᥒel has mоveⅾ to іrс.frеenoⅾе.ᥒᥱt #/јoⅰᥒ"
"AТTN﹕ Thiѕ cһannеl һaѕ mo⋁eԁ to irc․freenoⅾᥱ․nеt #/јoіn"
"ΑΤTN︓ Tһis chɑnnᥱl hаѕ moved tο irc.freenode.ᥒet #/ϳoiᥒ"
"ΑTТN: Ꭲhis ϲһɑnnel һaѕ moᴠed to ⅰrc.frеeᥒoԁе.ᥒet #⧸joіn"
"ATTN: This ⅽһannel һas mοvеd to irc.frᥱеnοԁе.net #/join"

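Notice any patterns?  Every line reads as the same sentence, but each time a different handful of letters and punctuation marks has been swapped for Unicode lookalikes, so no two messages are byte-for-byte identical.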

The Premise:

Reconstruct the original, un-obfuscated string.  Now, any time a new user joins, if the first message they send isn't entirely ASCII, we do the following:

Any characters that aren't ASCII get replaced with a blank space.

Whatever text remains must be ASCII, so we take each remaining chunk of text (n characters long) and see whether it maps cleanly into the original statement.
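A minimal Python sketch of the blanking and chunking steps. The function names are made up for illustration, and the sample fragment is written with explicit escapes so it is clear which characters are the non-ASCII ones. The matching step itself is not shown here.

```python
def blank_non_ascii(message: str) -> str:
    """Replace every non-ASCII character with a single space."""
    return "".join(ch if ch.isascii() else " " for ch in message)


def ascii_chunks(message: str) -> list[str]:
    """Split the blanked message on whitespace into its ASCII chunks."""
    return blank_non_ascii(message).split()


# "moved to irc.freenode.net" with two lookalikes swapped in:
# U+1D20 (small capital V) and U+0435 (Cyrillic e).
fragment = "mo\u1d20ed to irc.fr\u0435enode.net"

print(blank_non_ascii(fragment))  # mo ed to irc.fr enode.net
print(ascii_chunks(fragment))     # ['mo', 'ed', 'to', 'irc.fr', 'enode.net']
```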



If we have a fragment such as (simplified here so that ᥒ is the only lookalike):

chaᥒnel has moved

then after stripping out the non-ASCII characters:

cha nel has moved

So with the 4 chunks of text that remain, we attempt to map each chunk into the original message.  If they all fit, then we have a match and can take the necessary action.

It's also worth noting that any 1-char chunks should be dropped, because a single character matches almost anywhere in the original and tells us nothing.  Even 2-char chunks can be dropped if you want to be extra safe.
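Putting the pieces together, here is one way the whole check could look in Python. This is a sketch under a few assumptions of mine: "maps cleanly" is taken to mean each chunk appears in the original in the same left-to-right order, the name looks_like_known_spam and the min_len parameter are hypothetical, and a message with no usable chunks left is treated as "no match" so that ordinary non-English chat is not flagged.

```python
import re

ORIGINAL = "ATTN: This channel has moved to irc.freenode.net #/join"


def looks_like_known_spam(message: str, original: str = ORIGINAL,
                          min_len: int = 2) -> bool:
    # Entirely ASCII messages are left to the ordinary spam filter.
    if message.isascii():
        return False
    # Runs of printable, non-space ASCII. Anything else (non-ASCII,
    # whitespace, control characters) acts as a separator.
    chunks = [c for c in re.findall(r"[\x21-\x7e]+", message)
              if len(c) >= min_len]
    if not chunks:
        return False  # nothing left to compare, so no evidence
    pos = 0
    for chunk in chunks:
        found = original.find(chunk, pos)  # search only past the last hit
        if found == -1:
            return False
        pos = found + len(chunk)
    return True


spam = "ATTN: This c\u04bbannel \u04bbas mo\u1d20ed to irc.freenode.net #/join"
chat = "caf\u00e9 is open"
print(looks_like_known_spam(spam))  # True
print(looks_like_known_spam(chat))  # False
```

Raising min_len to 3 is the "extra safe" setting: fewer chunks survive, but each surviving chunk is less likely to match by accident.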

Further Thoughts:

If the spammers start making the spam more dynamic, we can apply a threshold to this check and/or combine it with other detectors to raise its detection rate and reduce the occurrence of false positives.
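One possible shape for the thresholded version, sketched in Python: instead of requiring every chunk to fit, score the message by the share of its surviving ASCII characters that belong to chunks found in the original. The match_score name and the 0.8 cut-off are placeholders I picked for the example, not tuned values, and the chunk test here is a plain unordered substring check.

```python
import re

ORIGINAL = "ATTN: This channel has moved to irc.freenode.net #/join"
THRESHOLD = 0.8  # hypothetical cut-off; tune against real traffic


def match_score(message: str, original: str = ORIGINAL,
                min_len: int = 2) -> float:
    """Fraction of kept ASCII characters that sit in chunks found in original."""
    chunks = [c for c in re.findall(r"[\x21-\x7e]+", message)
              if len(c) >= min_len]
    total = sum(len(c) for c in chunks)
    if total == 0:
        return 0.0
    matched = sum(len(c) for c in chunks if c in original)
    return matched / total


# The known spam with one lookalike (U+04BB) and an extra word appended.
variant = "ATTN: T\u04bbis channel has moved to irc.freenode.net #/join now"
chat = "caf\u00e9 is open late"

print(match_score(variant) >= THRESHOLD)  # True
print(match_score(chat) >= THRESHOLD)     # False
```

A score like this can also be passed along to other detectors as one signal among several, rather than being acted on by itself.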



Obfuscate/Deobfuscate Unicode
