A botnet is spamming your server. The spam filter isn’t working. Time to get crafty.
"ΑTTⲚ： This ⅽhɑnᥒеl һɑs moᴠеd to irc．frеeᥒоdᥱ.net #⁄jоiᥒ" "AΤΤⲚ: Thⅰѕ chɑnnel haѕ moᴠed tഠ irc.frᥱeᥒoⅾе.nᥱt #／јoіn" "ATTΝ፡ Тhiѕ cһaᥒnel һas mоᴠed tഠ irc.frееnode.net ﹟∕jⲟin" "AТΤN: Thiѕ chaᥒᥒel has mоveⅾ to іrс.frеenoⅾе.ᥒᥱt ＃/јoⅰᥒ" "AТTN﹕ Thiѕ cһannеl һaѕ mo⋁eԁ to irc․freenoⅾᥱ․nеt #/јoіn" "ΑΤTN︓ Tһis chɑnnᥱl hаѕ moved tο irc.freenode．ᥒet #/ϳoiᥒ" "ΑTТN: Ꭲhis ϲһɑnnel һaѕ moᴠed to ⅰrc．frеeᥒoԁе.ᥒet ＃⧸joіn" "ATTN: This ⅽһannel һas mοvеd to irc.frᥱеnοԁе.net #/join"
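Notice any patterns? Every message is the same sentence, with a different handful of characters swapped for non-ASCII look-alikes.
Reconstruct the original, un-obfuscated string (here: "ATTN: This channel has moved to irc.freenode.net #/join"). Now, any time a new user joins, if the first message they send isn't entirely ASCII, we do the following: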
Any chars that aren't ASCII get replaced with a blank space.
Whatever text remains must be ASCII, so we take each n-char chunk of it and check whether it maps cleanly into the original statement.
Say we have this fragment:
cһaᥒnel һas mоᴠed
then after stripping out the non-ASCII chars we get:
cha nel has moved
So with the 4 chunks of text that remain, we attempt to map each chunk into the original message. If they all fit, then we have a match and can take the necessary action.
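The steps so far can be sketched in a few lines of Python. This is only a rough sketch under some assumptions of mine: "maps cleanly" is taken to mean that each chunk occurs in the original string, in left-to-right order and without overlapping the previous chunk. The names TEMPLATE, ascii_chunks, maps_into and is_obfuscated_spam are made up for illustration.

```python
# Reconstructed plain-ASCII version of the spam message.
TEMPLATE = "ATTN: This channel has moved to irc.freenode.net #/join"


def ascii_chunks(message: str) -> list[str]:
    """Replace every non-ASCII char with a space, then split on whitespace."""
    blanked = "".join(ch if ord(ch) < 128 else " " for ch in message)
    return blanked.split()


def maps_into(chunks: list[str], template: str = TEMPLATE) -> bool:
    """True if every chunk occurs in the template, in order (assumed rule)."""
    pos = 0
    for chunk in chunks:
        found = template.find(chunk, pos)
        if found == -1:
            return False
        pos = found + len(chunk)  # the next chunk must start after this one
    return bool(chunks)  # an empty chunk list is not a match


def is_obfuscated_spam(message: str) -> bool:
    # Only messages containing non-ASCII chars are checked by this detector.
    if message.isascii():
        return False
    return maps_into(ascii_chunks(message))


# Hand-built fragment with three look-alike chars written as escapes.
sample = "c\u04bbannel \u04bbas mo\u1d20ed"
print(ascii_chunks(sample))        # ['c', 'annel', 'as', 'mo', 'ed']
print(is_obfuscated_spam(sample))  # True
```

The match is case-sensitive here. Whether to lowercase both sides first is a judgment call that depends on how the spammers vary the text.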
Also, it's worth noting that any 1-char chunks should be dropped, because a single character matches almost anywhere in the original and so tells us nothing. Even 2-char chunks can be dropped if you want to be extra safe.
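Dropping the short chunks is a one-line filter. A minimal sketch, where drop_short_chunks and min_len are hypothetical names: min_len=2 drops the 1-char chunks, and min_len=3 is the "extra safe" variant.

```python
def drop_short_chunks(chunks: list[str], min_len: int = 2) -> list[str]:
    """Keep only the chunks that are at least min_len chars long."""
    return [chunk for chunk in chunks if len(chunk) >= min_len]


chunks = ["c", "annel", "as", "mo", "ed"]
print(drop_short_chunks(chunks))             # ['annel', 'as', 'mo', 'ed']
print(drop_short_chunks(chunks, min_len=3))  # ['annel']
```

The trade-off is that a stricter min_len leaves fewer chunks to match against, so heavily obfuscated messages give the detector less to work with.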
If the spammers start making the spam more dynamic, we can put a threshold on this feature and/or combine it with other detectors to strengthen its detection rate and reduce the occurrence of false positives.
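One possible way to threshold it, sketched below, is to stop requiring that every chunk fits and instead score how much of the original string the fitting chunks cover. The scoring rule, the function names and the 0.6 cutoff are all my own assumptions and would need tuning against real traffic.

```python
# Reconstructed plain-ASCII version of the spam message.
TEMPLATE = "ATTN: This channel has moved to irc.freenode.net #/join"


def coverage_score(message: str, template: str = TEMPLATE, min_len: int = 2) -> float:
    """Fraction of the template's non-space chars covered by fitting chunks."""
    blanked = "".join(ch if ord(ch) < 128 else " " for ch in message)
    chunks = [c for c in blanked.split() if len(c) >= min_len]
    pos = 0
    covered = 0
    for chunk in chunks:
        found = template.find(chunk, pos)
        if found == -1:
            continue  # tolerate chunks that do not fit instead of bailing out
        covered += len(chunk)
        pos = found + len(chunk)
    total = sum(1 for ch in template if not ch.isspace())
    return covered / total


def looks_like_spam(message: str, threshold: float = 0.6) -> bool:
    # The 0.6 cutoff is an arbitrary starting point, not a tested value.
    return not message.isascii() and coverage_score(message) >= threshold


sample = "ATTN: T\u04bbis c\u04bbannel \u04bbas moved to irc.freenode.net #/join"
print(round(coverage_score(sample), 3))
print(looks_like_spam(sample))
```

The score could also be fed into a combined decision with other detectors rather than being cut off on its own.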