False Positive on SUBJECT_FUZZY_TION rule

Hi List,

I'm getting some FP hits against the SUBJECT_FUZZY_TION rule in 25_replace.cf (SA 3.2.5, latest update):

header SUBJECT_FUZZY_TION Subject =~ /<post P3>(?!tion)<T><I><O><N>/i
describe SUBJECT_FUZZY_TION Attempt to obfuscate words in Subject:
replace_rules SUBJECT_FUZZY_TION

is hitting on ham from a mailing list with the following subject line:

Subject: Re: [CentOS] mount UFS partition on CentOS 5.

My regex isn't good enough to understand exactly what this rule is trying to achieve, but it looks to me like some kind of obfuscation of "tion" within a word, but it appears to be hitting on "partition" in this case to my untrained eye. A test email containing just the text "partition" in the subject line also hits this rule so would appear to confirm my assumptions.

Could anyone help me understand what this rule is designed to hit, and why it's hitting in this case?

Thanks.

Re: False Positive on SUBJECT_FUZZY_TION rule

Ned Slider wrote:
> Hi List,
>
> I'm getting some FP hits against the SUBJECT_FUZZY_TION rule in
> 25_replace.cf (SA 3.2.5, latest update):
>
> header SUBJECT_FUZZY_TION Subject =~ /<post P3>(?!tion)<T><I><O><N>/i
> describe SUBJECT_FUZZY_TION Attempt to obfuscate words in Subject:
> replace_rules SUBJECT_FUZZY_TION
>
> is hitting on ham from a mailing list with the following subject line:
>
> Subject: Re: [CentOS] mount UFS partition on CentOS 5.
>
> My regex isn't good enough to understand exactly what this rule is
> trying to achieve, but it looks to me like some kind of obfuscation of
> "tion" within a word, but it appears to be hitting on "partition" in
> this case to my untrained eye. A test email containing just the text
> "partition" in the subject line also hits this rule so would appear to
> confirm my assumptions.
>
> Could anyone help me understand what this rule is designed to hit, and
> why it's hitting in this case?
>
> Thanks.

Replying to my own thread...

I'm assuming this rule is interpreting "tition" as an obfuscation of "tion", which is why it hits "partition" as if it were an obfuscation of "partion".

Looking at some very crude stats for this rule against a recent corpus of ~1700 ham and ~1800 spam on my server, I see 13 FP hits against ham and only 1 hit against spam (an obfuscation of erection). Admittedly my ham corpus was a technical mailing list likely to contain the term "partition" given its common usage within IT, and triggering the rule in no way got close to tagging any ham as spam.

Anyway, to me this rule doesn't appear to represent good value, so I'll probably just adjust the score to 0.001 and monitor it unless someone can suggest a method to prevent it hitting against legitimate words such as partition.
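
To put the crude stats above into rates, here is a quick Python sketch. The counts are the ones quoted in the message; the corpus sizes were given only approximately ("~1700", "~1800"), so the exact totals used here are an assumption and the percentages are rough.

```python
# Counts quoted in the message above. Corpus sizes were reported as
# approximate ("~1700 ham", "~1800 spam"), so these totals are assumed.
ham_total, ham_hits = 1700, 13
spam_total, spam_hits = 1800, 1

# Fraction of each corpus that triggered the rule.
ham_rate = ham_hits / ham_total
spam_rate = spam_hits / spam_total

# Of all messages that triggered the rule, what share was actually spam?
precision = spam_hits / (spam_hits + ham_hits)

print(f"ham hit rate:        {ham_rate:.2%}")
print(f"spam hit rate:       {spam_rate:.2%}")
print(f"hits that were spam: {precision:.1%}")
```

On these numbers the rule fires on ham roughly an order of magnitude more often than on spam, and only about one hit in fourteen is spam, which is consistent with the "doesn't represent good value" conclusion for this particular corpus.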

RE: False Positive on SUBJECT_FUZZY_TION rule

> -----Original Message-----
> From: Ned Slider [mailto:ned@...]
> Sent: 1 October 2008 12:15 p.m.
> To: users@...
> Subject: Re: False Positive on SUBJECT_FUZZY_TION rule
>
> Ned Slider wrote:
> > Hi List,
> >
> > I'm getting some FP hits against the SUBJECT_FUZZY_TION rule in
> > 25_replace.cf (SA 3.2.5, latest update):
> >
> > header SUBJECT_FUZZY_TION Subject =~ /<post P3>(?!tion)<T><I><O><N>/i
> > describe SUBJECT_FUZZY_TION Attempt to obfuscate words in Subject:
> > replace_rules SUBJECT_FUZZY_TION
> >
> > is hitting on ham from a mailing list with the following subject line:
> >
> > Subject: Re: [CentOS] mount UFS partition on CentOS 5.
> >
> > My regex isn't good enough to understand exactly what this rule is
> > trying to achieve, but it looks to me like some kind of obfuscation of
> > "tion" within a word, but it appears to be hitting on "partition" in
> > this case to my untrained eye. A test email containing just the text
> > "partition" in the subject line also hits this rule so would appear to
> > confirm my assumptions.
> >
> > Could anyone help me understand what this rule is designed to hit, and
> > why it's hitting in this case?
> >
> > Thanks.
>
> Replying to my own thread...
>
> I'm assuming this rule is interpreting "tition" as an obfuscation of
> "tion" hence why it hits against "partition" as if it were an
> obfuscation of "partion".
>
> Looking at some very crude stats for this rule against a recent corpus
> of ~1700 ham and ~1800 spam on my server, I see 13 FP hits against ham
> and only 1 hit against spam (an obfuscation of erection). Admittedly my
> ham corpus was a technical mailing list likely to contain the term
> "partition" given it's common usage within IT and triggering of the rule
> in no way got close to tagging any ham as spam.
>
> Anyway, to me this rule doesn't appear to represent good value so I'll
> probably just adjust the score to 0.001 and monitor it unless someone
> can suggest a method to prevent it hitting against legitimate words such
> as partition.

Hello Ned.

Lowering the score to something that will not be relevant at total score time is a good idea when testing any rule. You've done a corpus test and shown that the rule hits more ham than spam by a significant margin, which shows that it doesn't really work for your site.

If it were my site, I'd disable the rule based on the corpus test.

Cheers,
Mike

Re: False Positive on SUBJECT_FUZZY_TION rule

On Tue, 30 Sep 2008, Ned Slider wrote:
> header SUBJECT_FUZZY_TION Subject =~ /<post P3>(?!tion)<T><I><O><N>/i
>
> is hitting on ham from a mailing list with the following subject line:
>
> Subject: Re: [CentOS] mount UFS partition on CentOS 5.
>
> A test email containing just the text "partition" in the subject line
> also hits this rule so would appear to confirm my assumptions.

I suggest you open a bug for this. It certainly should not hit, as the (?!tion) should keep it from matching an unobfuscated word...

Does it hit unobfuscated "partitions"? How about "portion"?

Perhaps it should be /<post P3>(?!tion)<T><I><O><N><S>?\b/i

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@... FALaholic #11174 pgpk -a jhardin@...
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Government cannot grant rights. Government can only limit, infringe
or suppress rights.
-----------------------------------------------------------------------
35 days until the Presidential Election
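
For anyone following the regex discussion, here is a minimal Python sketch of how a negative lookahead such as (?!tion) behaves in front of fuzzy character classes, and what a trailing optional <S> plus \b would add. The character classes `T`, `I`, `O`, `N` and `S` below are made-up stand-ins, not the real tag definitions from 25_replace.cf, and the `<post P3>` modifier is not modelled at all, so this illustrates the idea only and says nothing definite about what the shipped rule matches.

```python
import re

# HYPOTHETICAL stand-ins for the <T>, <I>, <O>, <N>, <S> replace tags.
# The real definitions live in 25_replace.cf and are not reproduced here;
# the <post P3> modifier is deliberately left out of this sketch.
T = r"[t7+]"
I = r"[i1!|l]"
O = r"[o0]"
N = r"[n]"
S = r"[s5$]"

# Shape of the original rule: the lookahead refuses to start a match
# where the plain, unobfuscated letters "tion" follow.
fuzzy = re.compile(rf"(?!tion){T}{I}{O}{N}", re.IGNORECASE)

# Shape of the suggested variant: optional fuzzy "s", then a word boundary.
bounded = re.compile(rf"(?!tion){T}{I}{O}{N}{S}?\b", re.IGNORECASE)

for text in ["portion", "act!on", "erect1ons here", "nat1onal", "partition"]:
    print(f"{text!r}: fuzzy={bool(fuzzy.search(text))} "
          f"bounded={bool(bounded.search(text))}")
```

With these simplified stand-ins "partition" does not match at all, so the sketch does not reproduce the reported false positive. That suggests the real tag definitions or the post modifier are broader than what is assumed here. What the sketch does show is that the lookahead only rejects a match at the exact position where literal "tion" begins, and that the \b variant additionally rejects obfuscations in the middle of a longer word.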

Re: False Positive on SUBJECT_FUZZY_TION rule

John Hardin wrote:
> On Tue, 30 Sep 2008, Ned Slider wrote:
>
>> header SUBJECT_FUZZY_TION Subject =~ /<post P3>(?!tion)<T><I><O><N>/i
>>
>> is hitting on ham from a mailing list with the following subject line:
>>
>> Subject: Re: [CentOS] mount UFS partition on CentOS 5.
>>
>> A test email containing just the text "partition" in the subject line
>> also hits this rule so would appear to confirm my assumptions.
>
> I suggest you open a bug for this. It certainly should not hit, as the
> (?!tion) should keep it from matching an unobfuscated word...

But I think it's treating the middle "ti" in "tition" as an obfuscation of "tion", hence the match in this case.

> Does it hit unobfuscated "partitions"? How about "portion"?
>
> Perhaps it should be /<post P3>(?!tion)<T><I><O><N><S>?\b/i

It hits partition, partitions, petition, repetition etc. (basically any word ending "tition(s)"), but does not hit portion, potion, nation, notion or reputation, for example.

I'm happy to open a bug for this if that's appropriate, but the rule does appear to be doing what it's designed to do, albeit with some false positives.
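
If the observation above holds, a rough way to estimate how exposed a given ham corpus is would be to look for ordinary words containing "tition" in subject lines. The Python sketch below does only that. The regex is a plain-text proxy for the observed behaviour, not the rule itself, and the helper name `tition_words` is made up for illustration.

```python
import re

# Proxy for the behaviour reported above: ordinary words containing
# "tition" trip the rule, other "-tion" words do not. This is NOT the
# SpamAssassin rule, just a way to spot likely false-positive subjects.
TITION_WORD = re.compile(r"\b\w*tition\w*\b", re.IGNORECASE)

def tition_words(subject):
    """Return the words in a subject line that contain 'tition'."""
    return TITION_WORD.findall(subject)

# Word lists taken from the message above.
hits = ["partition", "partitions", "petition", "repetition"]
misses = ["portion", "potion", "nation", "notion", "reputation"]

for word in hits + misses:
    print(f"{word:12} flagged={bool(tition_words(word))}")

print(tition_words("Re: [CentOS] mount UFS partition on CentOS 5."))
```

Running something like this over the Subject headers of a ham folder gives a quick count of messages likely to pick up the rule's score before deciding whether to lower or zero it.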

Re: False Positive on SUBJECT_FUZZY_TION rule

John Hardin <jhardin@...> wrote:
>> header SUBJECT_FUZZY_TION Subject =~ /<post P3>(?!tion)<T><I><O><N>/i
>>
>> is hitting on ham from a mailing list with the following subject line:
>>
>> Subject: Re: [CentOS] mount UFS partition on CentOS 5.
>>
>> A test email containing just the text "partition" in the subject line
>> also hits this rule so would appear to confirm my assumptions.
>
> I suggest you open a bug for this. It certainly should not hit, as the
> (?!tion) should keep it from matching an unobfuscated word...
>
> Does it hit unobfuscated "partitions"? How about "portion"?
>
> Perhaps it should be /<post P3>(?!tion)<T><I><O><N><S>?\b/i

Here, it matched three spam run subjects yesterday. They were all spam but only one was obfuscated. Subjects were:

Listings for specialties such as: general practitioners
No competition here, we're best on market.
The se_x!_est te.mp.ting te\e'n gI_rLs In sweet ha.rd,cO.re act!on.

Joseph Brennan
Columbia University Information Technology