[Bug 5965] New: custom rule with pattern test of 0 (zero) not processed

View: New views
14 Messages — Rating Filter:   Alert me  

[Bug 5965] New: custom rule with pattern test of 0 (zero) not processed

by Bugzilla from bugzilla-daemon@bugzilla.spamassassin.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5965

           Summary: custom rule with pattern test of 0 (zero) not processed
           Product: Spamassassin
           Version: 3.2.5
          Platform: Other
               URL: http://mail-archives.apache.org/mod_mbox/spamassassin-
                    users/200808.mbox/%3C171dfbe0808251318v40af3f12s51e863d3
                    3fa35b41@...%3E
        OS/Version: All
            Status: NEW
          Severity: trivial
          Priority: P5
         Component: Rules (Eval Tests)
        AssignedTo: dev@...
        ReportedBy: dmsforsa@...


The following rule does not get processed...
See mailing list thread for further details.

header          LOCAL_TESTHEADER        TESTHEADER =~ /^0$/
score           LOCAL_TESTHEADER        1000



message:
TESTHEADER: 0
From:blah@... <From%3Ablah@...>
To: blah1@...
Subject: This is a TESTHEADER: 0 test

blahblah blah


--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5965] custom rule with pattern test of 0 (zero) not processed

by Bugzilla from bugzilla-daemon@bugzilla.spamassassin.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5965





--- Comment #1 from Mark Martinec <Mark.Martinec@...>  2008-08-26 10:29:01 PST ---
Also the following message hits a MISSING_SUBJECT rule:

  From:blah@... <blah@...>
  To: blah1@...
  Subject: 0

  blahblah blah

The underlying reason is a nonpedantic practice in some parts of code
where user-supplied strings (e.g. from messages or rules) are evaluated
in places as Perl booleans, instead of being tested for being defined
or for being an empty string. In both supplied cases the "Subject: 0"
or "TESTHEADER: 0" is indistinguishable from an absent header field.

Please try the attached patch - it should apply to 3.2.* as well as
to 3.3 (with some fudge). I tried to fix the more prominent of these
cases, but I'm sure there are still other more obscure ones, waiting
for a next bug report.


--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5965] custom rule with pattern test of 0 (zero) not processed

by Bugzilla from bugzilla-daemon@bugzilla.spamassassin.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5965





--- Comment #2 from Mark Martinec <Mark.Martinec@...>  2008-08-26 10:31:01 PST ---
Created an attachment (id=4359)
 --> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4359)
proposed patch


--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5965] header field with a value of "0" (zero) invisible

by Bugzilla from bugzilla-daemon@bugzilla.spamassassin.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5965


Mark Martinec <Mark.Martinec@...> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|trivial                     |normal
            Summary|custom rule with pattern    |header field with a value of
                   |test of 0 (zero) not        |"0" (zero) invisible
                   |processed                   |
   Target Milestone|Undefined                   |3.3.0




--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5965] header field with a value of "0" (zero) invisible

by Bugzilla from bugzilla-daemon@bugzilla.spamassassin.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5965


Mark Martinec <Mark.Martinec@...> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|Rules (Eval Tests)          |Libraries




--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5965] header field with a value of "0" (zero) invisible

by Bugzilla from bugzilla-daemon@bugzilla.spamassassin.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5965





--- Comment #3 from Mark Martinec <Mark.Martinec@...>  2008-08-27 18:09:36 PST ---
Now in trunk
(directly or indirectly related to this bug):

r689129 | mmartinec | 2008-08-26 18:28:52 +0200 (Tue, 26 Aug 2008)
bug 5965: do not treat user data as perl booleans (a string "0" is a false);
differentiate between missing and empty header fields; tweak header parsing

r689130 | mmartinec | 2008-08-26 18:53:14 +0200 (Tue, 26 Aug 2008)
fixed previous patch (hb separator test failing)

r689682 | mmartinec | 2008-08-28 02:44:56 +0200 (Thu, 28 Aug 2008)
- continue work on avoiding user data to be tested as perl booleans,
  instead test for defined or for an empty string as appropriate;
- pms->get can now distinguish between empty and nonexistent header
  fields, undef is returned for nonexistent header field unless a
  default value argument is explicitly set to some defined value
  like an empty string;
- modified calls to pms->get to deal with undef as appropriate;
- Conf.pm, Conf/Parser.pm and Plugin/Check.pm now work together and turn
  a rule 'exists:name_of_header' into a defined(name_of_header) instead
  of a  name_of_header =~ /./  to match the documentation ("Define a
  header existence test") and make it possible to distinguish empty
  from nonexistent header fields; in principle the new code could allow
  operators like 'eq' and 'ne' or function calls in header tests
  in addition to regexp matching operators '=~' and '!~' (but this
  is currently not allowed by the parser);
(plus some collateral small fixes)

The whole change is too extensive for a 3.2 branch, and as the original
bug report had a label of minor severity, I suggest this ticket be
closed now, changes will only be in 3.3.


--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5965] header field with a value of "0" (zero) invisible

by Bugzilla from bugzilla-daemon@bugzilla.spamassassin.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5965





--- Comment #4 from Justin Mason <jm@...>  2008-08-28 02:28:37 PST ---
broadly good, but -- I'm not fond of the side-effect change to the PMS::get()
API.  This is a public API for third party plugins and eval rules, so this will
create a lot of noise when they call ->get() in the previous manner, then match
against the returned value without checking for undef first.

Is there some other way we can support our own internal code getting undef
returns, without breaking backwards compat on that API?  I'm thinking a new,
wrapper method around ->get(), which passes a new "magic" value for the
optional $defval argument to the ->_get() method, indicating that returning
undef is acceptable, and if that's not present then the old style of returning
'' on undef is used.


--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5965] header field with a value of "0" (zero) invisible

by Bugzilla from bugzilla-daemon@bugzilla.spamassassin.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5965





--- Comment #5 from Mark Martinec <Mark.Martinec@...>  2008-08-28 05:41:13 PST ---
(In reply to comment #4)
> broadly good, but -- I'm not fond of the side-effect change to the PMS::get()
> API.  This is a public API for third party plugins and eval rules, so this will
> create a lot of noise when they call ->get() in the previous manner, then match
> against the returned value without checking for undef first.
>
> Is there some other way we can support our own internal code getting undef
> returns, without breaking backwards compat on that API? [...]

There is a better way, I had it implemented for a while but scrapped it
because calls to get() could look weird. I'll make the following change
to the tail section of a get() and appropriate changes to calls
to retain compatibility:

currently:
  return (defined $found ? $found : $_[2]);

new:
  # if the requested header wasn't found, we should return a default value
  # as specified by the caller: if defval argument is present it represents
  # a default value even if undef; if defval argument is absent a default
  # value is an empty string for upwards compatibility
  return (defined $found ? $found : @_ > 2 ? $_[2] : '');

so now the:
  $pms->get('Subject')
becomes equivalent to:
  $pms->get('Subject','')

and if one wants to receive undef for nonexistent header, the
undef needs to be specified explicitly as a second argument:
  $pms->get('Subject',undef)


--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5965] header field with a value of "0" (zero) invisible

by Bugzilla from bugzilla-daemon@bugzilla.spamassassin.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5965





--- Comment #6 from Justin Mason <jm@...>  2008-08-28 05:50:21 PST ---
(In reply to comment #5)
> so now the:
>   $pms->get('Subject')
> becomes equivalent to:
>   $pms->get('Subject','')
>
> and if one wants to receive undef for nonexistent header, the
> undef needs to be specified explicitly as a second argument:
>   $pms->get('Subject',undef)

perfect, thanks!  I was trying to figure out a way to do that, but didn't see
it.

+1 from me ;)


--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5965] header field with a value of "0" (zero) invisible

by Bugzilla from bugzilla-daemon@bugzilla.spamassassin.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5965





--- Comment #7 from Michael Parker <parkerm@...>  2008-08-28 06:55:46 PST ---
Just to chime in a bit, the get code is a real hotspot in our profile, so
please be very very very careful what what you are doing there.  Run some
before and after profiles to make sure you didn't slow it down.


--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5965] header field with a value of "0" (zero) invisible

by Bugzilla from bugzilla-daemon@bugzilla.spamassassin.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5965





--- Comment #8 from Justin Mason <jm@...>  2008-08-28 07:45:14 PST ---
agreed, it's a hotspot.  I did a quick check of performance of this morning's
SVN code, based on 10000 accesses to a cached header; the new code seemed
slightly slower, about 8% (for the get() method alone).  imo that's OK, and can
be optimized later.


--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5965] header field with a value of "0" (zero) invisible

by Bugzilla from bugzilla-daemon@bugzilla.spamassassin.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5965





--- Comment #9 from Mark Martinec <Mark.Martinec@...>  2008-08-28 08:16:04 PST ---
Done (compatibility):

r689835 | mmartinec | 2008-08-28 16:28:02 +0200 (Thu, 28 Aug 2008)
Changed PMS::get and its calls for compatibility regarding a
default value: if the requested header field wasn't found,
return a default value as specified by the caller: if defval
argument is present it represents a default value even if undef;
if defval argument is absent a default value is an empty string
for compatibility


--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5965] header field with a value of "0" (zero) invisible

by Bugzilla from bugzilla-daemon@bugzilla.spamassassin.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5965





--- Comment #10 from Mark Martinec <Mark.Martinec@...>  2008-08-28 08:58:07 PST ---
> agreed, it's a hotspot.  I did a quick check of performance of this morning's
> SVN code, based on 10000 accesses to a cached header; the new code seemed
> slightly slower, about 8% (for the get() method alone).  imo that's OK, and can
> be optimized later.

One must take into account the actual hit/miss cache ratio.
Instrumenting the pms->get with two counters and letting it run for
a while on our production mail server gives 45% hits/attempts ratio.

While the hit code path is a bit slower, the miss path is much faster
(original code evaluates $_[0]->{c}->{$_[1]} multiple times),
which in my benchmark yields an overall improvement. With a
simulated and quick _get the improvement is 30%, yet in practice
the speedup is negligible because the actual _get is very slow
compared to the get().


--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5965] header field with a value of "0" (zero) invisible

by Bugzilla from bugzilla-daemon@bugzilla.spamassassin.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5965





--- Comment #11 from Mark Martinec <Mark.Martinec@...>  2008-08-28 10:14:07 PST ---
...not to mention that a meager 300 calls of get() per message check
are negligible anyway...


--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
LightInTheBox - Buy quality products at wholesale price!