Strange continous SEVERE error after upgrading to APR 1.3, Critical poller failure, Timer expired

View: New views
13 Messages — Rating Filter:   Alert me  

Strange continous SEVERE error after upgrading to APR 1.3, Critical poller failure, Timer expired

by JimiH :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi,

After we upgraded APR from 1.2.8 to 1.3.0 we get a strange error in tomcat's catalina.out that we have never seen before:

Jun 24, 2008 11:42:09 AM org.apache.tomcat.util.net.AprEndpoint$Poller run
SEVERE: Critical poller failure (restarting poller): [62] Timer expired

This is repeated over and over, at approximately 5-10 times per second. If we switch back to APR 1.2.8, and not touching anything else, then this error message goes away.

We use Tomcat 5.5.23 on Solaris 10 (Sun X2200 and X2100). We have tried with APR 1.3.2 and the result was the same as for 1.3.0. As far as I can tell, a regular poll time out should not result in this error, but still I tried increasing the pollTime from the default 2000 (2ms) to 20000 (20ms) for the connectors in server.xml (http://tomcat.apache.org/tomcat-5.5-doc/apr.html) but that didn't change anything.

I have investigated the tomcat process using the truss command (like strace I guess, but for solaris) right from the tomcat startup, and this is the first 3000 lines of the output:

http://pastebin.com/f60671b83

If you search for "ETIME" (the name of the error message "Timer expired") you can see that the first occurence (at line 2545)is at about 1 second after startup (so it seems that it is not triggered by a request). The thread in questions is thread number 6.

Here is the truss log file for the first 20 minutes or so, filtered to only include the thread 6:

http://pastebin.com/f3fd14e2c

There were also other threads that experienced this "Err#62 ETIME", but thread number 6 seems to be the one with the most number of these errors. Also, when restarting tomcat it is always the thread number 6 that has the this error first and has the most number of occurences of it.

Does the message "Critical poller failure (restarting poller): [62] Timer expired" or the truss log output tell you guys anything about what can be wrong? Could it be a bug? Is there something wrong with our web application? The thing is that we haven't noticed anything wrong with it so far, besides these annoying error messages in catalina.out.

Regards
/Jimi


mogul | jimi hullegård | system developer | hudiksvallsgatan 4, 113 30 stockholm sweden | +46 8 506 66 172 | +46 765 27 19 55 | jimi.hullegard@... | www.mogul.com

Re: Strange continous SEVERE error after upgrading to APR 1.3, Critical poller failure, Timer expired

by Nick Kew :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wed, 25 Jun 2008 10:32:58 +0200
Jimi Hullegård <jimi.hullegard@...> wrote:

> Hi,
>
> After we upgraded APR from 1.2.8 to 1.3.0 we get a strange error in
> tomcat's catalina.out that we have never seen before:
>
> Jun 24, 2008 11:42:09 AM
> org.apache.tomcat.util.net.AprEndpoint$Poller run SEVERE: Critical
> poller failure (restarting poller): [62] Timer expired

Could this be related to
http://marc.info/?l=apr-dev&m=121347872322492&w=2

I had something similar with httpd+Event MPM on solaris.
Current workaround is to use Worker MPM.

--
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/

RE: Strange continous SEVERE error after upgrading to APR 1.3, Critical poller failure, Timer expired

by JimiH :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> -----Original Message-----
> From: Nick Kew [mailto:nick@...]
> Sent: den 25 juni 2008 13:44
> To: dev@...
> Subject: Re: Strange continous SEVERE error after upgrading
> to APR 1.3, Critical poller failure, Timer expired
>
> On Wed, 25 Jun 2008 10:32:58 +0200
> Jimi Hullegård <jimi.hullegard@...> wrote:
>
> > Hi,
> >
> > After we upgraded APR from 1.2.8 to 1.3.0 we get a strange error in
> > tomcat's catalina.out that we have never seen before:
> >
> > Jun 24, 2008 11:42:09 AM
> > org.apache.tomcat.util.net.AprEndpoint$Poller run SEVERE: Critical
> > poller failure (restarting poller): [62] Timer expired
>
> Could this be related to
> http://marc.info/?l=apr-dev&m=121347872322492&w=2
>
> I had something similar with httpd+Event MPM on solaris.
> Current workaround is to use Worker MPM.

I'm not quite sure I follow you now... Worker MPM is for Apache Httpd, right? We don't use httpd. Do you know of any workaround for Tomcat?

/Jimi

Re: Strange continous SEVERE error after upgrading to APR 1.3, Critical poller failure, Timer expired

by Nick Kew :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On Wed, 2008-06-25 at 12:43, Nick Kew wrote:

> On Wed, 25 Jun 2008 10:32:58 +0200
> Jimi Hullegård <jimi.hullegard@...> wrote:
>
> > Hi,
> >
> > After we upgraded APR from 1.2.8 to 1.3.0 we get a strange error in
> > tomcat's catalina.out that we have never seen before:
> >
> > Jun 24, 2008 11:42:09 AM
> > org.apache.tomcat.util.net.AprEndpoint$Poller run SEVERE: Critical
> > poller failure (restarting poller): [62] Timer expired
>
> Could this be related to
> http://marc.info/?l=apr-dev&m=121347872322492&w=2
>
> I had something similar with httpd+Event MPM on solaris.
> Current workaround is to use Worker MPM.

OK, I've just tested APR variants with Event MPM.

It is indeed that fix that breaks it.  When I revert
the fix, the problem goes away.

The original patch posted to fix the same bug works fine.
Can I suggest you apply the following patch to 1.3.x
(should work on 1.3.0 or 1.3.2 too) and let us know if
it fixes the problem you're seeing?

Index: poll/unix/port.c
===================================================================
--- poll/unix/port.c    (revision 662044)
+++ poll/unix/port.c    (working copy)
@@ -315,7 +315,15 @@

     if (ret == -1) {
         (*num) = 0;
-        rv = apr_get_netos_error();
+        if (errno == EINTR) {
+            rv = APR_EINTR;
+        }
+        else if (errno == ETIME) {
+            rv = APR_TIMEUP;
+        }
+        else {
+            rv = APR_EGENERAL;
+        }
     }
     else if (nget == 0) {
         rv = APR_TIMEUP;


--
Nick Kew


RE: Strange continous SEVERE error after upgrading to APR 1.3, Critical poller failure, Timer expired

by JimiH :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Nick Kew wrote:

> On Wed, 2008-06-25 at 12:43, Nick Kew wrote:
> >
> > Could this be related to
> > http://marc.info/?l=apr-dev&m=121347872322492&w=2
> >
> > I had something similar with httpd+Event MPM on solaris.
> > Current workaround is to use Worker MPM.
>
> OK, I've just tested APR variants with Event MPM.
>
> It is indeed that fix that breaks it.  When I revert
> the fix, the problem goes away.
>
> The original patch posted to fix the same bug works fine.
> Can I suggest you apply the following patch to 1.3.x
> (should work on 1.3.0 or 1.3.2 too) and let us know if
> it fixes the problem you're seeing?
>
> [patch code snipped]

Ok, I have forwarded this to some persons that have access to the system now (I am currently at a different office with no remote access). I hope they have time to test this soon.

/Jimi

Re: Strange continous SEVERE error after upgrading to APR 1.3, Critical poller failure, Timer expired

by Ruediger Pluem :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message



On 06/25/2008 02:55 PM, Nick Kew wrote:

> On Wed, 2008-06-25 at 12:43, Nick Kew wrote:
>> On Wed, 25 Jun 2008 10:32:58 +0200
>> Jimi Hullegård <jimi.hullegard@...> wrote:
>>
>>> Hi,
>>>
>>> After we upgraded APR from 1.2.8 to 1.3.0 we get a strange error in
>>> tomcat's catalina.out that we have never seen before:
>>>
>>> Jun 24, 2008 11:42:09 AM
>>> org.apache.tomcat.util.net.AprEndpoint$Poller run SEVERE: Critical
>>> poller failure (restarting poller): [62] Timer expired
>> Could this be related to
>> http://marc.info/?l=apr-dev&m=121347872322492&w=2
>>
>> I had something similar with httpd+Event MPM on solaris.
>> Current workaround is to use Worker MPM.
>
> OK, I've just tested APR variants with Event MPM.
>
> It is indeed that fix that breaks it.  When I revert
> the fix, the problem goes away.
>
> The original patch posted to fix the same bug works fine.
> Can I suggest you apply the following patch to 1.3.x
> (should work on 1.3.0 or 1.3.2 too) and let us know if
> it fixes the problem you're seeing?
>
> Index: poll/unix/port.c
> ===================================================================
> --- poll/unix/port.c    (revision 662044)
> +++ poll/unix/port.c    (working copy)
> @@ -315,7 +315,15 @@
>
>      if (ret == -1) {
>          (*num) = 0;
> -        rv = apr_get_netos_error();
> +        if (errno == EINTR) {
> +            rv = APR_EINTR;
> +        }
> +        else if (errno == ETIME) {
> +            rv = APR_TIMEUP;
> +        }
> +        else {
> +            rv = APR_EGENERAL;
> +        }
>      }
>      else if (nget == 0) {
>          rv = APR_TIMEUP;
>
>

Looks good and makes sense.
apr_get_netos_error() in case of ETIME = 720062
APR_TIMEUP = 70007

So they are different.

But I would do

rv = apr_get_netos_error();

instead

of

rv = APR_EGENERAL;

Regards

Rüdiger




Re: Strange continous SEVERE error after upgrading to APR 1.3, Critical poller failure, Timer expired

by William A. Rowe, Jr. :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Ruediger Pluem wrote:

>
> On 06/25/2008 02:55 PM, Nick Kew wrote:
>>
>> OK, I've just tested APR variants with Event MPM.
>>
>> It is indeed that fix that breaks it.  When I revert
>> the fix, the problem goes away.
>>
>> The original patch posted to fix the same bug works fine.
>> Can I suggest you apply the following patch to 1.3.x
>> (should work on 1.3.0 or 1.3.2 too) and let us know if
>> it fixes the problem you're seeing?
>>
>> Index: poll/unix/port.c
>> ===================================================================
>> --- poll/unix/port.c    (revision 662044)
>> +++ poll/unix/port.c    (working copy)
>> @@ -315,7 +315,15 @@
>>
>>      if (ret == -1) {
>>          (*num) = 0;
>> -        rv = apr_get_netos_error();
>> +        if (errno == EINTR) {
>> +            rv = APR_EINTR;
>> +        }
>> +        else if (errno == ETIME) {
>> +            rv = APR_TIMEUP;
>> +        }
>> +        else {
>> +            rv = APR_EGENERAL;
>> +        }
>>      }
>>      else if (nget == 0) {
>>          rv = APR_TIMEUP;
>>
>>
>
> Looks good and makes sense.
> apr_get_netos_error() in case of ETIME = 720062
> APR_TIMEUP = 70007
>
> So they are different.
>
> But I would do
>
> rv = apr_get_netos_error();
>
> instead
>
> of
>
> rv = APR_EGENERAL;


Good observations.  Folks, you want to fix these regressions before
we T&R 1.3.3?

Bill

RE: Strange continous SEVERE error after upgrading to APR 1.3, Critical poller failure, Timer expired

by JimiH :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Nick Kew wrote:

>
> OK, I've just tested APR variants with Event MPM.
>
> It is indeed that fix that breaks it.  When I revert
> the fix, the problem goes away.
>
> The original patch posted to fix the same bug works fine.
> Can I suggest you apply the following patch to 1.3.x
> (should work on 1.3.0 or 1.3.2 too) and let us know if
> it fixes the problem you're seeing?
>
> Index: poll/unix/port.c
> ===================================================================
> --- poll/unix/port.c    (revision 662044)
> +++ poll/unix/port.c    (working copy)
> @@ -315,7 +315,15 @@
>
>      if (ret == -1) {
>          (*num) = 0;
> -        rv = apr_get_netos_error();
> +        if (errno == EINTR) {
> +            rv = APR_EINTR;
> +        }
> +        else if (errno == ETIME) {
> +            rv = APR_TIMEUP;
> +        }
> +        else {
> +            rv = APR_EGENERAL;
> +        }
>      }
>      else if (nget == 0) {
>          rv = APR_TIMEUP;
>

We have now applied this patch to 1.3.2 and it successfully solved the problem, ie no more errors about "Critical
poller failure" in the log file. Thanks!

By the way, does it look like this patch (or some variant, as mentioned in other postings in this thread) will be included in a future release (like 1.3.3)?

Regards
/Jimi

Re: Strange continous SEVERE error after upgrading to APR 1.3, Critical poller failure, Timer expired

by Nick Kew :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Thu, 26 Jun 2008 10:10:48 +0200
Jimi Hullegård <jimi.hullegard@...> wrote:

> By the way, does it look like this patch (or some variant, as
> mentioned in other postings in this thread) will be included in a
> future release (like 1.3.3)?

Yes.  Your confirmation of the fix was what we were waiting for.
Now we just have to decide which of the two variants to go with!

--
Nick Kew

Re: Strange continous SEVERE error after upgrading to APR 1.3, Critical poller failure, Timer expired

by Nick Kew :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wed, 25 Jun 2008 21:43:47 +0200
Ruediger Pluem <rpluem@...> wrote:


> Looks good and makes sense.
> apr_get_netos_error() in case of ETIME = 720062
> APR_TIMEUP = 70007
>
> So they are different.

which is, in itself, a little disturbing.

> But I would do
>
> rv = apr_get_netos_error();
>
> instead
>
> of
>
> rv = APR_EGENERAL;

Works for me in terms of fixing this bug.  But looking at the
history of this bug (we had a much smaller bug, had a fix,
then introduced this bug by switching to apr_get_netos_error
because it felt better), I'd be wary of repeating the same
thought process without more thorough analysis and testing.

How about APR_EGENERAL for 1.3/release, and apr_get_netos_error
for trunk, where it can hopefully get some exposure?

--
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/

Re: Strange continous SEVERE error after upgrading to APR 1.3, Critical poller failure, Timer expired

by Ruediger Pluem :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message



On 06/26/2008 01:15 PM, Nick Kew wrote:

>
> Works for me in terms of fixing this bug.  But looking at the
> history of this bug (we had a much smaller bug, had a fix,
> then introduced this bug by switching to apr_get_netos_error
> because it felt better), I'd be wary of repeating the same
> thought process without more thorough analysis and testing.
>
> How about APR_EGENERAL for 1.3/release, and apr_get_netos_error
> for trunk, where it can hopefully get some exposure?

+1. Do you commit or should I?

Regards

Rüdiger


>

Re: Strange continous SEVERE error after upgrading to APR 1.3, Critical poller failure, Timer expired

by Jim Jagielski :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On Jun 26, 2008, at 7:15 AM, Nick Kew wrote:

> On Wed, 25 Jun 2008 21:43:47 +0200
> Ruediger Pluem <rpluem@...> wrote:
>
>
>> Looks good and makes sense.
>> apr_get_netos_error() in case of ETIME = 720062
>> APR_TIMEUP = 70007
>>
>> So they are different.
>
> which is, in itself, a little disturbing.
>
>> But I would do
>>
>> rv = apr_get_netos_error();
>>
>> instead
>>
>> of
>>
>> rv = APR_EGENERAL;
>
> Works for me in terms of fixing this bug.  But looking at the
> history of this bug (we had a much smaller bug, had a fix,
> then introduced this bug by switching to apr_get_netos_error
> because it felt better), I'd be wary of repeating the same
> thought process without more thorough analysis and testing.
>
> How about APR_EGENERAL for 1.3/release, and apr_get_netos_error
> for trunk, where it can hopefully get some exposure?
>

+1


Re: Strange continous SEVERE error after upgrading to APR 1.3, Critical poller failure, Timer expired - Bayesian Filter detected spam

by JimiH :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi,

I just noticed that version 1.3.3 is out, but I can't see this fix in the release notes. Is the fix for this bug included in 1.3.3?

Regards
/Jimi

mogul | jimi hullegård | system developer | hudiksvallsgatan 4, 113 30 stockholm sweden | +46 8 506 66 172 | +46 765 27 19 55 | jimi.hullegard@... | www.mogul.com


> -----Original Message-----
> From: Nick Kew [mailto:nick@...]
> Sent: den 26 juni 2008 12:50
> To: dev@...
> Subject: Re: Strange continous SEVERE error after
> upgrading to APR 1.3, Critical poller failure, Timer expired
> - Bayesian Filter detected spam
>
> On Thu, 26 Jun 2008 10:10:48 +0200
> Jimi Hullegård <jimi.hullegard@...> wrote:
>
> > By the way, does it look like this patch (or some variant, as
> > mentioned in other postings in this thread) will be included in a
> > future release (like 1.3.3)?
>
> Yes.  Your confirmation of the fix was what we were waiting for.
> Now we just have to decide which of the two variants to go with!
>
> --
> Nick Kew
>
LightInTheBox - Buy quality products at wholesale price