Misterhouse and high CPU usage

View: New views
15 Messages — Rating Filter:   Alert me  

Misterhouse and high CPU usage

by Tom-172 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I read through the thread posted a while back about zombie processes but
I don't see any evidence of any zombies on my system.  However, after a
few hours after restarting MH (sometimes longer, sometimes shorter), mh
takes up around 90% of the cpu.  it goes back down after a few seconds
then goes back up.

Is there a way to see a breakdown of what is running and causing the
high utilization?

I'm running on CentOS using the latest version from SVN.

--------------------
As I was typing this email, I noticed something.
I opened up 2 sessions over SSH and while watching TOP, I was also
watching the MH output

When MH is eating up the CPU, I see the following behavior:
The spikes happens every minute when the object states are getting saved.
The Saving object states is always followed by a Paused for x seconds

05/28/08 01:10:00 PM: Saving object states ... done
05/28/08 01:10:00 PM Warning, memory leak detected: 109.0 hours:   1.0
MB in  1.3 hours.   78.2 ->  79.1 at  0.72 MB/hour.  Total:  0.37 MB/hour
05/28/08 01:10:00 PM Paused for 27 seconds

05/28/08 01:14:00 PM: Saving object states ... done
05/28/08 01:14:00 PM Paused for 27 seconds

When I restarted MH, the CPU doesn't spike every minutes and object
state saves happen without the "Paused for x seconds"

-----------------------

Here's a top when mh is taking up 94.8% of the CPU.

Tasks:  64 total,   2 running,  62 sleeping,   0 stopped,   0 zombie
Cpu(s): 94.6% us,  5.4% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:    515544k total,   428484k used,    87060k free,   111816k buffers
Swap:  1048568k total,      236k used,  1048332k free,   148052k cached

  PID USER      PR  NI %CPU    TIME+  %MEM  VIRT  RES  SHR S
COMMAND                                                              
26917 root      25   0 94.8   2064:47 13.9 78256  70m 2896 R
mh                                                                    
 9218 root      15   0  2.6   0:00.33  0.5  8832 2612 2100 S
sshd                                                                  
 9254 root      16   0  2.6   0:00.83  0.2  3836  992  780 R
top                                                                  
    1 root      16   0  0.0   0:17.17  0.1  1688  544  468 S
init                                                                  
    2 root      34  19  0.0   0:00.02  0.0     0    0    0 S
ksoftirqd/0                                                          
    3 root       5 -10  0.0   0:00.02  0.0     0    0    0 S
events/0                                                              
    4 root       6 -10  0.0   0:00.03  0.0     0    0    0 S
khelper                                                              
    5 root       5 -10  0.0   0:00.00  0.0     0    0    0 S
kblockd/0                                                            
    6 root      15   0  0.0   0:00.00  0.0     0    0    0 S
khubd                                                                
   35 root      15   0  0.0   0:29.98  0.0     0    0    0 S
kapmd                                                                
   38 root      20   0  0.0   0:00.00  0.0     0    0    0 S
pdflush                                                              
   39 root      15   0  0.0   0:16.99  0.0     0    0    0 S
pdflush            



-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
________________________________________________________
To unsubscribe from this list, go to: http://sourceforge.net/mail/?group_id=1365


Re: Misterhouse and high CPU usage

by Tom-172 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Any ideas?

On Wed, May 28, 2008 at 1:26 PM, Tom <tom.valdes@...> wrote:

> I read through the thread posted a while back about zombie processes but I
> don't see any evidence of any zombies on my system.  However, after a few
> hours after restarting MH (sometimes longer, sometimes shorter), mh takes up
> around 90% of the cpu.  it goes back down after a few seconds then goes back
> up.
>
> Is there a way to see a breakdown of what is running and causing the high
> utilization?
>
> I'm running on CentOS using the latest version from SVN.
>
> --------------------
> As I was typing this email, I noticed something.
> I opened up 2 sessions over SSH and while watching TOP, I was also watching
> the MH output
>
> When MH is eating up the CPU, I see the following behavior:
> The spikes happens every minute when the object states are getting saved.
> The Saving object states is always followed by a Paused for x seconds
>
> 05/28/08 01:10:00 PM: Saving object states ... done
> 05/28/08 01:10:00 PM Warning, memory leak detected: 109.0 hours:   1.0 MB in
>  1.3 hours.   78.2 ->  79.1 at  0.72 MB/hour.  Total:  0.37 MB/hour
> 05/28/08 01:10:00 PM Paused for 27 seconds
>
> 05/28/08 01:14:00 PM: Saving object states ... done
> 05/28/08 01:14:00 PM Paused for 27 seconds
>
> When I restarted MH, the CPU doesn't spike every minutes and object state
> saves happen without the "Paused for x seconds"
>
> -----------------------
>
> Here's a top when mh is taking up 94.8% of the CPU.
>
> Tasks:  64 total,   2 running,  62 sleeping,   0 stopped,   0 zombie
> Cpu(s): 94.6% us,  5.4% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
> Mem:    515544k total,   428484k used,    87060k free,   111816k buffers
> Swap:  1048568k total,      236k used,  1048332k free,   148052k cached
>
>  PID USER      PR  NI %CPU    TIME+  %MEM  VIRT  RES  SHR S COMMAND
>                                                       26917 root      25   0
> 94.8   2064:47 13.9 78256  70m 2896 R mh
>                                9218 root      15   0  2.6   0:00.33  0.5
>  8832 2612 2100 S sshd
>            9254 root      16   0  2.6   0:00.83  0.2  3836  992  780 R top
>                                                                   1 root
>  16   0  0.0   0:17.17  0.1  1688  544  468 S init
>                                           2 root      34  19  0.0   0:00.02
>  0.0     0    0    0 S ksoftirqd/0
>                   3 root       5 -10  0.0   0:00.02  0.0     0    0    0 S
> events/0                                                                 4
> root       6 -10  0.0   0:00.03  0.0     0    0    0 S khelper
>                                                   5 root       5 -10  0.0
> 0:00.00  0.0     0    0    0 S kblockd/0
>                           6 root      15   0  0.0   0:00.00  0.0     0    0
>    0 S khubd
>  35 root      15   0  0.0   0:29.98  0.0     0    0    0 S kapmd
>                                                      38 root      20   0
>  0.0   0:00.00  0.0     0    0    0 S pdflush
>                                  39 root      15   0  0.0   0:16.99  0.0
> 0    0    0 S pdflush
>
>

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
________________________________________________________
To unsubscribe from this list, go to: http://sourceforge.net/mail/?group_id=1365


Re: Misterhouse and high CPU usage

by Gregg Liming :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Tom,

Tom wrote:
> Any ideas?

... stuff to consider ...

1) mh may spike CPU if it needs to "catch up" from being starved by
anything else running at a higher priority.  To rule this out, consider
running the command renice to bump up the nice level:

renice +5 -u mh

Here, the "-u mh" applies a increase to the nice level for all processes
owned by mh.  For me, this is easier than collecting pids from files.
But, if running as root, then you'd have to supply a pid unless you want
all processes running as root to be similarly applied.

I would definitely do this if I had other stuff running on the same
system that should be considered lower priority.  Be careful that you
don't "overnice" mh and cause key OS processes to be starved.

2) I have an all to often bad habit of "tailing" my console output that
is redirected to a log file via the -log argument.  Usually, this is
because I'm trying to catch or watch certain events for code that I'm
working on.  If you leave a session up like this for an extended period
of time, then I have noticed pauses.

3) I can't quite tell from your discussion below whether the "spiking"
is occasional or frequent.  Also, is it short-lived or significant
duration?  Perhaps a better way to get some hint of history is to post
your pause log from a restart up to a condition of frequent/recurring
CPU spikes so that we can tell if the condition is linearly worsening or
seemingly spontaneous.

4) Do you have open browsers to mh's ia5 interface when the spiking is
occurring?  While debugging the problem, make sure that they are all
closed/killed.

5) Consider backing out (disabling) various modules that may be less
essential.  If you're running/using any of the rrd
functionality--particularly graphing--considering disabling those first.

6) You mention a correlation between pausing and saving object states.
Try watching the size of the saved_states files to see if they are
fluctuating between the point right after restart to whenever you start
seeing pausing:

ls -l <mh_data_dir>/mh_temp.saved_states*

Perhaps something from the above may give you some new insight/info that
could lead to a diagnosis.

Gregg

> On Wed, May 28, 2008 at 1:26 PM, Tom <tom.valdes@...> wrote:
>> I read through the thread posted a while back about zombie processes but I
>> don't see any evidence of any zombies on my system.  However, after a few
>> hours after restarting MH (sometimes longer, sometimes shorter), mh takes up
>> around 90% of the cpu.  it goes back down after a few seconds then goes back
>> up.
>>
>> Is there a way to see a breakdown of what is running and causing the high
>> utilization?
>>
>> I'm running on CentOS using the latest version from SVN.
>>
>> --------------------
>> As I was typing this email, I noticed something.
>> I opened up 2 sessions over SSH and while watching TOP, I was also watching
>> the MH output
>>
>> When MH is eating up the CPU, I see the following behavior:
>> The spikes happens every minute when the object states are getting saved.
>> The Saving object states is always followed by a Paused for x seconds
>>
>> 05/28/08 01:10:00 PM: Saving object states ... done
>> 05/28/08 01:10:00 PM Warning, memory leak detected: 109.0 hours:   1.0 MB in
>>  1.3 hours.   78.2 ->  79.1 at  0.72 MB/hour.  Total:  0.37 MB/hour
>> 05/28/08 01:10:00 PM Paused for 27 seconds
>>
>> 05/28/08 01:14:00 PM: Saving object states ... done
>> 05/28/08 01:14:00 PM Paused for 27 seconds
>>
>> When I restarted MH, the CPU doesn't spike every minutes and object state
>> saves happen without the "Paused for x seconds"
>>
>> -----------------------
>>
>> Here's a top when mh is taking up 94.8% of the CPU.
>>
>> Tasks:  64 total,   2 running,  62 sleeping,   0 stopped,   0 zombie
>> Cpu(s): 94.6% us,  5.4% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
>> Mem:    515544k total,   428484k used,    87060k free,   111816k buffers
>> Swap:  1048568k total,      236k used,  1048332k free,   148052k cached
>>
>>  PID USER      PR  NI %CPU    TIME+  %MEM  VIRT  RES  SHR S COMMAND
>>                                                       26917 root      25   0
>> 94.8   2064:47 13.9 78256  70m 2896 R mh
>>                                9218 root      15   0  2.6   0:00.33  0.5
>>  8832 2612 2100 S sshd
>>            9254 root      16   0  2.6   0:00.83  0.2  3836  992  780 R top
>>                                                                   1 root
>>  16   0  0.0   0:17.17  0.1  1688  544  468 S init
>>                                           2 root      34  19  0.0   0:00.02
>>  0.0     0    0    0 S ksoftirqd/0
>>                   3 root       5 -10  0.0   0:00.02  0.0     0    0    0 S
>> events/0                                                                 4
>> root       6 -10  0.0   0:00.03  0.0     0    0    0 S khelper
>>                                                   5 root       5 -10  0.0
>> 0:00.00  0.0     0    0    0 S kblockd/0
>>                           6 root      15   0  0.0   0:00.00  0.0     0    0
>>    0 S khubd
>>  35 root      15   0  0.0   0:29.98  0.0     0    0    0 S kapmd
>>                                                      38 root      20   0
>>  0.0   0:00.00  0.0     0    0    0 S pdflush
>>                                  39 root      15   0  0.0   0:16.99  0.0
>> 0    0    0 S pdflush
>>
>>
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2008.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> ________________________________________________________
> To unsubscribe from this list, go to: http://sourceforge.net/mail/?group_id=1365
>


-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
________________________________________________________
To unsubscribe from this list, go to: http://sourceforge.net/mail/?group_id=1365


Re: Misterhouse and high CPU usage

by Tom-172 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Gregg Liming wrote:

> Hi Tom,
>
> Tom wrote:
>  
>> Any ideas?
>>    
>
> ... stuff to consider ...
>
> 1) mh may spike CPU if it needs to "catch up" from being starved by
> anything else running at a higher priority.  To rule this out, consider
> running the command renice to bump up the nice level:
>
> renice +5 -u mh
>
> Here, the "-u mh" applies a increase to the nice level for all processes
> owned by mh.  For me, this is easier than collecting pids from files.
> But, if running as root, then you'd have to supply a pid unless you want
> all processes running as root to be similarly applied.
>
> I would definitely do this if I had other stuff running on the same
> system that should be considered lower priority.  Be careful that you
> don't "overnice" mh and cause key OS processes to be starved.
>
>  
The machine only runs MH.. It doesn't even have any X installed.
> 3) I can't quite tell from your discussion below whether the "spiking"
> is occasional or frequent.  Also, is it short-lived or significant
> duration?  Perhaps a better way to get some hint of history is to post
> your pause log from a restart up to a condition of frequent/recurring
> CPU spikes so that we can tell if the condition is linearly worsening or
> seemingly spontaneous.
>  
I think I'm on to something..  It seems to start with something at midnight.
Here's a grep of my log for Paused..
I had restarted MH the previous day and noticed my first Paused at
12:31:00 AM  At first the interval is 15 minutes but as time goes on it
gets to a minute between each pause

06/02/08 12:31:00 AM Paused for 3 seconds
06/02/08 12:46:00 AM Paused for 3 seconds
06/02/08 01:01:00 AM Paused for 3 seconds
06/02/08 01:16:00 AM Paused for 3 seconds
06/02/08 01:31:00 AM Paused for 4 seconds
06/02/08 01:46:00 AM Paused for 3 seconds
06/02/08 02:01:00 AM Paused for 5 seconds
06/02/08 02:16:00 AM Paused for 3 seconds
06/02/08 02:31:00 AM Paused for 5 seconds
06/02/08 02:46:00 AM Paused for 4 seconds
06/02/08 03:00:00 AM Paused for 3 seconds
06/02/08 03:01:00 AM Paused for 6 seconds
06/02/08 03:16:00 AM Paused for 4 seconds
06/02/08 03:31:00 AM Paused for 6 seconds
06/02/08 03:46:00 AM Paused for 4 seconds
06/02/08 04:00:00 AM Paused for 3 seconds
06/02/08 04:01:00 AM Paused for 7 seconds
06/02/08 04:16:00 AM Paused for 4 seconds
06/02/08 04:31:00 AM Paused for 7 seconds
06/02/08 04:46:00 AM Paused for 5 seconds
06/02/08 05:00:00 AM Paused for 3 seconds
06/02/08 05:01:00 AM Paused for 8 seconds
06/02/08 05:15:00 AM Paused for 3 seconds
06/02/08 05:16:00 AM Paused for 5 seconds
06/02/08 05:30:00 AM Paused for 3 seconds
06/02/08 05:31:00 AM Paused for 8 seconds
06/02/08 05:34:00 AM Paused for 3 seconds
06/02/08 05:40:00 AM Paused for 3 seconds
06/02/08 05:41:00 AM Paused for 3 seconds
06/02/08 05:44:00 AM Paused for 3 seconds
06/02/08 05:45:00 AM Paused for 3 seconds
06/02/08 05:46:00 AM Paused for 5 seconds
06/02/08 05:48:00 AM Paused for 3 seconds
06/02/08 05:49:00 AM Paused for 3 seconds
06/02/08 05:50:00 AM Paused for 3 seconds
> 4) Do you have open browsers to mh's ia5 interface when the spiking is
> occurring?  While debugging the problem, make sure that they are all
> closed/killed.
>  
I have an Audrey which is always on the MH page.. It's been like this
for the last few years.. I'll try restarting MH and turning off the
Audrey tonight.
> 5) Consider backing out (disabling) various modules that may be less
> essential.  If you're running/using any of the rrd
> functionality--particularly graphing--considering disabling those first.
>
>  
weather_rrd_upd takes up a lot of the CPU
25543 root      25   0 82.7   0:08.51  1.0 10688 5384 2472 R weather_rrd_upd

If I disable that, I know the graphs will not get updated, but will all
the other weather data be lost for the time that it's disabled?

> 6) You mention a correlation between pausing and saving object states.
> Try watching the size of the saved_states files to see if they are
> fluctuating between the point right after restart to whenever you start
> seeing pausing:
>
> ls -l <mh_data_dir>/mh_temp.saved_states*
>
> Perhaps something from the above may give you some new insight/info that
> could lead to a diagnosis.
>  
55554 Jun  3     mh_temp.saved_states
1025 Jun  3       mh_temp.saved_states.persistent
131244 Jun  1   mh_temp.saved_states.unused

The sizes stay the same after a restart

thanks,
tom

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
________________________________________________________
To unsubscribe from this list, go to: http://sourceforge.net/mail/?group_id=1365


Re: Misterhouse and high CPU usage

by Gregg Liming :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Tom,

Tom wrote:

> I think I'm on to something..  It seems to start with something at midnight.
> Here's a grep of my log for Paused..
> I had restarted MH the previous day and noticed my first Paused at
> 12:31:00 AM  At first the interval is 15 minutes but as time goes on it
> gets to a minute between each pause
>
> 06/02/08 12:31:00 AM Paused for 3 seconds
> 06/02/08 12:46:00 AM Paused for 3 seconds
> 06/02/08 01:01:00 AM Paused for 3 seconds
> 06/02/08 01:16:00 AM Paused for 3 seconds
> 06/02/08 01:31:00 AM Paused for 4 seconds
> 06/02/08 01:46:00 AM Paused for 3 seconds
> 06/02/08 02:01:00 AM Paused for 5 seconds
> 06/02/08 02:16:00 AM Paused for 3 seconds
> 06/02/08 02:31:00 AM Paused for 5 seconds
> 06/02/08 02:46:00 AM Paused for 4 seconds
> 06/02/08 03:00:00 AM Paused for 3 seconds
> 06/02/08 03:01:00 AM Paused for 6 seconds
> 06/02/08 03:16:00 AM Paused for 4 seconds
> 06/02/08 03:31:00 AM Paused for 6 seconds
> 06/02/08 03:46:00 AM Paused for 4 seconds
> 06/02/08 04:00:00 AM Paused for 3 seconds
> 06/02/08 04:01:00 AM Paused for 7 seconds
> 06/02/08 04:16:00 AM Paused for 4 seconds
> 06/02/08 04:31:00 AM Paused for 7 seconds
> 06/02/08 04:46:00 AM Paused for 5 seconds
> 06/02/08 05:00:00 AM Paused for 3 seconds
> 06/02/08 05:01:00 AM Paused for 8 seconds
> 06/02/08 05:15:00 AM Paused for 3 seconds
> 06/02/08 05:16:00 AM Paused for 5 seconds
> 06/02/08 05:30:00 AM Paused for 3 seconds
> 06/02/08 05:31:00 AM Paused for 8 seconds
> 06/02/08 05:34:00 AM Paused for 3 seconds
> 06/02/08 05:40:00 AM Paused for 3 seconds
> 06/02/08 05:41:00 AM Paused for 3 seconds
> 06/02/08 05:44:00 AM Paused for 3 seconds
> 06/02/08 05:45:00 AM Paused for 3 seconds
> 06/02/08 05:46:00 AM Paused for 5 seconds
> 06/02/08 05:48:00 AM Paused for 3 seconds
> 06/02/08 05:49:00 AM Paused for 3 seconds
> 06/02/08 05:50:00 AM Paused for 3 seconds

I would agree that this pattern looks interesting.  If it were me, I'd
try restarting again to see if the same pattern roughly occurs.  If it
is almost exact, then that should tell you something.  I'd probably hold
off disabling things for the time being.

I find it very interesting that the time-stamp always shows 00 seconds.
  I went through all of my past pause logs and noticed that never
occurred.  And, although I do know what caused my pauses (specifically,
tailing an ever lengthening log file), the time stamp corresponded to
the number of paused seconds.  For example:

mm/dd/yy hh:mm:03 xM Paused for 3 seconds
mm/dd/yy hh:mm:08 xM Paused for 8 seconds

I'll try to mull on this a bit more to see if I can explain why yours is
different.  Just out of curiosity, are you syncing your computers time?
  If so, how (rdate, ntp, mh's internet time client)?  If not, do you
know if your clock tends to drift?  FWIW: I've occasionally noticed bad
behavior for applications that are very time sensitive and a system
clock that is allowed to drift.  For this reason, I always use ntp/ntpd.

I'd probably start looking for possible culprits to things that may be
going on that are time based.  Consider grep-ing on time_ (or time_now,
time_cron, etc. to be more exact) w/i your code dir to get insights.

Gregg

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
________________________________________________________
To unsubscribe from this list, go to: http://sourceforge.net/mail/?group_id=1365


Re: Misterhouse and high CPU usage

by Tom-172 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Gregg Liming wrote:
> I would agree that this pattern looks interesting.  If it were me, I'd
> try restarting again to see if the same pattern roughly occurs.  If it
> is almost exact, then that should tell you something.  I'd probably hold
> off disabling things for the time being.
>
>  
The same pattern does occur and it starts about 3-4 hours after
restarting.  I've noted restart times and the first occurrence of
"Paused for..."

started at 6/3/08 11:55:55 AM
06/03/08 04:01:00 PM Paused for 3 seconds
 
started at 6/4/08 11:29:39 AM
06/04/08 03:01:00 PM Paused for 3 seconds
 
started at 06/04/08 03:58:54PM
06/04/08 08:01:00 PM Paused for 3 seconds


> Just out of curiosity, are you syncing your computers time?
> .... For this reason, I always use ntp/ntpd.
>  
I also use ntpd
> I'd probably start looking for possible culprits to things that may be
> going on that are time based.  Consider grep-ing on time_ (or time_now,
> time_cron, etc. to be more exact) w/i your code dir to get insights.
>  
I saw some code using time_now and time_cron, but they aren't new code..
> Gregg
>  

thanks,
tom

-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
________________________________________________________
To unsubscribe from this list, go to: http://sourceforge.net/mail/?group_id=1365


Re: Misterhouse and high CPU usage

by Tom-172 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I'm still stuck..  The patterns I've been seeing still continues every
3-4 hours after restart.

does this give any clues? it's part of monitor_memory_leak.log

Fri 06/13/08 14:45:56 -- Restarted --.  Perl version:  5.008005
Fri 06/13/08 14:47:05 -- Restarted --.  Perl version:  5.008005
Fri 06/13/08 15:50:00   1.0 hours:   1.1 MB in  0.9 hours.   38.7 ->  
39.8 at  1.26 MB/hour.  Total:  1.06 MB/hour
Fri 06/13/08 17:20:00   2.5 hours:   0.5 MB in  1.5 hours.   39.8 ->  
40.3 at  0.36 MB/hour.  Total:  0.65 MB/hour
Fri 06/13/08 19:10:00   4.4 hours:   0.6 MB in  1.8 hours.   40.3 ->  
40.9 at  0.30 MB/hour.  Total:  0.50 MB/hour
Fri 06/13/08 19:50:00   5.0 hours:   9.2 MB in  0.7 hours.   40.9 ->  
50.1 at 13.85 MB/hour.  Total:  2.27 MB/hour
Fri 06/13/08 22:40:20 -- Restarted --.  Perl version:  5.008005
Sat 06/14/08 00:20:00   1.7 hours:   0.5 MB in  1.5 hours.   39.3 ->  
39.8 at  0.34 MB/hour.  Total:  0.30 MB/hour
Sat 06/14/08 01:50:00   3.2 hours:   0.5 MB in  1.5 hours.   39.8 ->  
40.3 at  0.34 MB/hour.  Total:  0.32 MB/hour
Sat 06/14/08 03:20:00   4.7 hours:   0.5 MB in  1.5 hours.   40.3 ->  
40.9 at  0.36 MB/hour.  Total:  0.34 MB/hour
Sat 06/14/08 05:20:00   6.7 hours:   0.5 MB in  2.0 hours.   40.9 ->  
41.4 at  0.25 MB/hour.  Total:  0.31 MB/hour
Sat 06/14/08 07:10:00   8.5 hours:   0.5 MB in  1.8 hours.   41.4 ->  
41.9 at  0.30 MB/hour.  Total:  0.31 MB/hour
Sat 06/14/08 09:10:00  10.5 hours:   0.6 MB in  2.0 hours.   41.9 ->  
42.6 at  0.32 MB/hour.  Total:  0.31 MB/hour
Sat 06/14/08 09:30:00  10.8 hours:   1.0 MB in  0.3 hours.   42.6 ->  
43.6 at  3.01 MB/hour.  Total:  0.39 MB/hour
Sat 06/14/08 12:10:00  13.5 hours:   6.0 MB in  2.7 hours.   43.6 ->  
49.5 at  2.23 MB/hour.  Total:  0.76 MB/hour
Sat 06/14/08 14:10:00  15.5 hours:   0.5 MB in  2.0 hours.   49.5 ->  
50.1 at  0.27 MB/hour.  Total:  0.69 MB/hour
Sat 06/14/08 15:40:00  17.0 hours:   0.5 MB in  1.5 hours.   50.1 ->  
50.6 at  0.35 MB/hour.  Total:  0.66 MB/hour
Sat 06/14/08 16:00:00  17.3 hours:   3.1 MB in  0.3 hours.   50.6 ->  
53.7 at  9.30 MB/hour.  Total:  0.83 MB/hour
Sun 06/15/08 01:50:00  27.2 hours:   0.5 MB in  9.8 hours.   53.7 ->  
54.2 at  0.05 MB/hour.  Total:  0.55 MB/hour
Sun 06/15/08 03:20:00  28.7 hours:   0.5 MB in  1.5 hours.   54.2 ->  
54.7 at  0.35 MB/hour.  Total:  0.54 MB/hour
Sun 06/15/08 05:10:00  30.5 hours:   0.6 MB in  1.8 hours.   54.7 ->  
55.3 at  0.31 MB/hour.  Total:  0.52 MB/hour
Sun 06/15/08 06:50:00  32.2 hours:   0.5 MB in  1.7 hours.   55.3 ->  
55.8 at  0.30 MB/hour.  Total:  0.51 MB/hour
Sun 06/15/08 08:10:00  33.5 hours:   0.7 MB in  1.3 hours.   55.8 ->  
56.4 at  0.49 MB/hour.  Total:  0.51 MB/hour
Sun 06/15/08 09:50:00  35.2 hours:   0.5 MB in  1.7 hours.   56.4 ->  
56.9 at  0.30 MB/hour.  Total:  0.50 MB/hour
Sun 06/15/08 12:10:00  37.5 hours:   0.8 MB in  2.3 hours.   56.9 ->  
57.7 at  0.33 MB/hour.  Total:  0.49 MB/hour
Sun 06/15/08 14:10:00  39.5 hours:   0.6 MB in  2.0 hours.   57.7 ->  
58.3 at  0.31 MB/hour.  Total:  0.48 MB/hour
Sun 06/15/08 16:10:00  41.5 hours:   0.6 MB in  2.0 hours.   58.3 ->  
59.0 at  0.32 MB/hour.  Total:  0.47 MB/hour
Sun 06/15/08 17:50:00  43.2 hours:   0.5 MB in  1.7 hours.   59.0 ->  
59.5 at  0.30 MB/hour.  Total:  0.47 MB/hour
Sun 06/15/08 19:20:00  44.7 hours:   0.5 MB in  1.5 hours.   59.5 ->  
60.0 at  0.36 MB/hour.  Total:  0.46 MB/hour
Sun 06/15/08 19:28:29 -- Restarted --.  Perl version:  5.008005
Sun 06/15/08 20:10:00   0.7 hours:   0.7 MB in  0.5 hours.   37.5 ->  
38.3 at  1.42 MB/hour.  Total:  1.07 MB/hour
Sun 06/15/08 21:50:00   2.4 hours:   0.5 MB in  1.7 hours.   38.3 ->  
38.8 at  0.32 MB/hour.  Total:  0.54 MB/hour
Sun 06/15/08 23:20:00   3.9 hours:   0.5 MB in  1.5 hours.   38.8 ->  
39.3 at  0.35 MB/hour.  Total:  0.47 MB/hour
Mon 06/16/08 01:40:00   6.2 hours:   0.5 MB in  2.3 hours.   39.3 ->  
39.9 at  0.23 MB/hour.  Total:  0.38 MB/hour
Mon 06/16/08 03:10:00   7.7 hours:   0.5 MB in  1.5 hours.   39.9 ->  
40.4 at  0.34 MB/hour.  Total:  0.37 MB/hour
Mon 06/16/08 04:50:00   9.4 hours:   0.5 MB in  1.7 hours.   40.4 ->  
40.9 at  0.31 MB/hour.  Total:  0.36 MB/hour
Mon 06/16/08 06:20:00  10.9 hours:   0.5 MB in  1.5 hours.   40.9 ->  
41.4 at  0.34 MB/hour.  Total:  0.36 MB/hour
Mon 06/16/08 08:10:00  12.7 hours:   0.6 MB in  1.8 hours.   41.4 ->  
42.0 at  0.32 MB/hour.  Total:  0.35 MB/hour
Mon 06/16/08 08:50:00  13.4 hours:   1.0 MB in  0.7 hours.   42.0 ->  
43.0 at  1.49 MB/hour.  Total:  0.41 MB/hour
Mon 06/16/08 11:10:00  15.7 hours:   0.6 MB in  2.3 hours.   43.0 ->  
43.6 at  0.26 MB/hour.  Total:  0.39 MB/hour
Mon 06/16/08 13:20:00  17.9 hours:   8.9 MB in  2.2 hours.   43.6 ->  
52.5 at  4.13 MB/hour.  Total:  0.84 MB/hour
Mon 06/16/08 23:10:00  27.7 hours:   0.6 MB in  9.8 hours.   52.5 ->  
53.1 at  0.06 MB/hour.  Total:  0.56 MB/hour
Tue 06/17/08 00:50:00  29.4 hours:   0.5 MB in  1.7 hours.   53.1 ->  
53.7 at  0.31 MB/hour.  Total:  0.55 MB/hour
Tue 06/17/08 03:10:00  31.7 hours:   0.6 MB in  2.3 hours.   53.7 ->  
54.3 at  0.26 MB/hour.  Total:  0.53 MB/hour
Tue 06/17/08 04:50:00  33.4 hours:   0.5 MB in  1.7 hours.   54.3 ->  
54.8 at  0.30 MB/hour.  Total:  0.52 MB/hour
Tue 06/17/08 06:20:00  34.9 hours:   0.5 MB in  1.5 hours.   54.8 ->  
55.3 at  0.35 MB/hour.  Total:  0.51 MB/hour
Tue 06/17/08 08:10:00  36.7 hours:   0.6 MB in  1.8 hours.   55.3 ->  
55.8 at  0.31 MB/hour.  Total:  0.50 MB/hour
Tue 06/17/08 09:00:00  37.5 hours:   0.5 MB in  0.8 hours.   55.8 ->  
56.4 at  0.60 MB/hour.  Total:  0.50 MB/hour
Tue 06/17/08 11:10:00  39.7 hours:   0.6 MB in  2.2 hours.   56.4 ->  
56.9 at  0.27 MB/hour.  Total:  0.49 MB/hour
Tom wrote:

> Gregg Liming wrote:
>> I would agree that this pattern looks interesting.  If it were me,
>> I'd try restarting again to see if the same pattern roughly occurs.  
>> If it is almost exact, then that should tell you something.  I'd
>> probably hold off disabling things for the time being.
>>
>>  
> The same pattern does occur and it starts about 3-4 hours after
> restarting.  I've noted restart times and the first occurrence of
> "Paused for..."
>
> started at 6/3/08 11:55:55 AM
> 06/03/08 04:01:00 PM Paused for 3 seconds
>
> started at 6/4/08 11:29:39 AM
> 06/04/08 03:01:00 PM Paused for 3 seconds
>
> started at 06/04/08 03:58:54PM
> 06/04/08 08:01:00 PM Paused for 3 seconds
>
>
>> Just out of curiosity, are you syncing your computers time? .... For
>> this reason, I always use ntp/ntpd.
>>  
> I also use ntpd
>> I'd probably start looking for possible culprits to things that may
>> be going on that are time based.  Consider grep-ing on time_ (or
>> time_now, time_cron, etc. to be more exact) w/i your code dir to get
>> insights.
>>  
> I saw some code using time_now and time_cron, but they aren't new code..
>> Gregg
>>  
>
> thanks,
> tom


-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
________________________________________________________
To unsubscribe from this list, go to: http://sourceforge.net/mail/?group_id=1365


Re: Misterhouse and high CPU usage

by Gregg Liming :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Tom wrote:
> I'm still stuck..  The patterns I've been seeing still continues every
> 3-4 hours after restart.
>
> does this give any clues? it's part of monitor_memory_leak.log

Wow--that's quite a lot of memory leaking!  FYI - the memory monitor
runs every 10 minutes--hence the slight pattern of minutes.
Unfortunately, it doesn't immediately suggest anything in the way of a
pattern (except that it's far too high).  Can you either post or send me
off-list the contents of your data/code_select.txt ?

Gregg

-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
________________________________________________________
To unsubscribe from this list, go to: http://sourceforge.net/mail/?group_id=1365


Re: Misterhouse and high CPU usage

by Tom-172 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


> Wow--that's quite a lot of memory leaking!  FYI - the memory monitor
> runs every 10 minutes--hence the slight pattern of minutes.
> Unfortunately, it doesn't immediately suggest anything in the way of a
> pattern (except that it's far too high).  Can you either post or send me
> off-list the contents of your data/code_select.txt ?
>
>  
I thought the leak monitor would help pin down which code it causing
problems.. It may be something in my code..is there any way to figure
out which code (except for the obvious disabling and enabling one at a
time.)

Here's my code_select.txt   It doesn't show my own code..should it?

audrey_control.pl
audreyspeak.pl
benchmarks.pl
callerid.pl
comic_dailystrips.pl
eliza_server.pl
insteon_item_commands.pl
menu.pl
mh_control.pl
mh_release.pl
mh_sound.pl
mhsend_server.pl
monitor_memory.pl
organizer.pl
phone.pl
phone_logs.pl
photo_index.pl
test_xap.pl
tv_grid.pl
weather_aws.pl
weather_rrd_update.pl
x10_item_commands.pl
x10_reset.pl
xAP_command.pl
xAP_send.pl


thanks,
tom

-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
________________________________________________________
To unsubscribe from this list, go to: http://sourceforge.net/mail/?group_id=1365


Re: Misterhouse and high CPU usage

by Gregg Liming :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Tom wrote:

> I thought the leak monitor would help pin down which code it causing
> problems..

Well, it may be symptomatic; but, the log itself isn't helpful to my eyes.

> It may be something in my code..is there any way to figure
> out which code (except for the obvious disabling and enabling one at a
> time.)

No.  I will offer to review it as I get the time if you want.  If
inclined, tar/gzip it up and send it off-list.

> Here's my code_select.txt   It doesn't show my own code..should it?

no.  I'll remove the entries that I have enabled as I'm not hemorrhaging
memory nor having the other problems that you are.  That doesn't mean
that what is left is a problem; but might be things to consider.  Maybe
other users that are fully asymptomatic (both to your pause rate and
memory leak rate) can rule out more.  Admittedly, I'd be more suspect of
user code.

> audrey_control.pl
> audreyspeak.pl
> callerid.pl
> comic_dailystrips.pl
> mh_release.pl
> test_xap.pl
> tv_grid.pl
> weather_aws.pl
> x10_reset.pl

BTW: first on my list to ditch permanently would be text_xap.pl.

-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
________________________________________________________
To unsubscribe from this list, go to: http://sourceforge.net/mail/?group_id=1365


Re: Misterhouse and high CPU usage

by Gregg Liming :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Gregg Liming wrote:
> Tom wrote:

>> Here's my code_select.txt   It doesn't show my own code..should it?
>
> no.  I'll remove the entries that I have enabled as I'm not hemorrhaging
> memory nor having the other problems that you are.  That doesn't mean
> that what is left is a problem; but might be things to consider.

Still, though, I was leaking some memory in a very noticeable pattern.
And, leaking any--IMO--is not good.  After some semi-educated guess-work
and subsequent experiments, I narrowed it down to
code/common/organizer.pl.  I'm currently working on a fix as I believe I
know where the problem lies.  If correct, then you should see an ever
increasing memory leak that worsens semi-linearly as a function of the
size of your_data_dir/organizer/calendar.tab.  It seems to occur because
the tab file is being overzealously overwritten by ical2vsdb--which I'll
separately need to investigate (possibly some condition is triggering it
specific to my case).  My tab file had just become noticeable w/ 368
entries.  I'll post back once I can confirm having resolved the problem.

In the mean-time, if you can easily live w/o organizer.pl, you might try
disabling it and continuing to monitor your memory leak patterns.  If
they go away, then you're similarly affected.  If you are a ical2vsdb
user and you really need organizer.pl, you could try increasing
ical_read_interval (at least to 15 and possibly more--like 30 or 60).

Gregg

-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
________________________________________________________
To unsubscribe from this list, go to: http://sourceforge.net/mail/?group_id=1365


Re: Misterhouse and high CPU usage

by Tom-172 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Gregg Liming wrote:
> Gregg Liming wrote:
>  
> In the mean-time, if you can easily live w/o organizer.pl, you might try
> disabling it and continuing to monitor your memory leak patterns.  If
> they go away, then you're similarly affected.  If you are a ical2vsdb
> user and you really need organizer.pl, you could try increasing
> ical_read_interval (at least to 15 and possibly more--like 30 or 60).
>  
That seems to be related to what I'm seeing as well..

I changed my read interval to 90 and so far (6 hours after a restart) I
have not had any pauses (I was getting one starting about 3-4 hours
after a restart)

Here's my info from monitor_memory_leak.log

Thu 06/19/08 08:55:09 -- Restarted --.  Perl version:  5.008005
Thu 06/19/08 11:10:00   2.2 hours:   1.0 MB in  2.1 hours.   39.6 ->  
40.5 at  0.47 MB/hour.  Total:  0.43 MB/hour
Thu 06/19/08 13:10:00   4.2 hours:   8.5 MB in  2.0 hours.   40.5 ->  
49.0 at  4.23 MB/hour.  Total:  2.22 MB/hour


thanks for figuring this out.. it was driving me nuts :-)

-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
________________________________________________________
To unsubscribe from this list, go to: http://sourceforge.net/mail/?group_id=1365


Re: Misterhouse and high CPU usage

by Gregg Liming :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Tom wrote:

> Gregg Liming wrote:
>> Gregg Liming wrote:
>>  
>> In the mean-time, if you can easily live w/o organizer.pl, you might try
>> disabling it and continuing to monitor your memory leak patterns.  If
>> they go away, then you're similarly affected.  If you are a ical2vsdb
>> user and you really need organizer.pl, you could try increasing
>> ical_read_interval (at least to 15 and possibly more--like 30 or 60).
>>  
> That seems to be related to what I'm seeing as well..
>
> I changed my read interval to 90 and so far (6 hours after a restart) I
> have not had any pauses (I was getting one starting about 3-4 hours
> after a restart)
>
> Here's my info from monitor_memory_leak.log
>
> Thu 06/19/08 08:55:09 -- Restarted --.  Perl version:  5.008005
> Thu 06/19/08 11:10:00   2.2 hours:   1.0 MB in  2.1 hours.   39.6 ->  
> 40.5 at  0.47 MB/hour.  Total:  0.43 MB/hour
> Thu 06/19/08 13:10:00   4.2 hours:   8.5 MB in  2.0 hours.   40.5 ->  
> 49.0 at  4.23 MB/hour.  Total:  2.22 MB/hour
>
>
> thanks for figuring this out.. it was driving me nuts :-)

The fix that I've made to code/common/organizer.pl seems to have
significantly reduced recurring memory leaks; so, I've just committed
it.  If you get a chance, would you mind giving it a try?

Some comments:

1) organizer.pl will load a rather large array array of hashes that
correspond directly to the contents of the vsDB "database" that holds
your calendar.
2) the load only occurs after it sense that the vsDB has changed.
Changes can either be manual--through the mh web interface--or
"automated" via ical2vsdb.
3) The ical_read_interval will cause the ical2vsdb daemon to be run
ical_read_interval minutes after startup; if something has changed, then
the large array of hashes is created.  This can be so large that it
"looks" like a sudden memory leak.  This is ok so long as it only
happens once.  The problem that had been occurring is that it kept on
adding incrementally the massive array of hashes so that it was
accrueing them rather than starting over.  So, if you have a reasonably
large set of events in your calendar DB and/or your vsDB might be
getting regularly "tweaked" from ical2vsdb (sometimes it seems as though
the updates are occurring when they shouldn't be; but, that's a
different problem).  This is obviously a very huge compounding problem
because it not only starts chewing up more memory, but it also causes
the evaluation in organizer.pl to become exponentially worse because of
the huge number of iterations each new time.
4) From the above, it should be clear that there are a number of
differing conditions that collectively can contribute to seeing a
noticeable memory leak and/or it getting to a point that is causing real
problem.  However.......

This memory leak has existed at least as long as 2.104 and perhaps
slightly sooner.  If you enable code/common/organizer.pl and y