osync_engine_finalize never exits

by Irene Huang

Hi all,

I am running the engine test binary in OpenSync and found that the
"engine_sync" test never returns from osync_engine_finalize().

The main thread blocks at:

    [1] __lwp_park(0x0, 0x0), at 0xd24b0c99
    [2] cond_sleep_queue(0x809a590, 0x80926c8, 0x0), at 0xd24aace1
    [3] cond_wait_queue(0x809a590, 0x80926c8, 0x0), at 0xd24aae43
    [4] cond_wait_common(0x809a590, 0x80926c8, 0x0), at 0xd24ab19d
    [5] __cond_wait(0x809a590, 0x80926c8), at 0xd24ab309
    [6] _cond_wait(0x809a590, 0x80926c8), at 0xd24ab335
    [7] _pthread_cond_wait(0x809a590, 0x80926c8), at 0xd24ab36f
    [8] g_async_queue_pop_intern_unlocked(0x8092708, 0x0, 0x0), at 0xd29eb1d8
    [9] g_async_queue_pop(0x8092708), at 0xd29eb293
  =>[10] osync_queue_get_message(queue = 0x809b900), line 1044 in "opensync_queue.c"
    [11] osync_client_proxy_shutdown(proxy = 0x8095508, error = 0x80478d0), line 942 in "opensync_client_proxy.c"
    [12] _osync_engine_finalize_member(engine = 0x80a2040, proxy = 0x8095508, error = 0x80478d0), line 584 in "opensync_engine.c"
    [13] osync_engine_finalize(engine = 0x80a2040, error = 0x80478d0), line 1141 in "opensync_engine.c"
    [14] engine_sync(_i = 0), line 329 in "check_engine.c"
    [15] srunner_iterate_tcase_tfuns(0x8080330, 0x80801f0), at 0xd2564c6e
    [16] srunner_iterate_suites(0x8080330, 0x3), at 0xd2564928
    [17] srunner_run_all(0x8080330, 0x3), at 0xd2564a9d
    [18] main(), line 1945 in "check_engine.c"
       
The other threads are in one of these two states. Either:
===========
current thread: t@27
    [1] __pollsys(0xd15d8e90, 0x1, 0xd15d8e58, 0x0), at 0xd24b4cd5
    [2] _pollsys(0xd15d8e90, 0x1, 0xd15d8e58, 0x0), at 0xd24a440c
    [3] _poll(0xd15d8e90, 0x1, 0x64), at 0xd2465eb2
  =>[4] osync_queue_poll(queue = 0x8092df0), line 1017 in "opensync_queue.c"
    [5] _source_check(source = 0x809c348), line 467 in "opensync_queue.c"
    [6] g_main_context_check(0x8092e50, 0x7fffffff, 0x808ef08, 0x1), at 0xd2a0f64b
    [7] g_main_context_iterate(0x8092e50, 0x1, 0x1, 0x8085e80), at 0xd2a0fcc9
    [8] g_main_loop_run(0x808e050), at 0xd2a102dc
    [9] g_thread_create_proxy(0x8085e80), at 0xd2a3462c
    [10] _thr_setup(0xd1fd1a00), at 0xd24b09e2
    [11] _lwp_start(), at 0xd24b0c40
==================
or:

==================
  =>[1] __pollsys(0x80832c8, 0x1, 0xd1ecff08, 0x0), at 0xd24b4cd5
    [2] _pollsys(0x80832c8, 0x1, 0xd1ecff08, 0x0), at 0xd24a440c
    [3] _poll(0x80832c8, 0x1, 0x1), at 0xd2465eb2
    [4] g_main_context_iterate(0x8084658, 0x1, 0x1, 0x80932d0), at 0xd2a0fca3
    [5] g_main_loop_run(0x8080ed8), at 0xd2a102dc
    [6] g_thread_create_proxy(0x80932d0), at 0xd2a3462c
    [7] _thr_setup(0xd1fd1200), at 0xd24b09e2
    [8] _lwp_start(), at 0xd24b0c40
===================
       
Has anyone ever seen this problem?
Does anyone know how to fix it?

Thanks in advance.

--Irene


-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
Don't miss this year's exciting event. There's still time to save $100.
Use priority code J8TL2D2.
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
Opensync-devel mailing list
Opensync-devel@...
https://lists.sourceforge.net/lists/listinfo/opensync-devel

Re: osync_engine_finalize never exits

by Daniel Gollub

On Thursday 17 April 2008 23:24:03 Irene Huang wrote:
> I am running the engine test binary in OpenSync and found that the
> "engine_sync" test never returns from osync_engine_finalize().
>
> The main thread blocks at:
>
[...]
>
> Has anyone ever seen this problem?
> Does anyone know how to fix it?

Uhh... this looks like the client never receives the HUP signal from the
client_proxy. See the comment in osync_client_proxy_shutdown() at the
osync_queue_disconnect() call.

Could you check whether osync_queue_disconnect() really calls close(queue->fd)?
This call SHOULD emit a HUP signal on the other end. By the other end I mean
the "client", which is actually the plugin thread (or process); it's the 2nd
backtrace. Instead of shutting down, it keeps polling on the pipe/queue:

  =>[4] osync_queue_poll(queue = 0x8092df0), line 1017 in "opensync_queue.c"
    [5] _source_check(source = 0x809c348), line 467 in "opensync_queue.c"
    [6] g_main_context_check(0x8092e50, 0x7fffffff, 0x808ef08, 0x1), at 0xd2a0f64b

Actually, the client should get this HUP signal (while polling) and shut
down. The client_proxy then gets a HUP signal as well, which signals that the
client is gone. (See osync_client_proxy() and the comment above
osync_queue_get_message(); that's where it's also hanging.)
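The mechanism described above relies on a standard POSIX property: once the last descriptor for the write end of a pipe is closed, poll() on the read end reports POLLHUP. A minimal standalone check of that property might look like the sketch below. It uses a plain pipe(), not OSyncQueue, and the function name is hypothetical.

```c
#include <poll.h>
#include <unistd.h>

/* Returns 1 if poll() reports POLLHUP on the read end of a pipe after the
 * only write end has been closed, 0 if it does not, -1 on a system error.
 * Plain POSIX pipe; this is NOT OSyncQueue itself. */
int pipe_reports_hup_after_close(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    /* The "client_proxy" side disconnects: close the only write end. */
    close(fds[1]);

    /* The "client" side polls the read end with a 100 ms timeout. */
    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
    int n = poll(&pfd, 1, 100);
    close(fds[0]);

    if (n < 0)
        return -1;
    return (n > 0 && (pfd.revents & POLLHUP)) ? 1 : 0;
}
```

On a system where this returns 0, the polling side would never learn that its peer is gone, which would match the hang in the second backtrace.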

You might instrument osync_queue_poll() with more osync_trace() calls. Could
you also check whether this appears in your trace file:
osync_trace(TRACE_ERROR, "queue poll failed - system error :%i %s", errno, strerror(errno));
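One way to follow that suggestion is to log every poll() result, not only failures, so Linux and Solaris traces can be compared side by side. The helper below is hypothetical and not part of OpenSync; fprintf(stderr) stands in for osync_trace(), and the error line reuses the format string quoted above.

```c
#include <errno.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h> /* pipe(), close() for callers/tests */

/* Polls one fd for input and logs the outcome. Returns poll()'s result;
 * *revents receives the returned event bits (0 on timeout or error). */
int traced_poll(int fd, int timeout_ms, short *revents)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);
    *revents = 0;

    if (n < 0) {
        int err = errno; /* save before any other call can change it */
        fprintf(stderr, "queue poll failed - system error :%i %s\n", err, strerror(err));
        return n;
    }
    if (n > 0)
        *revents = pfd.revents;

    /* Decode the bits that matter for HUP detection. */
    fprintf(stderr, "poll(fd=%d) -> %d revents=0x%x%s%s%s%s\n",
            fd, n, (unsigned)(unsigned short)*revents,
            (*revents & POLLIN) ? " POLLIN" : "",
            (*revents & POLLHUP) ? " POLLHUP" : "",
            (*revents & POLLERR) ? " POLLERR" : "",
            (*revents & POLLNVAL) ? " POLLNVAL" : "");
    return n;
}
```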

To clarify what I'm actually talking about: we have an IPC layer. You can find
the reasons for it, plus some more outdated material, at
http://opensync.org/wiki/IPC#WhyIPC (it's really outdated! Just check the
"Why IPC?" section ;)

The "big" picture looks like this:

OSyncClientProxy       <->        OSyncClient
(Framework/Engine)            (Plugin-Thread/Process)


I fear there might be some porting problems with our IPC implementation. It's
based on various kinds of pipes (depending on whether the plugin is started in
a thread, in a separate process, or is an external, already running process).
All of this is done within OSyncQueue. I have also started describing this
briefly in the new OpenSync whitepaper (./docs/whitepaper/, see the IPC chapter):
http://cryptomilch.de/~dgollub/OpenSync/OpenSync-0.40-DRAFT-20080118.pdf
(the source in SVN may be more up to date...)

According to http://opensync.org/testing/testDetails.php?test=633&build=221,
the IPC unit test is failing even on your testing host. We should start
debugging there. The engine test is VERY complex; I guess that once we have
fixed the IPC unit, your initial problem will go away too. Looking at the test
results, most of the ipc_pipes_* tests PASS, except ipc_timeout. I'm not quite
sure which type of OSyncQueue we're looking at right now. Could you dump the
contents of the structure? I'm mostly interested in its "fd" and "name" members...
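A hypothetical helper for the requested dump is sketched below. The struct only models the two members mentioned ("fd" and "name") and is not the real OSyncQueue layout. Note that fstat() with S_ISFIFO() reports the same type for anonymous pipes and named FIFOs, so the name is probably the more useful hint about which kind of queue is in use (an assumption, not confirmed in this thread).

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h> /* pipe(), close() for callers/tests */

/* Hypothetical stand-in for the two OSyncQueue members of interest. */
struct queue_dump {
    const char *name;
    int fd;
};

enum qfd_kind { QFD_CLOSED = 0, QFD_PIPE_OR_FIFO = 1, QFD_OTHER = 2 };

/* Prints name and fd to stderr along with what the fd refers to, and
 * returns that classification so it can also be checked in code. */
enum qfd_kind dump_queue(const struct queue_dump *q)
{
    static const char *const kind_names[] = { "closed", "pipe/fifo", "other" };
    struct stat st;
    enum qfd_kind kind;

    if (fcntl(q->fd, F_GETFD) == -1)
        kind = QFD_CLOSED;       /* not an open descriptor (EBADF) */
    else if (fstat(q->fd, &st) == 0 && S_ISFIFO(st.st_mode))
        kind = QFD_PIPE_OR_FIFO; /* pipe() and mkfifo() both land here */
    else
        kind = QFD_OTHER;

    fprintf(stderr, "queue name=%s fd=%d kind=%s\n",
            q->name ? q->name : "(null)", q->fd, kind_names[kind]);
    return kind;
}
```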

(Don't hesitate to ask if something is unclear... this mail might be
confusing; I just started on coffee #1)

best regards,
Daniel


Re: osync_engine_finalize never exits

by Irene Huang

Hi Daniel,

I have done some investigation into the ipc_callback_break test case
timeout on Solaris (it is very similar to the osync_engine_finalize
problem I mentioned in my previous email). Regarding some of your
questions, here are my findings:


1. Yes, it's true that the client_queue never receives a HUP signal.
In ipc_callback_break, after "stop_after" messages, the client
disconnects itself, which signals a HUP to the server; the server then
disconnects. However, the client_queue never receives the expected HUP
message.

The flow looks like this:
Client (RECEIVER) disconnects --HUP--> Server (SENDER) gets HUP and disconnects
  --xx-HUP-xx--> Client (SENDER) does not disconnect
  -----> Server (RECEIVER) does not disconnect

In ipc_callback_break, I think the server corresponds to the client_proxy
you described in your previous email; correct me if I am wrong.

My question is: who should send the HUP message that is received by
message = osync_queue_get_message(client_queue);
in ipc_callback_break?
My understanding is that when the Server (SENDER) gets the HUP and
disconnects, it should send a HUP to the client_queue. Is this
correct?
Any hints about where the problem might be?

2. For both calls of osync_queue_disconnect(), close(fd) is called and
returns correctly.

3. The message from
osync_trace(TRACE_ERROR, "queue poll failed - system error :%i %s", errno, strerror(errno));
was not found in the osync traces.

4. I instrumented osync_queue_poll() with more osync_trace() calls. Compared
with the osync_queue_poll() traces generated on Linux, Solaris doesn't
seem to behave differently in osync_queue_poll().

Thanks
--Irene
On Fri, 2008-04-18 at 10:30 +0200, Daniel Gollub wrote:

> [...]

Re: osync_engine_finalize never exits

by Daniel Gollub

On Wednesday 23 April 2008 20:15:01 Irene Huang wrote:

> I have done some investigation into the ipc_callback_break test case
> timeout on Solaris (it is very similar to the osync_engine_finalize
> problem I mentioned in my previous email). Regarding some of your
> questions, here are my findings:
>
>
> 1. Yes, it's true that the client_queue never receives a HUP signal.
> In ipc_callback_break, after "stop_after" messages, the client
> disconnects itself, which signals a HUP to the server; the server then
> disconnects. However, the client_queue never receives the expected HUP
> message.
>
> The flow looks like
> Client(RECEIVER) disconnects --HUP--> Server(SENDER) gets HUP and
> disconnects
> --xx-HUP-xx--> Client (SENDER) does not disconnect----> Server
> (RECEIVER) does not disconnect
>
> In ipc_callback_break, I think the server corresponds to the client_proxy
> you described in your previous email; correct me if I am wrong.

Correct.

>
> My question is: who should send the HUP message that is received by
> message = osync_queue_get_message(client_queue);
> in ipc_callback_break?

When osync_queue_poll() returns OSYNC_QUEUE_EVENT_HUP in _source_check(),
it creates an OSYNC_MESSAGE_QUEUE_HUP message and puts it on the queue as the
last message... this is what osync_queue_get_message() is waiting/blocking for.
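The hand-off described above can be modelled in a few lines: a check step that turns POLLHUP into a sentinel message, and a blocking pop that waits for it. This is a simplified sketch, not OpenSync code. The queue type, QMSG_HUP and all function names are hypothetical, a pthread mutex/condition variable stands in for the GAsyncQueue seen in frames [8] and [9] of the first backtrace, and the reading of pending data before the HUP is omitted.

```c
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical message types; QMSG_HUP plays the role of OSYNC_MESSAGE_QUEUE_HUP. */
enum qmsg_type { QMSG_DATA = 1, QMSG_HUP = 2 };

#define MSGQ_CAP 16

/* Minimal blocking FIFO of message types (ring buffer + condition variable). */
struct msg_queue {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    enum qmsg_type items[MSGQ_CAP];
    int head, count;
};

void msgq_init(struct msg_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    q->head = q->count = 0;
}

/* Appends m; returns 0 on success, -1 if the queue is full. */
int msgq_push(struct msg_queue *q, enum qmsg_type m)
{
    int rc = -1;
    pthread_mutex_lock(&q->lock);
    if (q->count < MSGQ_CAP) {
        q->items[(q->head + q->count) % MSGQ_CAP] = m;
        q->count++;
        pthread_cond_signal(&q->nonempty);
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Blocks until a message is available, like g_async_queue_pop(). */
enum qmsg_type msgq_pop(struct msg_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->nonempty, &q->lock);
    enum qmsg_type m = q->items[q->head];
    q->head = (q->head + 1) % MSGQ_CAP;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return m;
}

/* The _source_check()-like step: poll without blocking and, if the peer
 * hung up, enqueue a HUP message. Returns 1 if a HUP was queued,
 * 0 if nothing happened, -1 on error. */
int source_check_once(int fd, struct msg_queue *q)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, 0);
    if (n < 0)
        return -1;
    if (n > 0 && (pfd.revents & POLLHUP))
        return msgq_push(q, QMSG_HUP) == 0 ? 1 : -1;
    return 0;
}
```

If source_check_once() never observes POLLHUP, nothing is ever pushed and msgq_pop() waits forever, which is the same shape as the main thread blocking in osync_queue_get_message().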

> My understanding is that when the Server (SENDER) gets the HUP and
> disconnects, it should send a HUP to the client_queue. Is this
> correct?

Correct.

> Any hints about where the problem might be?

Could it be that these problems ONLY appear with queues using FIFOs?

FIFO queues -> osync_queue_new()
PIPE queues -> osync_queue_new_pipes()

We used to have a similar problem on Linux with pipes (not FIFOs): we had
simply forgotten to close the write ends of the pipes. See commits r2525 and r2526.
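That class of bug is easy to reproduce in isolation: POLLHUP is reported only when every descriptor for the write end is closed, so a single leaked copy suppresses it. In the sketch below, dup() stands in for a copy inherited across fork(); the function names are hypothetical and this does not reproduce the actual r2525/r2526 code.

```c
#include <poll.h>
#include <unistd.h>

/* Polls fd without blocking; returns the revents bits (0 if nothing is ready). */
static short poll_now(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    return poll(&pfd, 1, 0) > 0 ? pfd.revents : 0;
}

/* Records the read end's revents while a leaked write-end copy is still
 * open, and again after it is closed. Returns 0 on success, -1 on error. */
int leaked_write_end_demo(short *with_leak, short *after_fix)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    int leaked = dup(fds[1]); /* e.g. a child's inherited copy */
    if (leaked < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    close(fds[1]);                 /* the owner closes its write end...      */
    *with_leak = poll_now(fds[0]); /* ...but no POLLHUP: a writer remains    */

    close(leaked);                 /* the forgotten copy is finally closed   */
    *after_fix = poll_now(fds[0]); /* now the hang-up becomes visible        */

    close(fds[0]);
    return 0;
}
```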

But I guess there might be differences between FIFOs on Linux and Solaris.
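A quick way to compare FIFO behaviour between the two systems is to run the same standalone check on both. The sketch below uses plain POSIX calls, not OSyncQueue, and a hypothetical function name: it opens the reader first with O_NONBLOCK, connects and then closes a writer, and polls the reader. On Linux, a non-blocking reader reports POLLHUP only after a writer has connected and gone away (a reader that has never seen a writer gets no POLLHUP); other systems may differ in exactly these details.

```c
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if the read end of a named FIFO reports POLLHUP after its
 * writer closes, 0 if it does not, -1 on a setup error. */
int fifo_reports_hup_after_close(void)
{
    char dir[] = "/tmp/fifo_hup_XXXXXX";
    char path[sizeof dir + 8];
    int result = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/queue", dir);

    if (mkfifo(path, 0600) == 0) {
        /* O_NONBLOCK so opening the reader does not wait for a writer. */
        int rfd = open(path, O_RDONLY | O_NONBLOCK);
        if (rfd >= 0) {
            /* A reader already exists, so this open returns at once. */
            int wfd = open(path, O_WRONLY);
            if (wfd >= 0) {
                close(wfd); /* the writer disconnects */
                struct pollfd pfd = { .fd = rfd, .events = POLLIN };
                int n = poll(&pfd, 1, 100);
                if (n >= 0)
                    result = (n > 0 && (pfd.revents & POLLHUP)) ? 1 : 0;
            }
            close(rfd);
        }
        unlink(path);
    }
    rmdir(dir);
    return result;
}
```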

http://opensync.org/testing/testDetails.php?test=639&build=226
According to these test results, ipc_connect() does work; it uses a FIFO,
but only one queue. ipc_payload() and the others that fail use two
(client/server) FIFO-based queues.

Maybe this is again an issue with forking and not closing certain FIFO file
descriptors, which this time only shows up on Solaris, not on Linux.
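If the plugin process is started with fork() followed by exec() (an assumption; the thread does not say how it is spawned), marking the queue descriptors close-on-exec keeps the child from holding a stray write end open. A minimal sketch with a hypothetical helper name:

```c
#include <fcntl.h>
#include <unistd.h>

/* Marks fd close-on-exec so a process started via fork()+exec() does not
 * inherit it. Returns 0 on success, -1 on error (errno set by fcntl). */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}
```

If the child only fork()s without calling exec(), FD_CLOEXEC has no effect, and the child has to close() the queue ends it does not use explicitly.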

>
> 2. For both calls of osync_queue_disconnect(), close(fd) is called and
> returns correctly.
>
> 3. The message from
> osync_trace(TRACE_ERROR, "queue poll failed - system error :%i %s", errno, strerror(errno));
> was not found in the osync traces.
>
> 4. I instrumented osync_queue_poll() with more osync_trace() calls. Compared
> with the osync_queue_poll() traces generated on Linux, Solaris doesn't
> seem to behave differently in osync_queue_poll().

I guess FIFOs, and polling on them, do behave differently on Solaris...
But it seems that POLLHUP is missing only in some cases. Is this correct?
Could you send me the trace files of one of the IPC tests that time out?
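If some systems signal end-of-file on a pipe or FIFO as POLLIN without POLLHUP (an assumption to be confirmed with the traces), a more defensive check can avoid relying on POLLHUP alone. The hypothetical helper below treats "readable, but zero bytes pending" as a hang-up, using the FIONREAD ioctl to count pending bytes, and reports pending data before a hang-up. It is not how OSyncQueue currently works, and it assumes FIONREAD is supported on pipes and FIFOs on the target system.

```c
#include <poll.h>
#include <sys/ioctl.h>
#include <unistd.h>
#ifdef __sun
#include <sys/filio.h> /* FIONREAD on Solaris */
#endif

enum peer_state { PEER_IDLE, PEER_DATA, PEER_HUP, PEER_ERROR };

/* Classifies the read end of a pipe/FIFO after waiting up to timeout_ms. */
enum peer_state classify_peer(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0 || (n > 0 && (pfd.revents & (POLLERR | POLLNVAL))))
        return PEER_ERROR;
    if (n == 0)
        return PEER_IDLE;

    int pending = 0;
    if (ioctl(fd, FIONREAD, &pending) == -1)
        /* Cannot count bytes: fall back to what poll() said. */
        return (pfd.revents & POLLHUP) ? PEER_HUP : PEER_DATA;
    if (pending > 0)
        return PEER_DATA; /* deliver remaining data before the hang-up */

    /* Ready with nothing pending: explicit POLLHUP, or POLLIN meaning EOF. */
    return PEER_HUP;
}
```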

best regards,
Daniel
