osync_engine_finalize never exits

hi, all
I am using the test binary engine in opensync and found that the test "engine_sync" never ends in osync_engine_finalize.

The main thread sleeps at

  [1] __lwp_park(0x0, 0x0), at 0xd24b0c99
  [2] cond_sleep_queue(0x809a590, 0x80926c8, 0x0), at 0xd24aace1
  [3] cond_wait_queue(0x809a590, 0x80926c8, 0x0), at 0xd24aae43
  [4] cond_wait_common(0x809a590, 0x80926c8, 0x0), at 0xd24ab19d
  [5] __cond_wait(0x809a590, 0x80926c8), at 0xd24ab309
  [6] _cond_wait(0x809a590, 0x80926c8), at 0xd24ab335
  [7] _pthread_cond_wait(0x809a590, 0x80926c8), at 0xd24ab36f
  [8] g_async_queue_pop_intern_unlocked(0x8092708, 0x0, 0x0), at 0xd29eb1d8
  [9] g_async_queue_pop(0x8092708), at 0xd29eb293
=>[10] osync_queue_get_message(queue = 0x809b900), line 1044 in "opensync_queue.c"
  [11] osync_client_proxy_shutdown(proxy = 0x8095508, error = 0x80478d0), line 942 in "opensync_client_proxy.c"
  [12] _osync_engine_finalize_member(engine = 0x80a2040, proxy = 0x8095508, error = 0x80478d0), line 584 in "opensync_engine.c"
  [13] osync_engine_finalize(engine = 0x80a2040, error = 0x80478d0), line 1141 in "opensync_engine.c"
  [14] engine_sync(_i = 0), line 329 in "check_engine.c"
  [15] srunner_iterate_tcase_tfuns(0x8080330, 0x80801f0), at 0xd2564c6e
  [16] srunner_iterate_suites(0x8080330, 0x3), at 0xd2564928
  [17] srunner_run_all(0x8080330, 0x3), at 0xd2564a9d
  [18] main(), line 1945 in "check_engine.c"

others are either

===========
current thread: t@27
  [1] __pollsys(0xd15d8e90, 0x1, 0xd15d8e58, 0x0), at 0xd24b4cd5
  [2] _pollsys(0xd15d8e90, 0x1, 0xd15d8e58, 0x0), at 0xd24a440c
  [3] _poll(0xd15d8e90, 0x1, 0x64), at 0xd2465eb2
=>[4] osync_queue_poll(queue = 0x8092df0), line 1017 in "opensync_queue.c"
  [5] _source_check(source = 0x809c348), line 467 in "opensync_queue.c"
  [6] g_main_context_check(0x8092e50, 0x7fffffff, 0x808ef08, 0x1), at 0xd2a0f64b
  [7] g_main_context_iterate(0x8092e50, 0x1, 0x1, 0x8085e80), at 0xd2a0fcc9
  [8] g_main_loop_run(0x808e050), at 0xd2a102dc
  [9] g_thread_create_proxy(0x8085e80), at 0xd2a3462c
  [10] _thr_setup(0xd1fd1a00), at 0xd24b09e2
  [11] _lwp_start(), at 0xd24b0c40
================== OR ==================
=>[1] __pollsys(0x80832c8, 0x1, 0xd1ecff08, 0x0), at 0xd24b4cd5
  [2] _pollsys(0x80832c8, 0x1, 0xd1ecff08, 0x0), at 0xd24a440c
  [3] _poll(0x80832c8, 0x1, 0x1), at 0xd2465eb2
  [4] g_main_context_iterate(0x8084658, 0x1, 0x1, 0x80932d0), at 0xd2a0fca3
  [5] g_main_loop_run(0x8080ed8), at 0xd2a102dc
  [6] g_thread_create_proxy(0x80932d0), at 0xd2a3462c
  [7] _thr_setup(0xd1fd1200), at 0xd24b09e2
  [8] _lwp_start(), at 0xd24b0c40
===================

Has anyone ever seen this problem?
Does anyone know how to fix it?

Thanks in advance.
--Irene

-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
Don't miss this year's exciting event. There's still time to save $100.
Use priority code J8TL2D2.
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
Opensync-devel mailing list
Opensync-devel@...
https://lists.sourceforge.net/lists/listinfo/opensync-devel
Re: osync_engine_finalize never exits

On Thursday 17 April 2008 23:24:03 Irene Huang wrote:
> I am using the test binary engine in opensync and found that the test
> "engine_sync" never ends in osync_engine_finalize.
>
> The main thread sleeps at
> [...]
>
> Has anyone ever seen this problem?
> Does anyone know how to fix it?

Uhh... this looks like the client never receives the HUP signal from the client_proxy. See the comment in osync_client_proxy_shutdown() at the osync_queue_disconnect() call.

Could you check if osync_queue_disconnect() really calls close(queue->fd) - this call SHOULD emit a HUP signal on the other end. By the other end I mean the "client", which is actually the plugin thread (or process) - it's the 2nd backtrace. Instead of shutting down it keeps polling on the pipe/queue:

=>[4] osync_queue_poll(queue = 0x8092df0), line 1017 in "opensync_queue.c"
  [5] _source_check(source = 0x809c348), line 467 in "opensync_queue.c"
  [6] g_main_context_check(0x8092e50, 0x7fffffff, 0x808ef08, 0x1), at 0xd2a0f64b

The client should get this HUP signal (while polling) and shut down. The client_proxy then gets a HUP signal as well, which signals that the client is gone. (See osync_client_proxy() and the comment above osync_queue_get_message() - it's also hanging right here.)

You might instrument osync_queue_poll() with more osync_traces(). Could you check your traces file if this appears:

osync_trace(TRACE_ERROR, "queue poll failed - system error :%i %s", errno, strerror(errno));

To clarify what I'm actually talking about: we have an IPC. You can find the reasoning behind it, and some more outdated stuff, at http://opensync.org/wiki/IPC#WhyIPC - it's really outdated! - just check "Why IPC?" ;)

The "big" picture looks like this:

OSyncClientProxy    <->  OSyncClient
(Framework/Engine)       (Plugin-Thread/Process)

I fear there might be some porting problems with our IPC implementation. It's based on all kinds of pipes (depending on whether the plugin is started in a thread, in a separate process, or is an external, already running process). All this is done within OSyncQueue. I also started describing this briefly in the new OpenSync Whitepaper ./docs/whitepaper/ - check chapter IPC.
http://cryptomilch.de/~dgollub/OpenSync/OpenSync-0.40-DRAFT-20080118.pdf
(maybe the source in SVN is more up to date...)

Regarding http://opensync.org/testing/testDetails.php?test=633&build=221: even on your testing host the IPC unit is failing. We should start debugging there. The engine test is VERY complex. I guess once we have fixed the IPC unit, we will get rid of your initial problem. Looking at the test results, most of the ipc_pipes_* tests PASS, besides ipc_timeout. Not quite sure which type of OSyncQueue we're looking at right now. Could you dump the content of the structure? I'm mostly interested in the "fd" and "name" members of the struct...

(Don't hesitate to ask if there is something unclear... the mail might be confusing - just started with coffee #1)

best regards,
Daniel
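The expectation stated above, that close() on one end of a pipe shows up as a HUP on the end being polled, can be checked outside of OpenSync. The following is a minimal sketch in plain POSIX C under that assumption; it does not use OSyncQueue, and the helper name poll_read_end is hypothetical.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper, not part of OpenSync.
 * Creates a pipe, optionally closes its write end, then polls the read end
 * for up to 100 ms. Returns the revents bitmask reported by poll()
 * (0 on timeout), or -1 if pipe() or poll() fails. */
int poll_read_end(int close_writer)
{
    int fds[2];
    struct pollfd pfd;
    int n;

    if (pipe(fds) != 0)
        return -1;

    /* Closing the only write descriptor is what should wake the reader. */
    if (close_writer) {
        close(fds[1]);
        fds[1] = -1;
    }

    pfd.fd = fds[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    n = poll(&pfd, 1, 100);

    /* Clean up whatever is still open before returning. */
    if (fds[1] >= 0)
        close(fds[1]);
    close(fds[0]);

    return n < 0 ? -1 : pfd.revents;
}
```

With the writer still open, poll_read_end(0) should time out with no events; with the writer closed, poll_read_end(1) is expected to have POLLHUP set on Linux. Running the same check on the Solaris host would show whether plain pipes behave the same way there.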
Re: osync_engine_finalize never exits

Hi, Daniel
I just did some investigation into the ipc_callback_break test case timeout issues on Solaris. (This is very similar to the osync_engine_finalize problem I mentioned in the previous email.) With regard to some of your questions, here are my findings:

1. Yes, it's true that the client_queue never receives a HUP signal. In ipc_callback_break, after "stop_after" messages, the client will disconnect itself, which signals the server a HUP, then the server disconnects. However, the client_queue never receives the expected HUP message.

The flow looks like

Client(RECEIVER) disconnects --HUP--> Server(SENDER) gets HUP and disconnects
--xx-HUP-xx--> Client (SENDER) does not disconnect----> Server (RECEIVER) does not disconnect

In ipc_callback_break, I think that the server equals the clientproxy that you talk about in the previous email; correct me if I am wrong.

My question is: who should send out the HUP message to be received by

message = osync_queue_get_message(client_queue);

in ipc_callback_break?

My understanding is that when the Server(SENDER) gets HUP and disconnects, it should send out a HUP to the client_queue. Is this correct? Any hints about where the problem might exist?

2. For the two calls of osync_queue_disconnect, the close(fd) calls do complete correctly.

3. No

osync_trace(TRACE_ERROR, "queue poll failed - system error :%i %s", errno, strerror(errno));

was found in the osync traces.

4. I instrumented osync_queue_poll() with more osync_trace calls. Compared with the osync_queue_poll() traces generated on Linux, Solaris doesn't seem to act differently at osync_queue_poll.

Thanks
--Irene

On Fri, 2008-04-18 at 10:30 +0200, Daniel Gollub wrote:
> [...]
Re: osync_engine_finalize never exits

On Wednesday 23 April 2008 20:15:01 Irene Huang wrote:
> I just had some investigation into the ipc_callback_break test case
> timeout issues on Solaris. (This is very similar to the
> osync_engine_finalize problem I mentioned in the previous email). With
> regard to some of your questions, here're my findings
>
> 1. Yes, it's true that the client_queue never receive a HUP signal.
> In ipc_callback_break the after "stop_after" times of messages, the
> client will disconnect itself, which signals the server a HUP, then the
> server disconnects.
> However, the client_queue never receives the expected HUP Message.
>
> The flow looks like
> Client(RECEIVER) disconnects --HUP--> Server(SENDER) gets HUP and
> disconnects
> --xx-HUP-xx--> Client (SENDER) does not disconnect----> Server
> (RECEIVER) does not disconnect
>
> In the ipc_callback_break, I think that the sever equals the clientproxy
> that you talk about in the previous email, correct me if I am wrong.

Correct.

> My question is: who should send out the HUP message to be received by
> message = osync_queue_get_message(client_queue);
> in ipc_callback_break?

When osync_queue_poll() returns with OSYNC_QUEUE_EVENT_HUP in _source_check(), it creates an OSYNC_MESSAGE_QUEUE_HUP message and puts it as the last message on the queue... this is what osync_queue_get_message() is waiting/blocking for.

> My understanding is that when the Server(SENDER) gets HUP and
> disconnects, it should send out a HUP to the client_queue is this
> correct?

Correct.

> Any hints about where the problem might exist?

Could it be that those problems ONLY appear with queues using FIFOs?

FIFO queues -> osync_queue_new()
PIPE queues -> osync_queue_new_pipes()

We used to have a similar problem with Linux and pipes - not FIFOs. We just forgot to close the write end of the pipes. See commits r2525 and r2526. But I guess there might be differences between FIFOs on Linux and Solaris.

http://opensync.org/testing/testDetails.php?test=639&build=226

Regarding these test results: ipc_connect() even works - it's using a FIFO, but only one queue. ipc_payload() and the others which fail are using two (client/server) FIFO-based queues. Maybe there is again an issue with forking and not closing certain FIFO file descriptors, which this time only appears on Solaris, not Linux.

> 2. For the two calls of osync_queue_disconnect, the close (fd) do exit
> correctly.
>
> 3. No osync_trace(TRACE_ERROR, "queue poll failed - system error :%i %s",
> errno, strerror(errno)); Was found in the osync traces.
>
> 4. I instrumented osync_queue_poll() with more osync_trace, compared
> with the osync_queue_poll() traces generated on Linux, Solaris doesn't
> seem to act differently at osync_queue_poll.

I guess the behavior of FIFOs and of polling them does differ on Solaris... But it seems that only in some cases there is no POLLHUP - is this correct?

Could you send me the trace files of one of those IPC tests which time out?

best regards,
Daniel
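The suspicion raised above, that a FIFO write descriptor left open somewhere keeps the HUP from ever reaching the polling side, can be probed with a small standalone program. This is a sketch under stated assumptions, not OpenSync code: the function name fifo_hup_seen, the temporary directory template, and the use of dup() to stand in for a descriptor inherited across fork() and never closed are all illustrative choices.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical probe, not part of OpenSync.
 * Opens both ends of a fresh FIFO, closes the writer, and polls the reader
 * for up to 100 ms. If leak_writer is non-zero, a dup() of the write end is
 * kept open during the poll, simulating a descriptor leaked into another
 * process. Returns 1 if POLLHUP was reported, 0 if not, -1 on system error. */
int fifo_hup_seen(int leak_writer)
{
    char dir[] = "/tmp/fifo-hup-XXXXXX";
    char path[64];
    struct pollfd pfd;
    int rfd = -1, wfd = -1, leaked = -1, result = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof(path), "%s/queue", dir);
    if (mkfifo(path, 0600) != 0)
        goto out_dir;

    /* Non-blocking open so the reader does not wait for a writer. */
    rfd = open(path, O_RDONLY | O_NONBLOCK);
    if (rfd < 0)
        goto out;
    wfd = open(path, O_WRONLY);
    if (wfd < 0)
        goto out;

    if (leak_writer && (leaked = dup(wfd)) < 0)
        goto out;

    /* The "official" disconnect: close the write end we know about. */
    close(wfd);
    wfd = -1;

    pfd.fd = rfd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, 100) >= 0)
        result = (pfd.revents & POLLHUP) ? 1 : 0;

out:
    if (leaked >= 0)
        close(leaked);
    if (wfd >= 0)
        close(wfd);
    if (rfd >= 0)
        close(rfd);
    unlink(path);
out_dir:
    rmdir(dir);
    return result;
}
```

On Linux, fifo_hup_seen(0) is expected to return 1 and fifo_hup_seen(1) to return 0, because the reader only sees a hangup once every write descriptor is closed. If the first call returned 0 on Solaris, that would point at a platform difference in FIFO polling rather than at a leaked descriptor.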