[jira] Commented: (ZOOKEEPER-897) C Client seg faults during close
[ https://issues.apache.org/jira/browse/ZOOKEEPER-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925846#action_12925846 ]

Patrick Hunt commented on ZOOKEEPER-897:

Perhaps we should rely on existing testing for this one, but enter a new JIRA to refactor the client, specifically to allow testing (i.e. a way to inject the helper code without needing to edit zookeeper.c directly)?

C Client seg faults during close

Key: ZOOKEEPER-897
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-897
Project: Zookeeper
Issue Type: Bug
Components: c client
Reporter: Jared Cantwell
Assignee: Jared Cantwell
Fix For: 3.3.2, 3.4.0
Attachments: ZOOKEEEPER-897.diff, ZOOKEEPER-897.patch

We observed a crash while closing our C client. It was in the do_io() thread, which was still processing during the close() call.

#0 queue_buffer (list=0x6bd4f8, b=0x0, add_to_front=0) at src/zookeeper.c:969
#1 0x0046234e in check_events (zh=0x6bd480, events=<value optimized out>) at src/zookeeper.c:1687
#2 0x00462d74 in zookeeper_process (zh=0x6bd480, events=2) at src/zookeeper.c:1971
#3 0x00469c34 in do_io (v=0x6bd480) at src/mt_adaptor.c:311
#4 0x77bc59ca in start_thread () from /lib/libpthread.so.0
#5 0x76f706fd in clone () from /lib/libc.so.6
#6 0x00000000 in ?? ()

We tracked down the sequence of events, and the cause is that input_buffer is being freed from a thread other than the do_io thread that relies on it:

1. do_io() calls check_events()
2. if (events&ZOOKEEPER_READ) branch executes
3. if (rc > 0) branch executes
4. if (zh->input_buffer != &zh->primer_buffer) branch executes
   ... in the meantime ...
5. zookeeper_close() called
6. if (inc_ref_counter(zh,0)!=0) branch executes
7. cleanup_bufs() is called
8. input_buffer is freed at the end
   ... back to check_events() ...
9. queue_buffer() is called on a NULL buffer.

I believe the patch is to only call free_completions() in zookeeper_close() and not cleanup_bufs(). The original reason cleanup_bufs() was added was to call any outstanding synchronous completions, so only free_completions() (which is guarded) is needed. I will submit a patch for review with this change.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
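The interleaving in steps 1-9 of the issue description, and the effect of the proposed change, can be modelled in a few lines of single-threaded C. This is only a sketch under stated assumptions: every `model_*` name and the `free_bufs_while_in_use` flag are hypothetical stand-ins invented for illustration, not the real zookeeper.c definitions, and the two threads are replaced by a fixed ordering of calls so the outcome is deterministic.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the client's types; NOT the real zookeeper.c
 * definitions. Only the fields needed to show the interleaving are kept. */
typedef struct model_buffer { char *data; } model_buffer;

typedef struct model_handle {
    int ref_counter;             /* users currently inside the handle */
    model_buffer *input_buffer;  /* buffer the I/O path is working on */
    int pending_completions;     /* stand-in for the completion queues */
} model_handle;

/* Stand-in for cleanup_bufs(): releases the buffer unconditionally. */
void model_cleanup_bufs(model_handle *zh) {
    assert(zh != NULL);
    if (zh->input_buffer != NULL) {
        free(zh->input_buffer->data);
        free(zh->input_buffer);
        zh->input_buffer = NULL;
    }
}

/* Stand-in for free_completions(): drops outstanding completions. */
void model_free_completions(model_handle *zh) {
    assert(zh != NULL);
    zh->pending_completions = 0;
}

/* Stand-in for zookeeper_close(). A non-zero free_bufs_while_in_use models
 * the behaviour described in steps 6-8; zero models the proposed change. */
void model_close(model_handle *zh, int free_bufs_while_in_use) {
    assert(zh != NULL);
    model_free_completions(zh);
    if (zh->ref_counter != 0) {
        /* Someone else is still using the handle. */
        if (free_bufs_while_in_use) {
            model_cleanup_bufs(zh);   /* steps 7-8: buffer freed too early */
        }
        return;                       /* full teardown is left to a later close */
    }
    model_cleanup_bufs(zh);           /* no other user: freeing is safe */
}

/* Replays steps 1-9 in order. Returns 1 if the buffer is still valid at
 * step 9, 0 if it was freed underneath the I/O path, -1 on allocation
 * failure. */
int model_interleaving(int free_bufs_while_in_use) {
    model_handle zh = {0, NULL, 2};
    int buffer_still_valid;

    zh.input_buffer = calloc(1, sizeof *zh.input_buffer);
    if (zh.input_buffer == NULL) {
        return -1;
    }

    zh.ref_counter = 1;                               /* steps 1-4: I/O path is active */
    model_close(&zh, free_bufs_while_in_use);         /* steps 5-8: close from another thread */
    buffer_still_valid = (zh.input_buffer != NULL);   /* step 9: buffer is about to be queued */

    zh.ref_counter = 0;                               /* I/O path has left the handle */
    model_close(&zh, free_bufs_while_in_use);         /* later close frees whatever remains */
    return buffer_still_valid;
}
```

Calling `model_interleaving(1)` reproduces the NULL buffer at step 9, while `model_interleaving(0)` leaves the buffer intact until the final close releases it, so nothing is leaked in either case.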
[jira] Commented: (ZOOKEEPER-897) C Client seg faults during close
[ https://issues.apache.org/jira/browse/ZOOKEEPER-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925858#action_12925858 ]

Mahadev konar commented on ZOOKEEPER-897:

Jared, Pat, I am OK without a test case for this one, because it is quite hard to create one. I just wanted someone else to run the tests on their machines to verify (since I rarely see any problems in the C tests on my machine). I will go ahead and commit this patch for now.
[jira] Commented: (ZOOKEEPER-897) C Client seg faults during close
[ https://issues.apache.org/jira/browse/ZOOKEEPER-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924748#action_12924748 ]

Jared Cantwell commented on ZOOKEEPER-897:

We are using the 3.3.2 release. I don't think the patch leaks memory, because destroy() will eventually get called (by the reentrant call to zookeeper_close()), which calls cleanup_bufs() and frees those buffers, right?

Also, I had a test that reproduced this error, but it was easiest to reproduce if I injected artificial sleeps into the zookeeper.c file. If that's OK, then I can submit that. Otherwise, I'll see if I can devise a test that reproduces it some other way.
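The idea raised in this thread, injecting helper code or artificial delays at a chosen point without editing zookeeper.c for each test, could look roughly like the sketch below. It is an assumption about one possible shape for such a hook, not an existing ZooKeeper API: `set_test_hook`, `test_point`, `process_read_event`, `invalidate_buffer_hook`, and the point name `"before_queue_buffer"` are all hypothetical. The hook here flips a flag instead of sleeping, so the "race" is triggered deterministically.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook type: called with the name of the injection point and
 * a caller-supplied context pointer. */
typedef void (*test_hook_fn)(const char *point, void *ctx);

/* NULL in normal operation; a test installs a function here.
 * Note: plain globals like these are not thread-safe; a real client would
 * need to decide how the hook is published to other threads. */
static test_hook_fn test_hook = NULL;
static void *test_hook_ctx = NULL;

/* Installs (or, with fn == NULL, removes) the hook. */
void set_test_hook(test_hook_fn fn, void *ctx) {
    test_hook = fn;
    test_hook_ctx = ctx;
}

/* Marks a named injection point inside library code. */
static void test_point(const char *point) {
    if (test_hook != NULL) {
        test_hook(point, test_hook_ctx);
    }
}

/* Stand-in for a code path with a race window: the buffer is checked only
 * after the injection point, so a hook can invalidate it "in the meantime".
 * Returns 0 if the buffer was usable, -1 if it was not. */
int process_read_event(int *buffer_valid) {
    assert(buffer_valid != NULL);
    test_point("before_queue_buffer");
    return *buffer_valid ? 0 : -1;
}

/* Example hook: simulates a concurrent close that invalidates the buffer
 * exactly at the chosen point. ctx must point to the int validity flag. */
void invalidate_buffer_hook(const char *point, void *ctx) {
    if (strcmp(point, "before_queue_buffer") == 0) {
        *(int *)ctx = 0;
    }
}
```

A test would install `invalidate_buffer_hook` with `set_test_hook`, call `process_read_event`, check that the failure path is taken, and then clear the hook with `set_test_hook(NULL, NULL)`. A sleep-based hook could be installed the same way when a real second thread is involved.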