[jira] Updated: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher
[ https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Austin Shoemaker updated ZOOKEEPER-888:
---------------------------------------
    Attachment: ZOOKEEPER-888-3.3.patch

Patch based on the 3.3 branch attached (ZOOKEEPER-888-3.3.patch). Verified that the unit tests pass with the changes, including the new watcher_test.

> c-client / zkpython: Double free corruption on node watcher
> -----------------------------------------------------------
>
>                 Key: ZOOKEEPER-888
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-888
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client, contrib-bindings
>    Affects Versions: 3.3.1
>            Reporter: Lukas
>            Assignee: Lukas
>            Priority: Critical
>             Fix For: 3.3.2, 3.4.0
>
>         Attachments: resume-segfault.py, ZOOKEEPER-888-3.3.patch, ZOOKEEPER-888.patch
>
> The c-client / zkpython wrapper invokes an already-freed watcher callback.
>
> Steps to reproduce:
> 0. Start a ZooKeeper server on your machine.
> 1. Run the attached Python script.
> 2. Suspend the ZooKeeper server process (e.g. using `pkill -STOP -f org.apache.zookeeper.server.quorum.QuorumPeerMain`).
> 3. Wait until the connection and the node observer have fired with a session event.
> 4. Resume the ZooKeeper server process (e.g. using `pkill -CONT -f org.apache.zookeeper.server.quorum.QuorumPeerMain`).
> -> The client tries to dispatch the node observer function again, but it was already freed -> double free corruption.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher
[ https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Austin Shoemaker updated ZOOKEEPER-888:
---------------------------------------
    Attachment: ZOOKEEPER-888.patch

Updated ZOOKEEPER-888.patch with the following changes:
- Fixed zookeeper.is_unrecoverable to return the correct value; it was returning false in all cases.
- Added watcher_test.py to cover the issue this patch fixes. Verified that it crashes before patching and succeeds afterward.
[jira] Updated: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher
[ https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Austin Shoemaker updated ZOOKEEPER-888:
---------------------------------------
    Status: Open  (was: Patch Available)
[jira] Updated: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher
[ https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Austin Shoemaker updated ZOOKEEPER-888:
---------------------------------------
    Attachment: (was: ZOOKEEPER-888.patch)
[jira] Resolved: (ZOOKEEPER-890) C client invokes watcher callbacks multiple times
[ https://issues.apache.org/jira/browse/ZOOKEEPER-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Austin Shoemaker resolved ZOOKEEPER-890.
----------------------------------------
    Resolution: Not A Problem

Closing; the C client works as intended. Submitted a patch in ZOOKEEPER-888 to handle this properly in zkpython.

> C client invokes watcher callbacks multiple times
> -------------------------------------------------
>
>                 Key: ZOOKEEPER-890
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-890
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client
>    Affects Versions: 3.3.1
>         Environment: Mac OS X 10.6.5
>            Reporter: Austin Shoemaker
>            Priority: Critical
>         Attachments: watcher_twice.c, ZOOKEEPER-890.patch
>
> Code using the C client assumes that watcher callbacks are called exactly once. If the watcher is called more than once, the process will likely overwrite freed memory and/or crash.
>
> collect_session_watchers (zk_hashtable.c) gathers watchers from active_node_watchers, active_exist_watchers, and active_child_watchers without removing them. This results in watchers being invoked more than once.
>
> Test code is attached that reproduces the bug, along with a proposed patch.
[jira] Commented: (ZOOKEEPER-740) zkpython leading to segfault on zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919553#action_12919553 ]

Austin Shoemaker commented on ZOOKEEPER-740:
--------------------------------------------

ZOOKEEPER-740.patch fixes the crash, though it looks like the pywatcher_t will be leaked on an unrecoverable session state change (EXPIRED_SESSION_STATE or AUTH_FAILED_STATE). Attached a proposed revision to ZOOKEEPER-888 for your review.

> zkpython leading to segfault on zookeeper
> -----------------------------------------
>
>                 Key: ZOOKEEPER-740
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-740
>             Project: Zookeeper
>          Issue Type: Bug
>    Affects Versions: 3.3.0
>            Reporter: Federico
>            Assignee: Henry Robinson
>            Priority: Critical
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-740.patch
>
> The program that we are implementing uses the python binding for zookeeper, but sometimes it crashes with a segfault; here is the bt from gdb:
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xad244b70 (LWP 28216)]
> 0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0) at ../Objects/abstract.c:2488
> 2488    ../Objects/abstract.c: No such file or directory.
>         in ../Objects/abstract.c
> (gdb) bt
> #0  0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0) at ../Objects/abstract.c:2488
> #1  0x080d6ef2 in PyEval_CallObjectWithKeywords (func=0x862fab0, arg=0x8837194, kw=0x0) at ../Python/ceval.c:3575
> #2  0x080612a0 in PyObject_CallObject (o=0x862fab0, a=0x8837194) at ../Objects/abstract.c:2480
> #3  0x0047af42 in watcher_dispatch (zzh=0x86174e0, type=-1, state=1, path=0x86337c8 "", context=0x8588660) at src/c/zookeeper.c:314
> #4  0x00496559 in do_foreach_watcher (zh=0x86174e0, type=-1, state=1, path=0x86337c8 "", list=0xa5354140) at src/zk_hashtable.c:275
> #5  deliverWatchers (zh=0x86174e0, type=-1, state=1, path=0x86337c8 "", list=0xa5354140) at src/zk_hashtable.c:317
> #6  0x0048ae3c in process_completions (zh=0x86174e0) at src/zookeeper.c:1766
> #7  0x0049706b in do_completion (v=0x86174e0) at src/mt_adaptor.c:333
> #8  0x0013380e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
> #9  0x002578de in clone () from /lib/tls/i686/cmov/libc.so.6
[jira] Updated: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher
[ https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Austin Shoemaker updated ZOOKEEPER-888:
---------------------------------------
    Attachment: ZOOKEEPER-888.patch

Improved patch attached. Before, watcher_dispatch would unconditionally free non-global watcher objects, yet any number of recoverable session state change events may be sent to the watcher. This change frees the watcher only on the last callback: a data change event or an unrecoverable session state change.
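The freeing policy the patch comment describes can be sketched as a small predicate. This is only an illustration: the constants below are stand-ins modeled on the C client's event types and session states (the real values live in zookeeper.h, and the real logic lives in zkpython's watcher_dispatch), and `is_final_callback` is a hypothetical name.

```c
#include <stdbool.h>

/* Illustrative stand-ins for the ZooKeeper C client's event type and
 * session state constants (assumed values; see zookeeper.h). */
enum { SESSION_EVENT = -1, NODE_EVENT = 1 };
enum { CONNECTING_STATE = 1, CONNECTED_STATE = 3,
       EXPIRED_SESSION_STATE = -112, AUTH_FAILED_STATE = -113 };

/* A recoverable session event (e.g. CONNECTING -> CONNECTED) may be
 * delivered to the same watcher any number of times, so the watcher
 * object must survive it.  Only a real node event (the watch fired,
 * one-shot) or an unrecoverable session state is the last callback
 * the watcher will ever receive, and only then may it be freed. */
static bool is_final_callback(int type, int state)
{
    if (type != SESSION_EVENT)
        return true;  /* data/child/exists watch fired: one-shot */
    return state == EXPIRED_SESSION_STATE || state == AUTH_FAILED_STATE;
}
```

Freeing only when `is_final_callback` returns true is what closes the double-free window: a suspended-then-resumed server generates session events followed by a node event, and the watcher must stay alive until that final dispatch.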
[jira] Updated: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher
[ https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Austin Shoemaker updated ZOOKEEPER-888:
---------------------------------------
    Attachment: (was: ZOOKEEPER-888.patch)
[jira] Updated: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher
[ https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Austin Shoemaker updated ZOOKEEPER-888:
---------------------------------------
    Attachment: ZOOKEEPER-888.patch

Patch that prevents freeing a watcher in response to a session event, per the feedback in ZOOKEEPER-890.
[jira] Commented: (ZOOKEEPER-890) C client invokes watcher callbacks multiple times
[ https://issues.apache.org/jira/browse/ZOOKEEPER-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919094#action_12919094 ]

Austin Shoemaker commented on ZOOKEEPER-890:
--------------------------------------------

That sounds like a good design. Perhaps it could be clarified in the documentation?
http://hadoop.apache.org/zookeeper/docs/r3.3.1/zookeeperProgrammers.html#ch_zkWatches

If this is the correct behavior, then the Python client needs to be fixed to not delete the watcher on session events. Will file a separate bug on that.
[jira] Updated: (ZOOKEEPER-890) C client invokes watcher callbacks multiple times
[ https://issues.apache.org/jira/browse/ZOOKEEPER-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Austin Shoemaker updated ZOOKEEPER-890:
---------------------------------------
    Description:
Code using the C client assumes that watcher callbacks are called exactly once. If the watcher is called more than once, the process will likely overwrite freed memory and/or crash.

collect_session_watchers (zk_hashtable.c) gathers watchers from active_node_watchers, active_exist_watchers, and active_child_watchers without removing them. This results in watchers being invoked more than once.

Test code is attached that reproduces the bug, along with a proposed patch.

    was:
The collect_session_watchers function in zk_hashtable.c gathers watchers from active_node_watchers, active_exist_watchers, and active_child_watchers without removing the watchers from the table.

Please see attached repro case and patch.
[jira] Updated: (ZOOKEEPER-890) C client invokes watcher callbacks multiple times
[ https://issues.apache.org/jira/browse/ZOOKEEPER-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Austin Shoemaker updated ZOOKEEPER-890:
---------------------------------------
    Attachment: ZOOKEEPER-890.patch

Patch that clears the active watcher sets when broadcasting a session event to all watchers.
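The move-then-clear pattern the patch comment describes can be sketched with a toy stand-in for one of the three active-watcher hash tables. This is not the real zk_hashtable.c code: `watcher_table` and `drain_watchers` are hypothetical names, and a flat array of integer watcher ids stands in for the hash table of callbacks.

```c
#include <stddef.h>

#define MAX_WATCHERS 8

/* Toy stand-in for one of the client's active-watcher tables
 * (active_node_watchers / active_exist_watchers / active_child_watchers
 * in zk_hashtable.c), using plain ids instead of callback structs. */
struct watcher_table {
    int id[MAX_WATCHERS];
    size_t count;
};

/* Move every watcher out of the table into `out`, clearing the table
 * in the same step.  Because the source table is emptied as it is
 * collected, a later broadcast cannot gather the same watcher again,
 * so no watcher is invoked twice.  Returns the new fill index of out. */
static size_t drain_watchers(struct watcher_table *t,
                             int out[], size_t out_start)
{
    for (size_t i = 0; i < t->count; i++)
        out[out_start + i] = t->id[i];
    size_t moved = t->count;
    t->count = 0;  /* the clearing step the pre-patch code lacked */
    return out_start + moved;
}
```

The bug report's point is precisely that collect_session_watchers did the copy without the `t->count = 0` step, leaving each watcher registered for the next collection as well.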
[jira] Updated: (ZOOKEEPER-890) C client invokes watcher callbacks multiple times
[ https://issues.apache.org/jira/browse/ZOOKEEPER-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Austin Shoemaker updated ZOOKEEPER-890:
---------------------------------------
    Attachment: watcher_twice.c
[jira] Created: (ZOOKEEPER-890) C client invokes watcher callbacks multiple times
C client invokes watcher callbacks multiple times
-------------------------------------------------

                 Key: ZOOKEEPER-890
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-890
             Project: Zookeeper
          Issue Type: Bug
          Components: c client
    Affects Versions: 3.3.1
         Environment: Mac OS X 10.6.5
            Reporter: Austin Shoemaker
            Priority: Critical
         Attachments: watcher_twice.c

The collect_session_watchers function in zk_hashtable.c gathers watchers from active_node_watchers, active_exist_watchers, and active_child_watchers without removing the watchers from the table.

Please see attached repro case and patch.
[jira] Resolved: (ZOOKEEPER-889) pyzoo_aget_children crashes due to incorrect watcher context
[ https://issues.apache.org/jira/browse/ZOOKEEPER-889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Austin Shoemaker resolved ZOOKEEPER-889.
----------------------------------------
    Resolution: Fixed

Just noticed that the fix is already in trunk; closing the issue.

> pyzoo_aget_children crashes due to incorrect watcher context
> ------------------------------------------------------------
>
>                 Key: ZOOKEEPER-889
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-889
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: contrib-bindings
>    Affects Versions: 3.3.1
>         Environment: OS X 10.6.5, Python 2.6.1
>            Reporter: Austin Shoemaker
>            Priority: Critical
>         Attachments: repro.py
>
> The pyzoo_aget_children function passes the completion callback ("pyw") in place of the watcher callback ("get_pyw"). Since it is a one-shot callback, it is deallocated after the completion callback fires, causing a crash when the watcher callback should be invoked.
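The ownership mix-up described in the issue can be modeled in a few lines. This is a toy sketch, not zkpython's actual code: `pywatcher`, `pywatcher_new`, and `dispatch` are hypothetical names standing in for the binding's pywatcher_t handling. The point is that a one-shot completion context dies after its first use, so handing it to a callsite that will fire again leaves a dangling pointer.

```c
#include <stdlib.h>
#include <stdbool.h>

/* Toy model of a callback context.  A completion context is one-shot
 * (freed after it fires); a watcher context must stay alive until the
 * watch triggers.  Passing the one-shot object where the long-lived
 * one belongs is the shape of the ZOOKEEPER-889 bug. */
struct pywatcher {
    bool one_shot;   /* freed after first invocation? */
    int fired;       /* how many times it has been dispatched */
};

static struct pywatcher *pywatcher_new(bool one_shot)
{
    struct pywatcher *w = calloc(1, sizeof *w);
    if (w)
        w->one_shot = one_shot;
    return w;
}

/* Deliver one callback.  Returns the context if it is still valid for
 * a later invocation, or NULL if dispatch just freed it; using the
 * NULL'd pointer again would be the use-after-free. */
static struct pywatcher *dispatch(struct pywatcher *w)
{
    w->fired++;
    if (w->one_shot) {
        free(w);
        return NULL;
    }
    return w;
}
```

With this model, the fix amounts to making sure the argument passed for the watcher slot is a context created with `one_shot = false`.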
[jira] Updated: (ZOOKEEPER-889) pyzoo_aget_children crashes due to incorrect watcher context
[ https://issues.apache.org/jira/browse/ZOOKEEPER-889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Austin Shoemaker updated ZOOKEEPER-889:
---------------------------------------
    Attachment: repro.py

Minimal repro script.
[jira] Created: (ZOOKEEPER-889) pyzoo_aget_children crashes due to incorrect watcher context
pyzoo_aget_children crashes due to incorrect watcher context
------------------------------------------------------------

                 Key: ZOOKEEPER-889
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-889
             Project: Zookeeper
          Issue Type: Bug
          Components: contrib-bindings
    Affects Versions: 3.3.1
         Environment: OS X 10.6.5, Python 2.6.1
            Reporter: Austin Shoemaker
            Priority: Critical

The pyzoo_aget_children function passes the completion callback ("pyw") in place of the watcher callback ("get_pyw"). Since it is a one-shot callback, it is deallocated after the completion callback fires, causing a crash when the watcher callback should be invoked.
[jira] Commented: (ZOOKEEPER-208) Zookeeper C client uses API that are not thread safe, causing crashes when multiple instances are active
[ https://issues.apache.org/jira/browse/ZOOKEEPER-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648756#action_12648756 ]

Austin Shoemaker commented on ZOOKEEPER-208:
--------------------------------------------

Chris, thanks for modifying my patch to comply with the project. I reattached it granting the license; let me know if I can help with anything else.

> Zookeeper C client uses API that are not thread safe, causing crashes when multiple instances are active
> --------------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-208
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-208
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client
>    Affects Versions: 3.0.0
>         Environment: Linux
>            Reporter: Austin Shoemaker
>            Assignee: Austin Shoemaker
>            Priority: Critical
>             Fix For: 3.1.0
>
>         Attachments: zookeeper-strtok_getaddrinfo-trunk.patch, zookeeper-strtok_getaddrinfo-trunk.patch
>
> The Zookeeper C client library uses gethostbyname and strtok, both of which are not safe to use from multiple threads.
> The problem is resolved by using getaddrinfo and strtok_r in place of the older API.
[jira] Updated: (ZOOKEEPER-208) Zookeeper C client uses API that are not thread safe, causing crashes when multiple instances are active
[ https://issues.apache.org/jira/browse/ZOOKEEPER-208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Austin Shoemaker updated ZOOKEEPER-208:
---------------------------------------
    Attachment: zookeeper-strtok_getaddrinfo-trunk.patch

Reattaching patch with license granted.
[jira] Created: (ZOOKEEPER-208) Zookeeper C client uses API that are not thread safe, causing crashes when multiple instances are active
Zookeeper C client uses API that are not thread safe, causing crashes when multiple instances are active
---------------------------------------------------------------------------------------------------------

                 Key: ZOOKEEPER-208
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-208
             Project: Zookeeper
          Issue Type: Bug
          Components: c client
    Affects Versions: 3.0.0
         Environment: Linux
            Reporter: Austin Shoemaker
            Priority: Critical

The Zookeeper C client library uses gethostbyname and strtok, both of which are not safe to use from multiple threads. Below is the original patch we made which fixes the problem. The problem is resolved by using getaddrinfo and strtok_r in place of the older API.

Patch for zookeeper-c-client-2.2.1/src/zookeeper.c (2008-06-09 on SF.net):

241c241
< struct hostent *he;
---
> struct addrinfo hints, *res, *res0;
243,245d242
< struct sockaddr_in *addr4;
< struct sockaddr_in6 *addr6;
< char **ptr;
247a245
> char *strtok_last;
263c261
< host=strtok(hosts, ",");
---
> host=strtok_r(hosts, ",", &strtok_last);
283,294c281,297
< he = gethostbyname(host);
< if (!he) {
<     LOG_ERROR(("could not resolve %s", host));
<     errno=EINVAL;
<     rc=ZBADARGUMENTS;
<     goto fail;
< }
<
< /* Setup the address array */
< for(ptr = he->h_addr_list;*ptr != 0; ptr++) {
<     if (zh->addrs_count == alen) {
<         void *tmpaddr;
---
>
> memset(&hints, 0, sizeof(hints));
> hints.ai_flags = AI_ADDRCONFIG;
> hints.ai_family = AF_UNSPEC;
> hints.ai_socktype = SOCK_STREAM;
> hints.ai_protocol = IPPROTO_TCP;
>
> if (getaddrinfo(host, port_spec, &hints, &res0) != 0) {
>     LOG_ERROR(("getaddrinfo: %s\n", strerror(errno)));
>     rc=ZSYSTEMERROR;
>     goto fail;
> }
>
> for (res = res0; res; res = res->ai_next) {
>     // Expand address list if needed
>     if (zh->addrs_count == alen) {
>         void *tmpaddr;
304,313c307,312
< }
< addr = &zh->addrs[zh->addrs_count];
< addr4 = (struct sockaddr_in*)addr;
< addr6 = (struct sockaddr_in6*)addr;
< addr->sa_family = he->h_addrtype;
< if (addr->sa_family == AF_INET) {
<     addr4->sin_port = htons(port);
<     memset(&addr4->sin_zero, 0, sizeof(addr4->sin_zero));
<     memcpy(&addr4->sin_addr, *ptr, he->h_length);
<     zh->addrs_count++;
---
> }
>
> // Copy addrinfo into address list
> addr = &zh->addrs[zh->addrs_count];
> switch (res->ai_family) {
> case AF_INET:
315,320c314
< } else if (addr->sa_family == AF_INET6) {
<     addr6->sin6_port = htons(port);
<     addr6->sin6_scope_id = 0;
<     addr6->sin6_flowinfo = 0;
<     memcpy(&addr6->sin6_addr, *ptr, he->h_length);
<     zh->addrs_count++;
---
> case AF_INET6:
322,327c316,328
< } else {
<     LOG_WARN(("skipping unknown address family %x for %s",
<         addr->sa_family, zh->hostname));
< }
< }
< host = strtok(0, ",");
---
>     memcpy(addr, res->ai_addr, res->ai_addrlen);
>     ++zh->addrs_count;
>     break;
> default:
>     LOG_WARN(("skipping unknown address family %x for %s",
>         res->ai_family, zh->hostname));
>     break;
> }
> }
>
> freeaddrinfo(res0);
>
> host = strtok_r(0, ",", &strtok_last);
329a331
>
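The replacement APIs the patch above relies on can be exercised in a small standalone sketch. This is not the patched zookeeper.c: `resolve_host_list` is a hypothetical helper, and AI_NUMERICHOST is used here only so the sketch needs no resolver; the point is that strtok_r keeps its tokenizer state in a caller-owned pointer and getaddrinfo is re-entrant, where strtok and gethostbyname rely on hidden static state.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Walk a comma-separated host list with strtok_r (thread-safe: the
 * position lives in our local `saveptr`, not in library statics) and
 * resolve each entry with getaddrinfo (thread-safe, unlike
 * gethostbyname).  Returns how many entries resolved successfully. */
static int resolve_host_list(char *hosts)
{
    char *saveptr = NULL;
    int resolved = 0;

    for (char *host = strtok_r(hosts, ",", &saveptr);
         host != NULL;
         host = strtok_r(NULL, ",", &saveptr)) {
        struct addrinfo hints, *res0;
        memset(&hints, 0, sizeof hints);
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;
        hints.ai_flags = AI_NUMERICHOST;  /* numeric addresses only: no DNS */
        if (getaddrinfo(host, "2181", &hints, &res0) == 0) {
            resolved++;
            freeaddrinfo(res0);  /* getaddrinfo allocates; caller frees */
        }
    }
    return resolved;
}
```

Note that strtok_r, like strtok, writes NUL bytes into its input, so the caller must pass a writable copy of the host string, which is also what the patched client does with its hosts buffer.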
[jira] Commented: (ZOOKEEPER-17) zookeeper_init doc needs clarification
[ https://issues.apache.org/jira/browse/ZOOKEEPER-17?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635964#action_12635964 ]

Austin Shoemaker commented on ZOOKEEPER-17:
-------------------------------------------

The documentation states that if the client_id given to zookeeper_init is expired or invalid, a new session will be automatically generated, implying that the client will proceed to the CONNECTED state. In the implementation, an expired or invalid client_id leads to the unrecoverable SESSION_EXPIRED_STATE, which requires closing the connection and reopening a new one with no client_id specified in order to continue.

Since the server has already assigned a replacement client_id, it seems logical to follow the header documentation and proceed with the new value, which appears to be possible by removing the if-block that triggers the expired state in check_events (zookeeper.c). If the client application needs to know whether the session was replaced, it can simply compare the client_id it provided with the client_id upon entering CONNECTED_STATE.

What do you think?

> zookeeper_init doc needs clarification
> --------------------------------------
>
>                 Key: ZOOKEEPER-17
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-17
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client, documentation
>            Reporter: Patrick Hunt
>            Assignee: Patrick Hunt
>             Fix For: 3.0.0
>
>         Attachments: ZOOKEEPER-17.patch
>
> Moved from SourceForge to Apache.
> http://sourceforge.net/tracker/index.php?func=detail&aid=1967467&group_id=209147&atid=1008544
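The comparison scheme suggested in the comment above can be sketched in a few lines. This is an illustration only: the struct below is a minimal stand-in for the C client's clientid_t (which pairs the 64-bit session id with a password), and `session_was_replaced` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for the C client's clientid_t; only the 64-bit
 * session id matters for this check (the real struct also carries a
 * session password). */
typedef struct {
    int64_t client_id;
} session_id_t;

/* On entering CONNECTED state, detect whether the server replaced the
 * session the client asked to resume: a nonzero requested id that
 * differs from the granted id means the old session was discarded.
 * A zero requested id means no resume was attempted at all. */
static bool session_was_replaced(const session_id_t *requested,
                                 const session_id_t *granted)
{
    return requested->client_id != 0 &&
           requested->client_id != granted->client_id;
}
```

Under the behavior proposed in the comment, this check is all an application would need to learn that its old session (and its ephemeral nodes and watches) is gone, without the connection being torn down.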
[jira] Commented: (ZOOKEEPER-131) Old leader election can elect a dead leader over and over again
[ https://issues.apache.org/jira/browse/ZOOKEEPER-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632120#action_12632120 ]

Austin Shoemaker commented on ZOOKEEPER-131:
--------------------------------------------

This patch appears to solve the problem for algorithm 0: our unit test completed successfully 16 times.

> Old leader election can elect a dead leader over and over again
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-131
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-131
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection
>            Reporter: Benjamin Reed
>            Assignee: Benjamin Reed
>         Attachments: ZOOKEEPER-131.patch
>
> I think there is a race condition that is probably easy to get into with the old leader election and a large number of servers:
> 1) Leader dies
> 2) Followers start looking for a new leader before all Followers have abandoned the Leader
> 3) The Followers looking for a new leader see votes of Followers still following the (now dead) Leader and start voting for the dead Leader
> 4) The dead Leader gets reelected.
> For the old leader election, a server should not vote for another server that is not nominating itself.
[jira] Issue Comment Edited: (ZOOKEEPER-127) Use of non-standard election ports in config breaks services
[ https://issues.apache.org/jira/browse/ZOOKEEPER-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632111#action_12632111 ]

austin edited comment on ZOOKEEPER-127 at 9/17/08 11:36 PM:
------------------------------------------------------------

After several more runs of our unit test using the patched algorithm 3, the test hangs as the service repeatedly tries to reelect the killed leader. This behavior is similar to ZOOKEEPER-131, which we had experienced using algorithms 0 and 1.

Server 10 is 10.50.65.40 and has been explicitly killed. The following log is from server 5, which mirrors the logs on all the other servers. Any idea what's happening here?

2008-09-18 00:28:20,029 - INFO  [QuorumPeer:[EMAIL PROTECTED] - LOOKING
2008-09-18 00:28:20,029 - WARN  [QuorumPeer:[EMAIL PROTECTED] - unable to parse zxid string into long: txt
2008-09-18 00:28:20,029 - WARN  [QuorumPeer:[EMAIL PROTECTED] - New election: 8589935405
2008-09-18 00:28:20,031 - WARN  [WorkerSender Thread:[EMAIL PROTECTED] - Cannot open channel to 10 (java.net.ConnectException: Connection refused)
2008-09-18 00:28:20,031 - INFO  [QuorumPeer:[EMAIL PROTECTED] - FOLLOWING
2008-09-18 00:28:20,031 - INFO  [QuorumPeer:[EMAIL PROTECTED] - Created server with dataDir:/zookeeper_data/5_data dataLogDir:/zookeeper_data/5_data tickTime:2000
2008-09-18 00:28:20,031 - INFO  [QuorumPeer:[EMAIL PROTECTED] - Following /10.50.65.40:2888

[[[ exception below repeats 5 times ]]]

2008-09-18 00:28:20,032 - WARN  [QuorumPeer:[EMAIL PROTECTED] - Unexpected exception
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
        at java.net.Socket.connect(Socket.java:519)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:137)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:405)

[[[ then the follower is restarted ]]]

2008-09-18 00:28:24,049 - ERROR [QuorumPeer:[EMAIL PROTECTED] - FIXMSG
java.lang.Exception: shutdown Follower
        at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:370)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:409)

[[[ at this point the log repeats from the beginning ]]]

was (Author: austin):
After about 6 runs of our unit test the test hangs as the service repeatedly tries to reelect the killed leader (similar to ZOOKEEPER-131 with algorithms 0 and 1).
[jira] Commented: (ZOOKEEPER-127) Use of non-standard election ports in config breaks services
[ https://issues.apache.org/jira/browse/ZOOKEEPER-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632111#action_12632111 ]

Austin Shoemaker commented on ZOOKEEPER-127:
--------------------------------------------

After about 6 runs of our unit test the test hangs as the service repeatedly tries to reelect the killed leader (similar to ZOOKEEPER-131 with algorithms 0 and 1).

After several more runs of our unit test using the patched algorithm 3, the test hangs as the service repeatedly tries to reelect the killed leader. This behavior is similar to ZOOKEEPER-131, which we had experienced using algorithms 0 and 1.

Server 10 is 10.50.65.40 and has been explicitly killed. The following log is from server 5, which mirrors the logs on all the other servers. Any idea what's happening here?

2008-09-18 00:28:20,029 - INFO  [QuorumPeer:[EMAIL PROTECTED] - LOOKING
2008-09-18 00:28:20,029 - WARN  [QuorumPeer:[EMAIL PROTECTED] - unable to parse zxid string into long: txt
2008-09-18 00:28:20,029 - WARN  [QuorumPeer:[EMAIL PROTECTED] - New election: 8589935405
2008-09-18 00:28:20,031 - WARN  [WorkerSender Thread:[EMAIL PROTECTED] - Cannot open channel to 10 (java.net.ConnectException: Connection refused)
2008-09-18 00:28:20,031 - INFO  [QuorumPeer:[EMAIL PROTECTED] - FOLLOWING
2008-09-18 00:28:20,031 - INFO  [QuorumPeer:[EMAIL PROTECTED] - Created server with dataDir:/zookeeper_data/5_data dataLogDir:/zookeeper_data/5_data tickTime:2000
2008-09-18 00:28:20,031 - INFO  [QuorumPeer:[EMAIL PROTECTED] - Following /10.50.65.40:2888

[[[ exception below repeats 5 times ]]]

2008-09-18 00:28:20,032 - WARN  [QuorumPeer:[EMAIL PROTECTED] - Unexpected exception
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
        at java.net.Socket.connect(Socket.java:519)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:137)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:405)

[[[ then the follower is restarted ]]]

2008-09-18 00:28:24,049 - ERROR [QuorumPeer:[EMAIL PROTECTED] - FIXMSG
java.lang.Exception: shutdown Follower
        at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:370)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:409)

[[[ at this point the log repeats from the beginning ]]]

> Use of non-standard election ports in config breaks services
> ------------------------------------------------------------
>
> Key: ZOOKEEPER-127
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-127
> Project: Zookeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.0.0
> Reporter: Mark Harwood
> Assignee: Flavio Paiva Junqueira
> Priority: Minor
> Fix For: 3.0.0
>
> Attachments: mhPortChanges.patch, ZOOKEEPER-127.patch, ZOOKEEPER-127.patch, ZOOKEEPER-127.patch
>
> In QuorumCnxManager.toSend there is a call to create a connection as follows:
> channel = SocketChannel.open(new InetSocketAddress(addr, port));
> Unfortunately "addr" is the ip address of a remote server while "port" is the electionPort of *this* server.
> As an example, given this configuration (taken from my zoo.cfg)
> server.1=10.20.9.254:2881
> server.2=10.20.9.9:2882
> server.3=10.20.9.254:2883
> Server 3 was observed trying to make a connection to host 10.20.9.9 on port 2883 and obviously failing.
> In tests where all machines use the same electionPort this bug would not manifest itself.
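The addr/port mix-up described in the issue can be illustrated in isolation: the target of an election connection must pair the remote peer's host with that same peer's election port, not the local server's. This is a hedged sketch using the ports from the zoo.cfg above; the map and method names are hypothetical and differ from the real QuorumCnxManager:

```java
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

public class ElectionAddr {
    // Server id -> that server's election address (host and election port),
    // taken from the example zoo.cfg in the issue description.
    static final Map<Long, InetSocketAddress> peers = new HashMap<>();
    static {
        peers.put(1L, new InetSocketAddress("10.20.9.254", 2881));
        peers.put(2L, new InetSocketAddress("10.20.9.9", 2882));
        peers.put(3L, new InetSocketAddress("10.20.9.254", 2883));
    }

    // Buggy pairing: remote host combined with the LOCAL election port.
    static InetSocketAddress buggyTarget(long remoteSid, int localElectionPort) {
        return new InetSocketAddress(
                peers.get(remoteSid).getHostString(), localElectionPort);
    }

    // Fixed pairing: use the remote peer's own address, host and port together.
    static InetSocketAddress fixedTarget(long remoteSid) {
        return peers.get(remoteSid);
    }

    public static void main(String[] args) {
        // Server 3 (local election port 2883) contacting server 2:
        System.out.println(buggyTarget(2L, 2883).getPort()); // 2883 -- wrong, server 2 listens on 2882
        System.out.println(fixedTarget(2L).getPort());       // 2882 -- correct
    }
}
```

With identical election ports on every machine the two pairings coincide, which is why the bug only surfaces with non-standard ports.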
[jira] Commented: (ZOOKEEPER-127) Use of non-standard election ports in config breaks services
[ https://issues.apache.org/jira/browse/ZOOKEEPER-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632099#action_12632099 ]

Austin Shoemaker commented on ZOOKEEPER-127:
--------------------------------------------

Applying the patch (from 9/17) to the latest trunk (r696563) now passes our leader election unit tests using algorithm 3. This is great.

Two minor issues I noticed:

1. The default constructor for QuorumPeer should call setStatsProvider, rather than only the attribute-passing constructor. Since QuorumPeerMain calls the default constructor, "echo stat | nc ..." requests return invalid data because no provider is set.

2. In QuorumPeerConfig.java:105, where parts.length is checked, the operator should be && instead of ||.

> Use of non-standard election ports in config breaks services
> ------------------------------------------------------------
>
> Key: ZOOKEEPER-127
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-127
> Project: Zookeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.0.0
> Reporter: Mark Harwood
> Assignee: Flavio Paiva Junqueira
> Priority: Minor
> Fix For: 3.0.0
>
> Attachments: mhPortChanges.patch, ZOOKEEPER-127.patch, ZOOKEEPER-127.patch, ZOOKEEPER-127.patch
>
> In QuorumCnxManager.toSend there is a call to create a connection as follows:
> channel = SocketChannel.open(new InetSocketAddress(addr, port));
> Unfortunately "addr" is the ip address of a remote server while "port" is the electionPort of *this* server.
> As an example, given this configuration (taken from my zoo.cfg)
> server.1=10.20.9.254:2881
> server.2=10.20.9.9:2882
> server.3=10.20.9.254:2883
> Server 3 was observed trying to make a connection to host 10.20.9.9 on port 2883 and obviously failing.
> In tests where all machines use the same electionPort this bug would not manifest itself.
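The second point above is the classic "x != a || x != b" mistake: since no length can equal both values at once, the || form is true for every input. A hedged reconstruction of the check (the actual code at QuorumPeerConfig.java:105 may differ in names and accepted formats):

```java
public class ServerSpecCheck {
    // Accept a server line with either "host:port" (2 parts) or
    // "host:port:electionPort" (3 parts).
    static boolean isValid(String spec) {
        String[] parts = spec.split(":");
        // Correct: reject only when the count is NEITHER 2 NOR 3 (&&).
        // With ||, "parts.length != 2 || parts.length != 3" would be true
        // for every possible input and reject all server lines.
        return !(parts.length != 2 && parts.length != 3);
    }

    public static void main(String[] args) {
        System.out.println(isValid("10.20.9.9:2882"));      // true
        System.out.println(isValid("10.20.9.9:2882:3888")); // true
        System.out.println(isValid("10.20.9.9"));           // false
    }
}
```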