[jira] Updated: (ZOOKEEPER-710) permanent ZSESSIONMOVED error after client app reconnects to zookeeper cluster

2010-03-18 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-710:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed revision 925104.


> permanent ZSESSIONMOVED error after client app reconnects to zookeeper cluster
> --
>
> Key: ZOOKEEPER-710
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-710
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.2.2
> Environment: debian lenny; ia64; xen virtualization
>Reporter: Lukasz Osipiuk
>Assignee: Patrick Hunt
>Priority: Blocker
> Fix For: 3.2.3, 3.3.0
>
> Attachments: app1.log.2010-03-16.gz, app2.log.2010-03-16.gz, 
> ZOOKEEPER-710_3.2.patch, ZOOKEEPER-710_3.3.patch, 
> zookeeper-node1.log.2010-03-16.gz, zookeeper-node2.log.2010-03-16.gz, 
> zookeeper-node3.log.2010-03-16.gz
>
>
> The problem was originally described on the Users mailing list, starting with this 
> [post|http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/201003.mbox/<3b910d891003160743k38e2e7c9y830b182d88396...@mail.gmail.com>].
> Below I restate it in a more organized form.
> We occasionally (a few times a day) observe that our client application 
> disconnects from the ZooKeeper cluster.
> The application is written in C++ and uses the libzookeeper_mt library, 
> version 3.2.2.
> The disconnects are probably related to problems with our network 
> infrastructure: we observe periods of heavy packet loss between machines 
> in our DC.
> Sometimes, after the client application (i.e. the ZooKeeper library) 
> reconnects to the ZooKeeper cluster, all subsequent requests return a 
> ZSESSIONMOVED error. Restarting the client app helps; we always pass 0 as 
> the clientid to the zookeeper_init function, so the old session is not reused.
> On 16-03-2010 we observed a few occurrences of the problem, for example:
> - 22:08; client IP 10.1.112.60 (app1); sessionID 0x22767e1c963
> - 14:21; client IP 10.1.112.61 (app2); sessionID 0x324dcc1ba580085
> I attach logs from the cluster and application nodes (ZooKeeper-related 
> entries only):
> - [^zookeeper-node1.log.2010-03-16.gz] - logs of ZooKeeper cluster node 1, 10.1.112.62
> - [^zookeeper-node2.log.2010-03-16.gz] - logs of ZooKeeper cluster node 2, 10.1.112.63
> - [^zookeeper-node3.log.2010-03-16.gz] - logs of ZooKeeper cluster node 3, 10.1.112.64
> - [^app1.log.2010-03-16.gz] - application logs of app1 10.1.112.60
> - [^app2.log.2010-03-16.gz] - application logs of app2 10.1.112.61
> I also analyzed the case at 22:08:
> - The network glitch that triggered the problem occurred at about 22:08.
> - From what I can see, node2 was the leader from 17:48 onward and did not
> change for the rest of that day.
> - The client had been connected to node2 since 17:50.
> - At around 22:09 the client tried to connect to every node (1, 2, 3).
> Connections to node1 and node3 were closed with the exception
> "Exception causing close of session 0x22767e1c963
> due to java.io.IOException: Read error".
> The connection to node2 stayed alive.
> - All subsequent operations were refused with a ZSESSIONMOVED error,
> visible on both the client and the server side.
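In ZooKeeper, the session-moved error means a request arrived on a connection for a session that has since been re-established through a different server. The failure reported above can be illustrated with a toy model of that ownership rule. This is a hypothetical Python sketch, not ZooKeeper code: `Leader`, `create_session`, `revalidate`, `check` and `request` are invented names, and the ordering of revalidations (node2 first, then the abandoned node1/node3 attempts) is an assumption consistent with, but not confirmed by, the 22:08 analysis.

```python
# Toy model of session ownership (hypothetical names; not ZooKeeper source code).


class SessionMoved(Exception):
    """Stands in for ZooKeeper's internal SessionMovedException / ZSESSIONMOVED."""


class Leader:
    def __init__(self):
        self.owner = {}      # session id -> server currently owning the session
        self.next_id = 1

    def create_session(self, server):
        # Models passing clientid 0 to zookeeper_init: always a brand-new session.
        sid = self.next_id
        self.next_id += 1
        self.owner[sid] = server
        return sid

    def revalidate(self, sid, server):
        # A (re)connect through `server` re-establishes the session there.
        self.owner[sid] = server

    def check(self, sid, server):
        if self.owner.get(sid) != server:
            raise SessionMoved(sid)


def request(leader, sid, server):
    """Send one request on a connection to `server`; return a result code."""
    try:
        leader.check(sid, server)
        return "ZOK"
    except SessionMoved:
        return "ZSESSIONMOVED"


leader = Leader()
sid = leader.create_session("node2")

# Assumed ordering around 22:09: the reconnect through node2 is revalidated
# first, then the abandoned node1/node3 attempts are revalidated as well,
# before those connections die with "Read error".
leader.revalidate(sid, "node2")
leader.revalidate(sid, "node1")
leader.revalidate(sid, "node3")

# The node2 connection stays open, but nothing moves ownership back to node2,
# so every request on it fails.
print([request(leader, sid, "node2") for _ in range(3)])
# -> ['ZSESSIONMOVED', 'ZSESSIONMOVED', 'ZSESSIONMOVED']

# Workaround from the report: restart with clientid 0, i.e. a new session.
fresh = leader.create_session("node2")
print(request(leader, fresh, "node2"))  # -> ZOK
```

In this model the error is permanent because the still-open node2 connection never triggers another revalidation, which is why only a fresh session (the restart workaround) recovers.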

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-710) permanent ZSESSIONMOVED error after client app reconnects to zookeeper cluster

2010-03-18 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-710:
---

Attachment: ZOOKEEPER-710_3.2.patch

This 3.2 patch is against the current 3.2 svn branch.

It fixes the issue in my tests.





[jira] Updated: (ZOOKEEPER-710) permanent ZSESSIONMOVED error after client app reconnects to zookeeper cluster

2010-03-18 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-710:


Hadoop Flags: [Reviewed]

+1, great work Pat! And thanks, Lukasz, for identifying this failure condition.




[jira] Updated: (ZOOKEEPER-710) permanent ZSESSIONMOVED error after client app reconnects to zookeeper cluster

2010-03-18 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-710:
---

Attachment: ZOOKEEPER-710_3.3.patch

Fixes this session-moved issue by closing the invalid connection.
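The effect of closing the invalid connection can be sketched in a toy model. The assumption here is that once the server closes the connection, the client library treats it as a connection loss, reconnects with the same session id, and that reconnect re-establishes the session on the server it lands on, so the error is reported at most once instead of on every request. `Leader`, `Connection`, `Client` and the `open` flag are hypothetical names; this is not the code in the ZOOKEEPER-710 patches.

```python
# Toy model of the fix: close the connection whose session has moved
# (hypothetical names; not the code from the ZOOKEEPER-710 patches).


class SessionMoved(Exception):
    """Stands in for ZooKeeper's internal SessionMovedException."""


class Leader:
    def __init__(self):
        self.owner = {}  # session id -> server currently owning the session

    def revalidate(self, sid, server):
        self.owner[sid] = server

    def check(self, sid, server):
        if self.owner.get(sid) != server:
            raise SessionMoved(sid)


class Connection:
    def __init__(self, leader, server, sid):
        self.leader, self.server, self.sid = leader, server, sid
        self.open = True
        leader.revalidate(sid, server)  # connecting re-establishes the session here

    def submit(self):
        try:
            self.leader.check(self.sid, self.server)
            return "ZOK"
        except SessionMoved:
            self.open = False  # the fix: close the invalid connection
            return "ZSESSIONMOVED"


class Client:
    """Reconnects with the same session id whenever its connection is closed."""

    def __init__(self, leader, sid, server):
        self.leader, self.sid = leader, sid
        self.conn = Connection(leader, server, sid)

    def call(self):
        if not self.conn.open:
            # Connection loss: reconnect (here to the same server) and revalidate.
            self.conn = Connection(self.leader, self.conn.server, self.sid)
        return self.conn.submit()


leader = Leader()
client = Client(leader, sid=0x22767e1c963, server="node2")
leader.revalidate(0x22767e1c963, "node3")  # stray revalidation from an abandoned attempt
results = [client.call() for _ in range(3)]
print(results)  # -> ['ZSESSIONMOVED', 'ZOK', 'ZOK']
```

Without the `self.open = False` line, every call in this model returns ZSESSIONMOVED, which matches the permanent failure described in the report.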




[jira] Updated: (ZOOKEEPER-710) permanent ZSESSIONMOVED error after client app reconnects to zookeeper cluster

2010-03-18 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-710:
---

Status: Patch Available  (was: Open)




[jira] Updated: (ZOOKEEPER-710) permanent ZSESSIONMOVED error after client app reconnects to zookeeper cluster

2010-03-18 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-710:
---

Component/s: server
Priority: Blocker  (was: Major)
Affects Version/s: 3.3.0
Fix Version/s: 3.3.0, 3.2.3
Assignee: Patrick Hunt

Marking this to be fixed in 3.3.0 and 3.2.3.





[jira] Updated: (ZOOKEEPER-710) permanent ZSESSIONMOVED error after client app reconnects to zookeeper cluster

2010-03-18 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-710:
---

Affects Version/s: (was: 3.3.0)




[jira] Updated: (ZOOKEEPER-710) permanent ZSESSIONMOVED error after client app reconnects to zookeeper cluster

2010-03-18 Thread Lukasz Osipiuk (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Osipiuk updated ZOOKEEPER-710:
-

Description: 
The problem was originally described on the Users mailing list, starting with this 
[post|http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/201003.mbox/<3b910d891003160743k38e2e7c9y830b182d88396...@mail.gmail.com>].
Below I restate it in a more organized form.

We occasionally (a few times a day) observe that our client application 
disconnects from the ZooKeeper cluster.
The application is written in C++ and uses the libzookeeper_mt library, 
version 3.2.2.

The disconnects are probably related to problems with our network 
infrastructure: we observe periods of heavy packet loss between machines 
in our DC.

Sometimes, after the client application (i.e. the ZooKeeper library) 
reconnects to the ZooKeeper cluster, all subsequent requests return a 
ZSESSIONMOVED error. Restarting the client app helps; we always pass 0 as 
the clientid to the zookeeper_init function, so the old session is not reused.

On 16-03-2010 we observed a few occurrences of the problem, for example:
- 22:08; client IP 10.1.112.60 (app1); sessionID 0x22767e1c963
- 14:21; client IP 10.1.112.61 (app2); sessionID 0x324dcc1ba580085

I attach logs from the cluster and application nodes (ZooKeeper-related 
entries only):
- [^zookeeper-node1.log.2010-03-16.gz] - logs of ZooKeeper cluster node 1, 10.1.112.62
- [^zookeeper-node2.log.2010-03-16.gz] - logs of ZooKeeper cluster node 2, 10.1.112.63
- [^zookeeper-node3.log.2010-03-16.gz] - logs of ZooKeeper cluster node 3, 10.1.112.64
- [^app1.log.2010-03-16.gz] - application logs of app1 10.1.112.60
- [^app2.log.2010-03-16.gz] - application logs of app2 10.1.112.61

I also analyzed the case at 22:08:
- The network glitch that triggered the problem occurred at about 22:08.
- From what I can see, node2 was the leader from 17:48 onward and did not
change for the rest of that day.
- The client had been connected to node2 since 17:50.
- At around 22:09 the client tried to connect to every node (1, 2, 3).
Connections to node1 and node3 were closed with the exception
"Exception causing close of session 0x22767e1c963
due to java.io.IOException: Read error".
The connection to node2 stayed alive.
- All subsequent operations were refused with a ZSESSIONMOVED error,
visible on both the client and the server side.



  was:
The problem was originally described on the Users mailing list, starting with this 
[post|http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/201003.mbox/<3b910d891003160743k38e2e7c9y830b182d88396...@mail.gmail.com>].
Below I restate it in a more organized form.

We occasionally (a few times a day) observe that our client application 
disconnects from the ZooKeeper cluster.
The application is written in C++ and uses the libzookeeper_mt library, 
version 3.2.2.

The disconnects are probably related to problems with our network 
infrastructure: we observe periods of heavy packet loss between machines 
in our DC.

Sometimes, after the client application (i.e. the ZooKeeper library) 
reconnects to the ZooKeeper cluster, all subsequent requests return a 
ZSESSIONMOVED error. Restarting the client app helps; we always pass 0 as 
the clientid to the zookeeper_init function, so the old session is not reused.

On 16-03-2010 we observed a few occurrences of the problem, for example:
- 22:08; client IP 10.1.112.60 (app1); sessionID 0x22767e1c963
- 14:21; client IP 10.1.112.61 (app2); sessionID 0x324dcc1ba580085

I attach logs from the cluster and application nodes (ZooKeeper-related 
entries only):
- zookeeper-node1.log.2010-03-16.gz
- zookeeper-node2.log.2010-03-16.gz
- zookeeper-node3.log.2010-03-16.gz
- app1.log.2010-03-16.gz
- app2.log.2010-03-16.gz

I also analyzed the case at 22:08:
- The network glitch that triggered the problem occurred at about 22:08.
- From what I can see, node2 was the leader from 17:48 onward and did not
change for the rest of that day.
- The client had been connected to node2 since 17:50.
- At around 22:09 the client tried to connect to every node (1, 2, 3).
Connections to node1 and node3 were closed with the exception
"Exception causing close of session 0x22767e1c963
due to java.io.IOException: Read error".
The connection to node2 stayed alive.
- All subsequent operations were refused with a ZSESSIONMOVED error,
visible on both the client and the server side.





[jira] Updated: (ZOOKEEPER-710) permanent ZSESSIONMOVED error after client app reconnects to zookeeper cluster

2010-03-18 Thread Lukasz Osipiuk (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Osipiuk updated ZOOKEEPER-710:
-

  Description: 
The problem was originally described on the Users mailing list, starting with this 
[post|http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/201003.mbox/<3b910d891003160743k38e2e7c9y830b182d88396...@mail.gmail.com>].
Below I restate it in a more organized form.

We occasionally (a few times a day) observe that our client application 
disconnects from the ZooKeeper cluster.
The application is written in C++ and uses the libzookeeper_mt library, 
version 3.2.2.

The disconnects are probably related to problems with our network 
infrastructure: we observe periods of heavy packet loss between machines 
in our DC.

Sometimes, after the client application (i.e. the ZooKeeper library) 
reconnects to the ZooKeeper cluster, all subsequent requests return a 
ZSESSIONMOVED error. Restarting the client app helps; we always pass 0 as 
the clientid to the zookeeper_init function, so the old session is not reused.

On 16-03-2010 we observed a few occurrences of the problem, for example:
- 22:08; client IP 10.1.112.60 (app1); sessionID 0x22767e1c963
- 14:21; client IP 10.1.112.61 (app2); sessionID 0x324dcc1ba580085

I attach logs from the cluster and application nodes (ZooKeeper-related 
entries only):
- zookeeper-node1.log.2010-03-16.gz
- zookeeper-node2.log.2010-03-16.gz
- zookeeper-node3.log.2010-03-16.gz
- app1.log.2010-03-16.gz
- app2.log.2010-03-16.gz

I also analyzed the case at 22:08:
- The network glitch that triggered the problem occurred at about 22:08.
- From what I can see, node2 was the leader from 17:48 onward and did not
change for the rest of that day.
- The client had been connected to node2 since 17:50.
- At around 22:09 the client tried to connect to every node (1, 2, 3).
Connections to node1 and node3 were closed with the exception
"Exception causing close of session 0x22767e1c963
due to java.io.IOException: Read error".
The connection to node2 stayed alive.
- All subsequent operations were refused with a ZSESSIONMOVED error,
visible on both the client and the server side.



  was:
Originally problem was described on Users mailing list starting with this 
[post|http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/201003.mbox/<3b910d891003160743k38e2e7c9y830b182d88396...@mail.gmail.com>].
Below I restate it in more organized form.

We occasionally (few times a day) observe that our client application 
disconnects from Zookeeper cluster.
Application is written in C++ and we are using libzookeeper_mt library. In 
version 3.2.2.

The disconnects we are observing are probably related to some problems with our 
network infrastructure - we are observing periods with great packet loss 
between machines in our DC. 

Sometimes, after the client application (i.e. the zookeeper library) reconnects to the
zookeeper cluster, all subsequent requests return the ZSESSIONMOVED error.
Restarting the client application helps: we always pass 0 as the clientid to the
zookeeper_init function, so the old session is not reused.

On 16-03-2010 we observed several occurrences of the problem, for example:
- 22:08; client IP 10.1.112.60 (app1)
- 14:21; client IP 10.1.112.61 (app2)

I have attached logs from the cluster and application nodes (only entries concerning
zookeeper):
- zookeeper-node1.log.2010-03-16.gz
- zookeeper-node2.log.2010-03-16.gz
- zookeeper-node3.log.2010-03-16.gz
- app1.log.2010-03-16.gz
- app2.log.2010-03-16.gz

I also analyzed the case at 22:08:
- The network glitch that triggered the problem occurred at about 22:08.
- From what I can see, node2 had been the leader since 17:48 and did not
change for the rest of that day.
- The client had been connected to node2 since 17:50.
- At around 22:09 the client tried to connect to every node (1, 2, 3).
The connections to node1 and node3 were closed with the exception
"Exception causing close of session 0x22767e1c963 due to java.io.IOException: Read error".
The connection to node2 stayed alive.
- All subsequent operations were refused with the ZSESSIONMOVED error, which
was visible on both the client and the server side.



Affects Version/s: 3.2.2


[jira] Updated: (ZOOKEEPER-710) permanent ZSESSIONMOVED error after client app reconnects to zookeeper cluster

2010-03-18 Thread Lukasz Osipiuk (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Osipiuk updated ZOOKEEPER-710:
-

Attachment: app2.log.2010-03-16.gz


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-710) permanent ZSESSIONMOVED error after client app reconnects to zookeeper cluster

2010-03-18 Thread Lukasz Osipiuk (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Osipiuk updated ZOOKEEPER-710:
-

Attachment: app1.log.2010-03-16.gz




[jira] Updated: (ZOOKEEPER-710) permanent ZSESSIONMOVED error after client app reconnects to zookeeper cluster

2010-03-18 Thread Lukasz Osipiuk (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Osipiuk updated ZOOKEEPER-710:
-

Attachment: zookeeper-node3.log.2010-03-16.gz
zookeeper-node2.log.2010-03-16.gz




[jira] Updated: (ZOOKEEPER-710) permanent ZSESSIONMOVED error after client app reconnects to zookeeper cluster

2010-03-18 Thread Lukasz Osipiuk (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Osipiuk updated ZOOKEEPER-710:
-

Attachment: zookeeper-node1.log.2010-03-16.gz
