I found that the problem was caused by my configuration. When setting "regions" in hw_server.conf, the local region should NOT be included.
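
A minimal sketch of what the two files might look like under that rule. Only the "region" and "regions" keys and the two addresses come from this thread; that reg1's hub is the host at X.Y.Z.111 and reg2's hub is the host at X.Y.Z.114 is an assumption, inferred from each file listing only the other region's address.

```properties
# hw_server.conf on the reg1 hub (assumed to be the host at X.Y.Z.111)
region=reg1
# List only the remote region (reg2); do not include this hub's own address.
regions=X.Y.Z.114:4080

# hw_server.conf on the reg2 hub (assumed to be the host at X.Y.Z.114)
region=reg2
# List only the remote region (reg1); do not include this hub's own address.
regions=X.Y.Z.111:4080
```
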
The problem went away after setting "regions=X.Y.Z.114:4080" for the first region (reg1) and "regions=X.Y.Z.111:4080" for the second region (reg2).

Thanks,
Ming

On Tue, Jan 13, 2015 at 4:38 AM, Ivan Kelly <[email protected]> wrote:
> Hi Ming,
>
> This looks like a bug. Feel free to dig in and try and fix it :)
>
> The cross region stuff in hedwig was never tested extensively, so there's
> probably quite a few bugs in there.
>
> Regards
> Ivan
>
> On Mon, Jan 12, 2015 at 7:42 PM, Ming Chen <[email protected]>
> wrote:
>
>> FYI, the cross-region communication is working now after I used the
>> latest code from git and enabled SSL in conf.
>>
>> Even though there seems to be an infinite loop when I do "sub mytopic
>> myid1-1 2" in "hedwig console":
>> [hedwig: (reg1) 164] sub mytopic myid1-1 2
>> SUB DONE AND RECEIVE
>> Finished 0.031 s.
>> [hedwig: (reg1) 165] Received message from topic mytopic for subscriber myid1-1 : neeeeew-msg-from-reg2
>> Received message from topic mytopic for subscriber myid1-1 : mysg-1-2
>> Received message from topic mytopic for subscriber myid1-1 : abs-new-msg-from-reg1
>> Received message from topic mytopic for subscriber myid1-1 : mysg-1-2
>> Received message from topic mytopic for subscriber myid1-1 : neeeeew-msg-from-reg2
>> Received message from topic mytopic for subscriber myid1-1 : msg-2-1
>> Received message from topic mytopic for subscriber myid1-1 : abs-new-msg-from-reg1
>> Received message from topic mytopic for subscriber myid1-1 : mysg-1-2
>> Received message from topic mytopic for subscriber myid1-1 : neeeeew-msg-from-reg2
>> ...
>>
>> Thanks,
>> Ming
>>
>> On Thu, Jan 8, 2015 at 11:24 AM, Ming Chen <[email protected]>
>> wrote:
>>
>>> Hi Ivan,
>>>
>>> Thanks for the heads-up. Sorry that I didn't make it clear, but I did
>>> set the region option in hw_server.conf to "reg1" and "reg2" for the two
>>> regions, respectively.
>>>
>>> I tried some more experiments, and got some error message with the
>>> following operations on just one region:
>>> (1) format
>>> (2) show topics # it throws an IOException, which is probably okay as we
>>> did not have any topic to show
>>> (3) pub mytopic1 hello-topic1
>>> (4) sub mytopic1 myid1 2
>>>
>>> [hedwig: (reg1) 88] format
>>> You ask to format hedwig metadata stored in org.apache.hedwig.server.meta.ZkMetadataManagerFactory.
>>> Press <Return> to continue, or Q to cancel ...
>>> 2015-01-08 00:09:45,752 - INFO - [main:HedwigAdmin@541] - Formatted Hedwig metadata successfully.
>>> 2015-01-08 00:09:45,757 - INFO - [main:HedwigAdmin@544] - Removed old factory layout.
>>> 2015-01-08 00:09:45,770 - INFO - [main:HedwigAdmin@548] - Created new factory layout.
>>> Formatted hedwig metadata successfully.
>>> Finished 2.352 s.
>>> [hedwig: (reg1) 89] show topics
>>> Unable to fetch the list of topics
>>> java.io.IOException: Failed to get topics list :
>>>     at org.apache.hedwig.server.meta.ZkMetadataManagerFactory.getTopics(ZkMetadataManagerFactory.java:98)
>>>     at org.apache.hedwig.admin.HedwigAdmin.getTopics(HedwigAdmin.java:331)
>>>     at org.apache.hedwig.admin.console.HedwigConsole$ShowCmd.showTopics(HedwigConsole.java:588)
>>>     at org.apache.hedwig.admin.console.HedwigConsole$ShowCmd.runCmd(HedwigConsole.java:564)
>>>     at org.apache.hedwig.admin.console.HedwigConsole.processCmd(HedwigConsole.java:966)
>>>     at org.apache.hedwig.admin.console.HedwigConsole.executeLine(HedwigConsole.java:937)
>>>     at org.apache.hedwig.admin.console.HedwigConsole.run(HedwigConsole.java:1021)
>>>     at org.apache.hedwig.admin.console.HedwigConsole.main(HedwigConsole.java:1036)
>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hedwig/reg1/topics
>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>>     at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
>>>     at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500)
>>>     at org.apache.hedwig.server.meta.ZkMetadataManagerFactory.getTopics(ZkMetadataManagerFactory.java:96)
>>>     ... 7 more
>>> Finished 0.015 s.
>>> [hedwig: (reg1) 90] pub mytopic1 hello-topic1
>>> PUB DONE
>>> Finished 0.472 s.
>>> [hedwig: (reg1) 91] sub mytopic1 myid1 2
>>> 2015-01-08 00:13:38,021 - INFO - [New I/O worker #6:HChannelHandler@228] - Channel [id: 0x50aa85e6, /127.0.0.1:52095 :> localhost/127.0.0.1:4080] was disconnected to host localhost/127.0.0.1:4080.
>>> 2015-01-08 00:13:38,022 - INFO - [New I/O worker #6:AbstractHChannelManager@357] - NonSubscription Channel [id: 0x50aa85e6, /127.0.0.1:52095 :> localhost/127.0.0.1:4080] to localhost/127.0.0.1:4080 disconnected.
>>> 2015-01-08 00:13:38,030 - INFO - [New I/O worker #7:HChannelHandler@228] - Channel [id: 0x9615a67b, /127.0.0.1:52098 :> localhost/127.0.0.1:4080] was disconnected to host localhost/127.0.0.1:4080.
>>> 2015-01-08 00:13:38,031 - INFO - [New I/O worker #7:SimpleHChannelManager@191] - Subscription Channel [id: 0x9615a67b, /127.0.0.1:52098 :> localhost/127.0.0.1:4080] disconnected from localhost/127.0.0.1:4080.
>>> 2015-01-08 00:13:38,037 - ERROR - [main:HedwigSubscriber@130] - Unexpected PubSubException thrown:
>>> org.apache.hedwig.exceptions.PubSubException$UncertainStateException: Server ack response never received before server connection disconnected!
>>>     at org.apache.hedwig.client.netty.impl.HChannelHandler.channelDisconnected(HChannelHandler.java:252)
>>>     at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:120)
>>>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>     at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>     at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:60)
>>>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>     at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>     at org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493)
>>>     at org.jboss.netty.handler.codec.frame.FrameDecoder.channelDisconnected(FrameDecoder.java:365)
>>>     at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:102)
>>>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>>>     at org.jboss.netty.channel.Channels.fireChannelDisconnected(Channels.java:396)
>>>     at org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:360)
>>>     at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93)
>>>     at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
>>>     at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
>>>     at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>>>     at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>>>     at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>>>     at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> SUB FAILED
>>> org.apache.hedwig.exceptions.PubSubException$ServiceDownException: org.apache.hedwig.exceptions.PubSubException$UncertainStateException: Server ack response never received before server connection disconnected!
>>>     at org.apache.hedwig.client.netty.HedwigSubscriber.subUnsub(HedwigSubscriber.java:133)
>>>     at org.apache.hedwig.client.netty.HedwigSubscriber.subscribe(HedwigSubscriber.java:194)
>>>     at org.apache.hedwig.client.netty.HedwigSubscriber.subscribe(HedwigSubscriber.java:181)
>>>     at org.apache.hedwig.admin.console.HedwigConsole$SubCmd.runCmd(HedwigConsole.java:291)
>>>     at org.apache.hedwig.admin.console.HedwigConsole.processCmd(HedwigConsole.java:966)
>>>     at org.apache.hedwig.admin.console.HedwigConsole.executeLine(HedwigConsole.java:937)
>>>     at org.apache.hedwig.admin.console.HedwigConsole.run(HedwigConsole.java:1021)
>>>     at org.apache.hedwig.admin.console.HedwigConsole.main(HedwigConsole.java:1036)
>>> Caused by: org.apache.hedwig.exceptions.PubSubException$UncertainStateException: Server ack response never received before server connection disconnected!
>>>     at org.apache.hedwig.client.netty.impl.HChannelHandler.channelDisconnected(HChannelHandler.java:252)
>>>     at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:120)
>>>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>     at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>     at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:60)
>>>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>     at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>     at org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493)
>>>     at org.jboss.netty.handler.codec.frame.FrameDecoder.channelDisconnected(FrameDecoder.java:365)
>>>     at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:102)
>>>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>>>     at org.jboss.netty.channel.Channels.fireChannelDisconnected(Channels.java:396)
>>>     at org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:360)
>>>     at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93)
>>>     at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
>>>     at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
>>>     at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>>>     at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>>>     at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>>>     at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>     at java.lang.Thread.run(Thread.java:745)
>>>
>>> Thanks,
>>> Ming
>>>
>>>
>>> On Thu, Jan 8, 2015 at 6:05 AM, Ivan Kelly <[email protected]> wrote:
>>> > Hi Ming,
>>> >
>>> > It's been a long time since I looked at the region stuff in hedwig, but I
>>> > think it could be that you don't seem to be setting the region identifier in
>>> > hw_server.conf. You need to change "region" in hw_server to some identifier,
>>> > like reg1 and reg2 for your example.
>>> >
>>> > Hope this helps,
>>> > Ivan
>>> >
>>>
>>>
>>
>
