I remember someone posted the same issue yesterday. The problem is that your hosts
are somehow resolved as "Netty-Client-*", which is not pingable. If you are allowed
to, you can modify /etc/hosts to map the hostnames to the appropriate IP addresses.
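
Here is a minimal sketch for checking this on a supervisor (Python 3; the function names are hypothetical, not part of Storm). It assumes the worker log contains the `creating Netty Client, connecting to HOST:PORT` lines shown in your excerpt. It collects those peers, resolves each hostname on the local machine, and tries a plain TCP connection to the slot port.

```python
import re
import socket

# Matches the "creating Netty Client, connecting to HOST:PORT" lines seen in
# the worker log in this thread (assumed log format, Storm 0.9.x).
CONNECT_RE = re.compile(r"connecting to ([A-Za-z0-9.-]+):(\d+)")


def peers_from_log(lines):
    """Return the sorted, de-duplicated (host, port) pairs the worker tried to reach."""
    peers = set()
    for line in lines:
        match = CONNECT_RE.search(line)
        if match:
            peers.add((match.group(1), int(match.group(2))))
    return sorted(peers)


def check_peer(host, port, timeout=2.0):
    """Resolve host locally, then try a TCP connect to port.

    Returns (ip, True) on success, (ip, False) if the name resolves but the
    connection fails, and (None, False) if the name does not resolve at all.
    """
    try:
        ip = socket.gethostbyname(host)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None, False
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:  # refused, unreachable or timed out
        return ip, False
```

For example, `peers_from_log(open("worker-6703.log"))` lists the peers (the file name is only an example). `check_peer(host, port)` then tells you whether a name fails to resolve (`(None, False)`) or resolves but cannot be connected to (`(ip, False)`). Run it on each supervisor after editing /etc/hosts.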


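Separately, the `maxRetries too large (100). Pinning to 29` warning is logged by the shaded Curator `ExponentialBackoffRetry` class, which caps its retry count at 29. Here it is triggered by `storm.messaging.netty.max_retries: 100`, as the following `StormBoundedExponentialBackoffRetry` line shows (base 100, max 1000, retries 100). The `scheduled to run in N ms` lines only schedule a reconnect after each failed attempt, using `storm.messaging.netty.min_wait_ms` and `max_wait_ms`. The sketch below is a hypothetical model fitted to the delays in your log (102/103 ms for attempt 2, 105/106 ms for attempt 3). It assumes delay = min_wait_ms + 2^retry + random jitter below 2^retry, capped at max_wait_ms, and it is not copied from Storm's source.

```python
import random

# Hypothetical backoff model fitted to the logged delays, NOT Storm's code.
# Assumes: delay = min_wait_ms + 2**retry + jitter, jitter in [0, 2**retry),
# capped at max_wait_ms, with the first attempt scheduled immediately.
MIN_WAIT_MS = 100   # storm.messaging.netty.min_wait_ms from the storm.yaml above
MAX_WAIT_MS = 1000  # storm.messaging.netty.max_wait_ms from the storm.yaml above


def delay_range(attempt):
    """Return the (lowest, highest) delay in ms the model allows for a 1-based attempt."""
    if attempt == 1:
        return 0, 0  # the log shows attempt 1 "scheduled to run in 0 ms"
    retry = attempt - 1
    exp = 2 ** retry
    low = min(MAX_WAIT_MS, MIN_WAIT_MS + exp)
    high = min(MAX_WAIT_MS, MIN_WAIT_MS + exp + exp - 1)
    return low, high


def sample_delay(attempt, rng=random):
    """Draw one delay (ms) uniformly from the model's range for this attempt."""
    low, high = delay_range(attempt)
    return rng.randint(low, high)
```

Every "attempt 2" delay in the excerpt falls in `delay_range(2)` and every "attempt 3" delay in `delay_range(3)`. So these lines follow the failed connections and say nothing about full buffers.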

—
Sincerely,
Fan Jiang

On Thu, Jun 25, 2015 at 3:57 PM, Nick R. Katsipoulakis
<[email protected]> wrote:

> Hello all,
> I apologize for the long message, but I have no idea what is going wrong in
> my setup, so I tried to give as much information about my cluster as possible.
> I have the following EC2 setup:
> 1) 3x m4.xlarge nodes for a 3-node ZooKeeper ensemble and Nimbus
> 2) 4x m4.xlarge nodes for my Supervisors.
> All of the machines run Ubuntu Linux 14, OpenJDK 1.7 and Apache
> Storm 0.9.4. The storm.yaml I currently have on the Nimbus node
> (only) contains the following values:
> storm.home: "/opt/apache-storm-0.9.4"
> storm.local.dir: "/mnt/storm"
> storm.zookeeper.servers:
>      - "172.31.28.73"
>      - "172.31.38.251"
>      - "172.31.38.252"
> storm.zookeeper.port: 2181
> storm.zookeeper.root: "/storm"
> storm.zookeeper.session.timeout: 20000
> storm.zookeeper.connection.timeout: 15000
> storm.zookeeper.retry.times: 5
> storm.zookeeper.retry.interval: 1000
> storm.zookeeper.retry.intervalceiling.millis: 30000
> storm.cluster.mode: "distributed"
> storm.local.mode.zmq: false
> storm.thrift.transport: "backtype.storm.security.auth.SimpleTransportPlugin"
> storm.messaging.transport: "backtype.storm.messaging.netty.Context"
> nimbus.host: "127.0.0.1"
> nimbus.thrift.port: 6627
> nimbus.thrift.max_buffer_size: 1048576
> nimbus.thrift.threads: 256
> nimbus.childopts: "-Xmx256m"
> nimbus.task.timeout.secs: 30
> nimbus.supervisor.timeout.secs: 60
> nimbus.monitor.freq.secs: 10
> nimbus.cleanup.inbox.freq.secs: 600
> nimbus.inbox.jar.expiration.secs: 3600
> nimbus.task.launch.secs: 120
> nimbus.reassign: true
> nimbus.file.copy.expiration.secs: 600
> nimbus.topology.validator: "backtype.storm.nimbus.DefaultTopologyValidator"
> ui.port: 8080
> ui.childopts: "-Xmx768m"
> logviewer.port: 8000
> logviewer.childopts: "-Xmx256m"
> logviewer.appender.name: "A1"
> drpc.port: 3772
> drpc.worker.threads: 64
> drpc.queue.size: 128
> drpc.invocations.port: 3773
> drpc.request.timeout.secs: 600
> drpc.childopts: "-Xmx768m"
> transactional.zookeeper.root: "/transactional"
> transactional.zookeeper.servers: null
> transactional.zookeeper.port: null
> supervisor.slots.ports:
>      - 6700
>      - 6701
>      - 6702
>      - 6703
> supervisor.childopts: "-Xmx256m"
> supervisor.worker.start.timeout.secs: 120
> supervisor.worker.timeout.secs: 30
> supervisor.monitor.frequency.secs: 3
> supervisor.heartbeat.frequency.secs: 5
> supervisor.enable: true
> worker.childopts: "-Xmx4096m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> -XX:NewSize=128m -XX:CMSInitiatingOccupancyFraction=70
> -XX:-CMSConcurrentMTEnabled -Djava.net.preferIPv4Stack=true"
> worker.heartbeat.frequency.secs: 1
> task.heartbeat.frequency.secs: 3
> task.refresh.poll.secs: 10
> zmq.threads: 1
> zmq.linger.millis: 5000
> zmq.hwm: 0
> storm.messaging.netty.server_worker_threads: 4
> storm.messaging.netty.client_worker_threads: 4
> storm.messaging.netty.buffer_size: 10485760
> storm.messaging.netty.max_retries: 100
> storm.messaging.netty.max_wait_ms: 1000
> storm.messaging.netty.min_wait_ms: 100
> topology.enable.message.timeouts: true
> topology.debug: false
> topology.optimize: true
> topology.workers: 1
> topology.acker.executors: null
> topology.tasks: null
> topology.message.timeout.secs: 30
> topology.skip.missing.kryo.registrations: false
> topology.max.task.parallelism: null
> topology.max.spout.pending: null
> topology.state.synchronization.timeout.secs: 60
> topology.stats.sample.rate: 0.05
> topology.builtin.metrics.bucket.size.secs: 60
> topology.fall.back.on.java.serialization: true
> topology.worker.childopts: null
> topology.executor.receive.buffer.size: 1024
> topology.executor.send.buffer.size: 1024
> topology.receiver.buffer.size: 8
> topology.transfer.buffer.size: 1024
> topology.tick.tuple.freq.secs: null
> topology.worker.shared.thread.pool.size: 4
> topology.disruptor.wait.strategy: "com.lmax.disruptor.BlockingWaitStrategy"
> topology.spout.wait.strategy: "backtype.storm.spout.SleepSpoutWaitStrategy"
> topology.sleep.spout.wait.strategy.time.ms: 1
> topology.error.throttle.interval.secs: 10
> topology.max.error.report.per.interval: 5
> topology.kryo.factory: "backtype.storm.serialization.DefaultKryoFactory"
> topology.tuple.serializer: "backtype.storm.serialization.types.ListDelegateSerializer"
> topology.trident.batch.emit.interval.millis: 500
> dev.zookeeper.path: "/tmp/dev-storm-zookeeper"
> The problem is that every time I submit a topology, I get a lot of Netty
> messages in my worker logs (on the supervisor machines), and many of them
> are similar to the following:
> 2015-06-25T19:42:32.534+0000 b.s.u.StormBoundedExponentialBackoffRetry
> [INFO] The baseSleepTimeMs [1000] the maxSleepTimeMs [30000] the maxRetries
> [5]
> 2015-06-25T19:42:32.625+0000 o.a.s.c.f.i.CuratorFrameworkImpl [INFO]
> Starting
> 2015-06-25T19:42:32.629+0000 o.a.s.z.ZooKeeper [INFO] Initiating client
> connection, connectString=172.31.28.73:2181,172.31.38.251:2181,
> 172.31.38.252:2181 sessionTimeout=20000
> watcher=org.apache.storm.curator.ConnectionState@5172aa5a
> 2015-06-25T19:42:32.649+0000 o.a.s.z.ClientCnxn [INFO] Opening socket
> connection to server 172.31.28.73/172.31.28.73:2181. Will not attempt to
> authenticate using SASL (unknown error)
> 2015-06-25T19:42:32.655+0000 o.a.s.z.ClientCnxn [INFO] Socket connection
> established to 172.31.28.73/172.31.28.73:2181, initiating session
> 2015-06-25T19:42:32.670+0000 o.a.s.z.ClientCnxn [INFO] Session
> establishment complete on server 172.31.28.73/172.31.28.73:2181, sessionid
> = 0x14e2b0caa01005f, negotiated timeout = 20000
> 2015-06-25T19:42:32.672+0000 o.a.s.c.f.s.ConnectionStateManager [INFO]
> State change: CONNECTED
> 2015-06-25T19:42:32.674+0000 b.s.zookeeper [INFO] Zookeeper state update:
> :connected:none
> 2015-06-25T19:42:32.703+0000 o.a.s.z.ClientCnxn [INFO] EventThread shut down
> 2015-06-25T19:42:32.703+0000 o.a.s.z.ZooKeeper [INFO] Session:
> 0x14e2b0caa01005f closed
> 2015-06-25T19:42:32.705+0000 b.s.u.StormBoundedExponentialBackoffRetry
> [INFO] The baseSleepTimeMs [1000] the maxSleepTimeMs [30000] the maxRetries
> [5]
> 2015-06-25T19:42:32.706+0000 o.a.s.c.f.i.CuratorFrameworkImpl [INFO]
> Starting
> 2015-06-25T19:42:32.716+0000 o.a.s.z.ZooKeeper [INFO] Initiating client
> connection, connectString=172.31.28.73:2181,172.31.38.251:2181,
> 172.31.38.252:2181/storm sessionTimeout=20000
> watcher=org.apache.storm.curator.ConnectionState@3f308697
> 2015-06-25T19:42:32.727+0000 o.a.s.z.ClientCnxn [INFO] Opening socket
> connection to server 172.31.28.73/172.31.28.73:2181. Will not attempt to
> authenticate using SASL (unknown error)
> 2015-06-25T19:42:32.727+0000 o.a.s.z.ClientCnxn [INFO] Socket connection
> established to 172.31.28.73/172.31.28.73:2181, initiating session
> 2015-06-25T19:42:32.733+0000 o.a.s.z.ClientCnxn [INFO] Session
> establishment complete on server 172.31.28.73/172.31.28.73:2181, sessionid
> = 0x14e2b0caa010061, negotiated timeout = 20000
> 2015-06-25T19:42:32.733+0000 o.a.s.c.f.s.ConnectionStateManager [INFO]
> State change: CONNECTED
> 2015-06-25T19:42:32.774+0000 b.s.d.worker [INFO] Reading Assignments.
> 2015-06-25T19:42:32.838+0000 b.s.m.TransportFactory [INFO] Storm peer
> transport plugin:backtype.storm.messaging.netty.Context
> 2015-06-25T19:42:32.971+0000 b.s.d.worker [INFO] Launching receive-thread
> for 58e551ba-f944-4aec-9c8f-5621053021dd:6703
> 2015-06-25T19:42:32.983+0000 b.s.m.n.Server [INFO] Create Netty Server
> Netty-server-localhost-6703, buffer_size: 10485760, maxWorkers: 4
> 2015-06-25T19:42:33.011+0000 b.s.m.loader [INFO] Starting receive-thread:
> [stormId: tpch-q5-top-6-1435261345, port: 6703, thread-id: 0 ]
> 2015-06-25T19:42:33.041+0000 b.s.m.n.Client [INFO] creating Netty Client,
> connecting to ip-172-31-19-254.us-west-2.compute.internal:6703, bufferSize:
> 10485760
> 2015-06-25T19:42:33.041+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN]
> maxRetries too large (100). Pinning to 29
> 2015-06-25T19:42:33.041+0000 b.s.u.StormBoundedExponentialBackoffRetry
> [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries
> [100]
> 2015-06-25T19:42:33.042+0000 b.s.m.n.Client [INFO] connection attempt 1 to
> Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6703
> scheduled to run in 0 ms
> 2015-06-25T19:42:33.067+0000 b.s.m.n.Client [ERROR] connection attempt 1 to
> Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6703
> failed: java.lang.RuntimeException: Returned channel was actually not
> established
> 2015-06-25T19:42:33.068+0000 b.s.m.n.Client [INFO] connection attempt 2 to
> Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6703
> scheduled to run in 103 ms
> 2015-06-25T19:42:33.071+0000 b.s.m.n.Client [INFO] creating Netty Client,
> connecting to ip-172-31-13-184.us-west-2.compute.internal:6703, bufferSize:
> 10485760
> 2015-06-25T19:42:33.071+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN]
> maxRetries too large (100). Pinning to 29
> 2015-06-25T19:42:33.071+0000 b.s.u.StormBoundedExponentialBackoffRetry
> [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries
> [100]
> 2015-06-25T19:42:33.076+0000 b.s.m.n.Client [INFO] connection attempt 1 to
> Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/172.31.13.184:6703
> scheduled to run in 0 ms
> 2015-06-25T19:42:33.080+0000 b.s.m.n.Client [INFO] creating Netty Client,
> connecting to ip-172-31-19-254.us-west-2.compute.internal:6702, bufferSize:
> 10485760
> 2015-06-25T19:42:33.080+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN]
> maxRetries too large (100). Pinning to 29
> 2015-06-25T19:42:33.080+0000 b.s.u.StormBoundedExponentialBackoffRetry
> [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries
> [100]
> 2015-06-25T19:42:33.080+0000 b.s.m.n.Client [INFO] connection attempt 1 to
> Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6702
> scheduled to run in 0 ms
> 2015-06-25T19:42:33.081+0000 b.s.m.n.Client [INFO] creating Netty Client,
> connecting to ip-172-31-13-184.us-west-2.compute.internal:6702, bufferSize:
> 10485760
> 2015-06-25T19:42:33.082+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN]
> maxRetries too large (100). Pinning to 29
> 2015-06-25T19:42:33.082+0000 b.s.u.StormBoundedExponentialBackoffRetry
> [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries
> [100]
> 2015-06-25T19:42:33.082+0000 b.s.m.n.Client [INFO] connection attempt 1 to
> Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/172.31.13.184:6702
> scheduled to run in 0 ms
> 2015-06-25T19:42:33.084+0000 b.s.m.n.Client [INFO] creating Netty Client,
> connecting to ip-172-31-19-254.us-west-2.compute.internal:6701, bufferSize:
> 10485760
> 2015-06-25T19:42:33.084+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN]
> maxRetries too large (100). Pinning to 29
> 2015-06-25T19:42:33.084+0000 b.s.u.StormBoundedExponentialBackoffRetry
> [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries
> [100]
> 2015-06-25T19:42:33.084+0000 b.s.m.n.Client [INFO] connection attempt 1 to
> Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6701
> scheduled to run in 0 ms
> 2015-06-25T19:42:33.162+0000 b.s.m.n.Client [ERROR] connection attempt 1 to
> Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/172.31.13.184:6703
> failed: java.lang.RuntimeException: Returned channel was actually not
> established
> 2015-06-25T19:42:33.162+0000 b.s.m.n.Client [INFO] creating Netty Client,
> connecting to ip-172-31-13-184.us-west-2.compute.internal:6701, bufferSize:
> 10485760
> 2015-06-25T19:42:33.162+0000 b.s.m.n.Client [INFO] connection attempt 2 to
> Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/172.31.13.184:6703
> scheduled to run in 103 ms
> 2015-06-25T19:42:33.163+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN]
> maxRetries too large (100). Pinning to 29
> and later:
> 2015-06-25T19:42:33.176+0000 b.s.u.StormBoundedExponentialBackoffRetry
> [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries
> [100]
> 2015-06-25T19:42:33.176+0000 b.s.m.n.Client [INFO] connection attempt 1 to
> Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/172.31.19.253:6700
> scheduled to run in 0 ms
> 2015-06-25T19:42:33.178+0000 b.s.m.n.Client [ERROR] connection attempt 1 to
> Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/172.31.13.184:6700
> failed: java.lang.RuntimeException: Returned channel was actually not
> established
> 2015-06-25T19:42:33.189+0000 b.s.m.n.Client [ERROR] connection attempt 2 to
> Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6703
> failed: java.lang.RuntimeException: Returned channel was actually not
> established
> 2015-06-25T19:42:33.190+0000 b.s.m.n.Client [INFO] connection attempt 2 to
> Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/172.31.13.184:6700
> scheduled to run in 103 ms
> 2015-06-25T19:42:33.191+0000 b.s.m.n.Client [INFO] connection attempt 3 to
> Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6703
> scheduled to run in 105 ms
> 2015-06-25T19:42:33.195+0000 b.s.m.n.Client [ERROR] connection attempt 1 to
> Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/172.31.19.253:6700
> failed: java.lang.RuntimeException: Returned channel was actually not
> established
> 2015-06-25T19:42:33.195+0000 b.s.m.n.Client [INFO] connection attempt 2 to
> Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/172.31.19.253:6700
> scheduled to run in 102 ms
> 2015-06-25T19:42:33.196+0000 b.s.m.n.Client [ERROR] connection attempt 1 to
> Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/172.31.19.252:6702
> failed: java.lang.RuntimeException: Returned channel was actually not
> established
> 2015-06-25T19:42:33.196+0000 b.s.m.n.Client [INFO] connection attempt 2 to
> Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/172.31.19.252:6702
> scheduled to run in 102 ms
> 2015-06-25T19:42:33.197+0000 b.s.m.n.Client [ERROR] connection attempt 1 to
> Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/172.31.19.252:6700
> failed: java.lang.RuntimeException: Returned channel was actually not
> established
> 2015-06-25T19:42:33.198+0000 b.s.m.n.Client [INFO] connection attempt 2 to
> Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/172.31.19.252:6700
> scheduled to run in 103 ms
> 2015-06-25T19:42:33.198+0000 b.s.m.n.Client [ERROR] connection attempt 1 to
> Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/172.31.19.252:6703
> failed: java.lang.RuntimeException: Returned channel was actually not
> established
> 2015-06-25T19:42:33.198+0000 b.s.m.n.Client [ERROR] connection attempt 1 to
> Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/172.31.19.253:6702
> failed: java.lang.RuntimeException: Returned channel was actually not
> established
> 2015-06-25T19:42:33.205+0000 b.s.m.n.Client [INFO] connection attempt 2 to
> Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/172.31.19.252:6703
> scheduled to run in 103 ms
> 2015-06-25T19:42:33.198+0000 b.s.m.n.Client [INFO] connection established
> to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/
> 172.31.19.252:6701
> 2015-06-25T19:42:33.206+0000 b.s.m.n.Client [INFO] connection attempt 2 to
> Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/172.31.19.253:6702
> scheduled to run in 102 ms
> 2015-06-25T19:42:33.205+0000 b.s.m.n.Client [INFO] connection established
> to Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/
> 172.31.19.253:6701
> 2015-06-25T19:42:33.268+0000 b.s.m.n.Client [ERROR] connection attempt 2 to
> Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/172.31.13.184:6703
> failed: java.lang.RuntimeException: Returned channel was actually not
> established
> 2015-06-25T19:42:33.272+0000 b.s.m.n.Client [INFO] connection attempt 3 to
> Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/172.31.13.184:6703
> scheduled to run in 105 ms
> 2015-06-25T19:42:33.273+0000 b.s.m.n.Client [ERROR] connection attempt 2 to
> Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6701
> failed: java.lang.RuntimeException: Returned channel was actually not
> established
> 2015-06-25T19:42:33.273+0000 b.s.m.n.Client [INFO] connection attempt 3 to
> Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6701
> scheduled to run in 105 ms
> 2015-06-25T19:42:33.274+0000 b.s.m.n.Client [ERROR] connection attempt 2 to
> Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6702
> failed: java.lang.RuntimeException: Returned channel was actually not
> established
> 2015-06-25T19:42:33.274+0000 b.s.m.n.Client [INFO] connection attempt 3 to
> Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/172.31.19.254:6702
> scheduled to run in 106 ms
> [...]
> Why am I getting these messages? Initially, I thought that the input rate of
> tuples in my topology was too high and that Netty's buffers were filling up
> too fast. However, I submitted a debug topology that sends one tuple per
> second and I still got the same messages.
> Am I doing something wrong in my configuration? Why do I get these Netty
> messages, which clearly show that something is going wrong? Any hint about
> my setup would be really helpful.
> Regards,
> Nick
