Hi Benjamin,

Correct me if I missed it, but in your config I don't
see storm.local.dir defined. If it's not defined in the config, Storm
will create one in the storm_installation dir which seems to
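
A quick way to confirm whether storm.local.dir is set is to scan storm.yaml for uncommented keys. The sketch below is deliberately naive (no YAML library, no nested mappings), and the sample text and the commented-out path in it are hypothetical, not Benjamin's actual file:

```python
def top_level_keys(yaml_text):
    """Return {key: raw_value} for uncommented `key: value` lines.

    Naive on purpose: skips comments and list items, and does not
    handle nested mappings or multi-line values.
    """
    keys = {}
    for raw in yaml_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            keys[key.strip()] = value.strip().strip('"')
    return keys


# Hypothetical excerpt in the style of a storm.yaml file.
sample = '''
 nimbus.host: "paradent-4"
# storm.local.dir: "/some/hypothetical/dir"
 storm.zookeeper.port: 2181
'''

keys = top_level_keys(sample)
print("storm.local.dir defined:", "storm.local.dir" in keys)
```

A commented-out entry does not count as defined, which is why the sample reports the key as missing.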


Also, are you running the supervisor and nimbus as user
"bsoulas"? When you run the "storm nimbus" or "storm
supervisor" command, which storm command is it pointing to? Did you
ry/target/apache-storm-0.9.3-ben" and also added it to PATH. I
am checking whether you had any previous installation of
Storm and are invoking the storm command from the previous
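
To see which storm executable wins on PATH, and whether an older one is hiding behind it, something like the following sketch can help. The function name is made up for illustration, and it only approximates the shell's lookup (it skips empty PATH entries, which a POSIX shell treats as the current directory):

```python
import os


def find_on_path(name, path=None):
    """Return every executable called `name` on PATH, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # simplification: ignore empty entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


for i, exe in enumerate(find_on_path("storm")):
    marker = "first match" if i == 0 else "shadowed"
    print(f"{exe}  ({marker})")
```

If more than one line is printed, the first is the one a plain "storm nimbus" or "storm supervisor" would run.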

Can you also check the ZooKeeper logs?


On Tue, Sep 2, 2014, at 03:39 AM, Benjamin SOULAS wrote:

Hi everyone,

I followed your instructions for installing a ZooKeeper server:
I downloaded it from the website, extracted the tar file somewhere
on a machine in my cluster, and made these modifications in my
zoo.cfg:

# The number of milliseconds of each tick

# The number of ticks that the initial
# synchronization phase can take

# The number of ticks that can pass between
# sending a request and getting an acknowledgement

# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.

# the port at which the clients will connect

# the maximum number of client connections.
# increase this if you need to handle more clients

# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.

# The number of snapshots to retain in dataDir

# Purge task interval in hours
# Set to "0" to disable auto purge feature

In log4j.properties, I uncommented the line for the log
file:

# Example with rolling log file


Then I went to my storm.yaml (located here in my case, because
I used the source version):


This file contains this configuration:

########### These MUST be filled in for a storm configuration

     - "paradent-4"
#     - "paradent-47"
#     - "paradent-48"

 nimbus.host: "paradent-4"

# ##### These may optionally be filled in:

## List of custom serializations
# topology.kryo.register:
#     - org.mycompany.MyType
#     - org.mycompany.MyType2: org.mycompany.MyType2Serializer

## List of custom kryo decorators
# topology.kryo.decorators:
#     - org.mycompany.MyDecorator

## Locations of the drpc servers
# drpc.servers:
#     - "server1"
#     - "server2"

## Metrics Consumers
# topology.metrics.consumer.register:
#   - class: "backtype.storm.metric.LoggingMetricsConsumer"
#     parallelism.hint: 1
#   - class: "org.mycompany.MyMetricsConsumer"
#     parallelism.hint: 1
#     argument:
#       - endpoint: "metrics-collector.mycompany.org"

 storm.zookeeper.port: 2181

To launch Storm on the cluster, I start nimbus with storm
nimbus (on a machine named paradent-4), then my ZooKeeper
server with sh zkServer.sh start (on paradent-4 again), which creates
a zookeeper_server.pid file where the ZooKeeper PID is
written (I know it's obvious ... >_<).
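
Before starting the supervisor, it can be worth probing ZooKeeper from the supervisor machine. The sketch below sends ZooKeeper's four-letter "ruok" command and expects "imok" back. It assumes the four-letter commands are enabled, which is the default in the 3.4.x line as far as I know; newer releases may require whitelisting them. The function name is made up for illustration:

```python
import socket


def zk_ruok(host, port=2181, timeout=3.0):
    """Send ZooKeeper's `ruok` command; return True if the reply is `imok`."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = sock.recv(16)
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
    return reply == b"imok"
```

Calling zk_ruok("paradent-4") from paradent-39 would check the same host and port that appear in the supervisor's connectString. A True result only says the server is answering; it says nothing about the contents of the /storm znode.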

After that, I launch the Storm UI to get a view of my Storm app
(on paradent-4). Up to this point, everything works fine. The
logical next step is to launch my supervisor on a different
machine (here paradent-39) with storm supervisor. It
starts but, once again, goes down 3 or 4 seconds later.

So I looked at the supervisor.log located at:


And here a tricky error appears:

2014-09-02 09:31:37 o.a.c.f.i.CuratorFrameworkImpl [INFO]

2014-09-02 09:31:37 o.a.z.ZooKeeper [INFO] Initiating client
connection, connectString=paradent-4:2181 sessionTimeout=20000

2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Opening socket
connection to server
paradent-4.rennes.grid5000.fr/ Will not
attempt to authenticate using SASL (unknown error)

2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Socket connection
established to
paradent-4.rennes.grid5000.fr/, initiating

2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Session
establishment complete on server
paradent-4.rennes.grid5000.fr/, sessionid =
0x14835a48ca90004, negotiated timeout = 20000

2014-09-02 09:31:37 o.a.c.f.s.ConnectionStateManager [INFO]
State change: CONNECTED

2014-09-02 09:31:37 o.a.c.f.s.ConnectionStateManager [WARN]
There are no ConnectionStateListeners registered.

2014-09-02 09:31:37 b.s.zookeeper [INFO] Zookeeper state
update: :connected:none

2014-09-02 09:31:38 o.a.z.ZooKeeper [INFO] Session:
0x14835a48ca90004 closed

2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] EventThread shut

2014-09-02 09:31:38 o.a.c.f.i.CuratorFrameworkImpl [INFO]

2014-09-02 09:31:38 o.a.z.ZooKeeper [INFO] Initiating client
connection, connectString=paradent-4:2181/storm

2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Opening socket
connection to server
paradent-4.rennes.grid5000.fr/ Will not
attempt to authenticate using SASL (unknown error)

2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Socket connection
established to
paradent-4.rennes.grid5000.fr/, initiating

2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Session
establishment complete on server
paradent-4.rennes.grid5000.fr/, sessionid =
0x14835a48ca90005, negotiated timeout = 20000

2014-09-02 09:31:38 o.a.c.f.s.ConnectionStateManager [INFO]
State change: CONNECTED

2014-09-02 09:31:38 o.a.c.f.s.ConnectionStateManager [WARN]
There are no ConnectionStateListeners registered.

2014-09-02 09:31:38 b.s.d.supervisor [INFO] Starting supervisor
with id 280caffa-d6c5-4fd4-8282-7d8c1dec7e66 at host

2014-09-02 09:31:39 b.s.event [ERROR] Error when processing

java.io.FileNotFoundException: File
cbaf7c5c/heartbeats/1409146760275' does not exist

299) ~[commons-io-2.4.jar:2.4]

ava:1763) ~[commons-io-2.4.jar:2.4]

at backtype.storm.utils.LocalState.snapshot(LocalState.java:45)

at backtype.storm.utils.LocalState.get(LocalState.java:56)

upervisor.clj:77) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]


at clojure.lang.LazySeq.sval(LazySeq.java:42)

at clojure.lang.LazySeq.seq(LazySeq.java:60)

at clojure.lang.Cons.next(Cons.java:39) ~[clojure-1.5.1.jar:na]

at clojure.lang.LazySeq.next(LazySeq.java:92)

at clojure.lang.RT.next(RT.java:598) ~[clojure-1.5.1.jar:na]

at clojure.core$next.invoke(core.clj:64)

at clojure.core$dorun.invoke(core.clj:2781)

at clojure.core$doall.invoke(core.clj:2796)

supervisor.clj:89) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]

supervisor.clj:106) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]

or.clj:209) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]

at clojure.lang.AFn.applyToHelper(AFn.java:161)

at clojure.lang.AFn.applyTo(AFn.java:151)

at clojure.core$apply.invoke(core.clj:619)

at clojure.core$partial$fn__4190.doInvoke(core.clj:2396)

at clojure.lang.RestFn.invoke(RestFn.java:397)

) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]

at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]

at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]

2014-09-02 09:31:39 b.s.util [INFO] Halting process: ("Error
when processing an event")
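
The trace shows LocalState.snapshot failing inside commons-io because the file it tries to read is not there. One possible explanation, stated here as an assumption to verify against your Storm source: as far as I recall, the 0.9.x local state keeps each version as a data file named by a number plus a companion "<number>.version" marker, and reads the newest version that has a marker. A marker left behind without its data file would then produce exactly this kind of FileNotFoundException. The sketch below lists such leftovers; the suffix constant and function name are illustrative, not Storm's API:

```python
import os

VERSION_SUFFIX = ".version"  # assumed marker suffix; confirm in your Storm source


def orphaned_versions(state_dir):
    """Return version names that have a marker file but no data file."""
    names = set(os.listdir(state_dir))
    orphans = []
    for name in sorted(names):
        if name.endswith(VERSION_SUFFIX):
            data_name = name[: -len(VERSION_SUFFIX)]
            if data_name not in names:
                orphans.append(data_name)
    return orphans
```

Running it on the heartbeats directory named in the exception would show whether 1409146760275 is such an orphan. Note that ls -l on the parent only shows the directory's own permissions, so the directory has to be listed itself (ls -la) to see which files it contains.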

I understand that a file is missing; my question is
"why?????". If I check the permissions with ls -l at this path:


I get this:

drwxr-xr-x 2 bsoulas users 4096 Aug 27 15:39 heartbeats

So to me this is not the problem. Can someone help me? I am
really stuck here :S
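
One detail worth checking is the age of that state. Assuming the numeric file name in the exception is a timestamp in epoch milliseconds, and assuming the cluster clock is at UTC+2 (both are guesses, not something the logs confirm), it can be decoded like this:

```python
from datetime import datetime, timedelta, timezone


def heartbeat_time(file_name, utc_offset_hours=0):
    """Interpret a numeric file name as epoch milliseconds."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (epoch + timedelta(milliseconds=int(file_name))).astimezone(tz)


print(heartbeat_time("1409146760275", utc_offset_hours=2).isoformat())
# 2014-08-27T15:39:20.275000+02:00
```

Under those assumptions the file name decodes to Aug 27 at 15:39, the same minute as the heartbeats directory's modification time in the ls -l output, while the failing supervisor run is from Sep 2. That would suggest the supervisor is picking up local state left over from an earlier run.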

I sincerely hope I am being clear and precise enough ...

Kind regards.

2014-08-29 16:47 GMT+02:00 Harsha <st...@harsha.io>:

Hi Benjamin,
A Storm cluster needs a ZooKeeper quorum to function.
ExclamationTopology accepts command line params to deploy on a
Storm cluster. If you don't pass any arguments, it will use
LocalCluster (a simulated local cluster) to deploy.
I recommend you go through
for setting up ZooKeeper. Here is an excellent write-up on
Storm cluster setup along with
ZooKeeper: http://www.michael-noll.com/tutorials/running-mul
Hope that helps.

On Fri, Aug 29, 2014, at 05:34 AM, Benjamin SOULAS wrote:

Hello everyone, I have a problem setting up Storm on a
cluster (Grid 5000, if anyone knows it). I took
incubator-storm-master from the GitHub branch with the sources
and succeeded in building my own release (no code modifications,
just fixes for some bothersome Maven errors...).

It works fine locally on my own laptop. I modified the
ExclamationTopology by adding 40 more bolts. I also modified
this topology to allow 50 workers in the configuration.

Now on a cluster, when I try to do the same thing, the supervisors
go down just 3 s after they start. Nimbus is OK,
dev-zookeeper too, Storm UI too.

I read somewhere on the Apache website that you need to set up a
real ZooKeeper (not the one bundled with Storm).

Does anyone know a good tutorial explaining how to
run a ZooKeeper server on a cluster for Storm?

I hope I am clear ...

Kind regards.

Benjamin SOULAS


