Hi Benjamin,

Correct me if I missed it, but I don't see storm.local.dir defined
in your config. If it is not defined in the config, Storm will
create one in the Storm installation directory, which seems to be

/home/bsoulas/incubator-storm-master/storm-dist/binary/target/apache-storm-0.9.3-ben/apache-storm-0.9.3-ben/

Also, are you running the supervisor and nimbus as user "bsoulas"?
When you run the "storm nimbus" or "storm supervisor" command,
which storm command is it pointing to? Did you export
STORM_HOME=/home/bsoulas/incubator-storm-master/storm-dist/binary/target/apache-storm-0.9.3-ben
and also add it to PATH? I am checking whether you had a previous
installation of Storm and are invoking the storm command from that
previous installation.
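
The PATH question can be checked mechanically. What follows is a minimal sketch, assuming a Unix-like machine, a launcher script named storm, and a launcher located at bin/storm under STORM_HOME. The helper name find_storm_launchers is made up for illustration and is not part of Storm. It lists every executable named storm on PATH in lookup order, so a leftover installation that shadows the intended one shows up as the first entry.

```python
import os


def find_storm_launchers(path_env, name="storm"):
    """Return every executable called `name` on path_env, in lookup order."""
    found = []
    for directory in path_env.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            found.append(candidate)
    return found


if __name__ == "__main__":
    launchers = find_storm_launchers(os.environ.get("PATH", ""))
    storm_home = os.environ.get("STORM_HOME")
    print("STORM_HOME =", storm_home)
    for position, launcher in enumerate(launchers):
        note = "(found first by the shell)" if position == 0 else "(shadowed)"
        print(launcher, note)
    if storm_home and launchers:
        # Assumption: the launcher lives at $STORM_HOME/bin/storm.
        expected = os.path.join(storm_home, "bin", "storm")
        if os.path.realpath(launchers[0]) != os.path.realpath(expected):
            print("warning: the first storm on PATH is not under STORM_HOME")
```

Running it on both paradent-4 and paradent-39 as the user that starts the daemons would show whether the two machines resolve storm to the same build.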

Can you also check the ZooKeeper logs?

-Harsha



On Tue, Sep 2, 2014, at 03:39 AM, Benjamin SOULAS wrote:

Hi everyone,

I followed your instructions for installing a ZooKeeper server. I
downloaded it from the website, extracted the tar file on a machine
in my cluster, and made these modifications in my zoo.cfg:


# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/bsoulas/zookeeper/zookeeper-3.4.6/data/
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
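
The tick-based settings in this file translate into wall-clock times as sketched below. This is only an illustration: zk_timings and negotiated_timeout are hypothetical helpers, and the session-timeout bounds of 2 and 20 ticks are assumed to apply because the file does not set minSessionTimeout or maxSessionTimeout. initLimit and syncLimit only matter once several servers form a quorum, so with a single server they have no visible effect.

```python
def zk_timings(tick_time_ms, init_limit, sync_limit):
    """Convert tick-based zoo.cfg settings into milliseconds."""
    return {
        "init_ms": tick_time_ms * init_limit,
        "sync_ms": tick_time_ms * sync_limit,
        "min_session_ms": 2 * tick_time_ms,   # assumed default lower bound
        "max_session_ms": 20 * tick_time_ms,  # assumed default upper bound
    }


def negotiated_timeout(requested_ms, timings):
    """Clamp a client's requested session timeout into the server's bounds."""
    return max(timings["min_session_ms"],
               min(requested_ms, timings["max_session_ms"]))


timings = zk_timings(tick_time_ms=2000, init_limit=10, sync_limit=5)
print(timings["init_ms"])                   # 20000
print(timings["sync_ms"])                   # 10000
print(negotiated_timeout(20000, timings))   # 20000
```

With these values a requested 20000 ms falls inside the 4000 to 40000 ms window, which is consistent with the "negotiated timeout = 20000" lines in the supervisor log.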


In log4j.properties, I uncommented the line for the log file:

# Example with rolling log file

log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE


Then I went to my storm.yaml (located here in my case, because I
took the source version):

/home/bsoulas/incubator-storm-master/storm-dist/binary/target/apache-storm-0.9.3-ben/apache-storm-0.9.3-ben/conf


This file contains this configuration:

########### These MUST be filled in for a storm configuration
 storm.zookeeper.servers:
     - "paradent-4"
#     - "paradent-47"
#     - "paradent-48"
#
 nimbus.host: "paradent-4"
#
#
# ##### These may optionally be filled in:
#
## List of custom serializations
# topology.kryo.register:
#     - org.mycompany.MyType
#     - org.mycompany.MyType2: org.mycompany.MyType2Serializer
#
## List of custom kryo decorators
# topology.kryo.decorators:
#     - org.mycompany.MyDecorator
#
## Locations of the drpc servers
# drpc.servers:
#     - "server1"
#     - "server2"
## Metrics Consumers
# topology.metrics.consumer.register:
#   - class: "backtype.storm.metric.LoggingMetricsConsumer"
#     parallelism.hint: 1
#   - class: "org.mycompany.MyMetricsConsumer"
#     parallelism.hint: 1
#     argument:
#       - endpoint: "metrics-collector.mycompany.org"
 dev.zookeeper.path: "paradent-4.rennes.grid5000.fr:~/home/bsoulas/zookeeper/zookeeper-3.4.6/"
 storm.zookeeper.port: 2181
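
A quick way to see which settings this file actually sets is to list its uncommented keys. The sketch below is deliberately not a full YAML parser, and top_level_keys is a made-up helper. It only collects the names of uncommented "key:" lines, which is enough to confirm whether a setting such as storm.local.dir is present. The sample text repeats the uncommented lines of the configuration shown here.

```python
import re

# Matches an uncommented "key:" at the start of a line (leading spaces allowed).
KEY_LINE = re.compile(r"^\s*([A-Za-z][\w.\-]*)\s*:")


def top_level_keys(yaml_text):
    """Collect the names of uncommented `key:` lines (not a full YAML parser)."""
    keys = []
    for line in yaml_text.splitlines():
        if line.lstrip().startswith(("#", "-")):
            continue  # comments and list items carry no key of interest
        match = KEY_LINE.match(line)
        if match:
            keys.append(match.group(1))
    return keys


SAMPLE = """\
 storm.zookeeper.servers:
     - "paradent-4"
 nimbus.host: "paradent-4"
 dev.zookeeper.path: "paradent-4.rennes.grid5000.fr:~/home/bsoulas/zookeeper/zookeeper-3.4.6/"
 storm.zookeeper.port: 2181
"""

keys = top_level_keys(SAMPLE)
print(keys)
print("storm.local.dir" in keys)  # False
```

Since storm.local.dir is absent, the local state directory is whatever Storm picks by default. The stack trace in the supervisor log shows state being read from /home/bsoulas/storm-local.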

To launch Storm on the cluster, I start nimbus with storm nimbus
(on a machine named paradent-4), then my ZooKeeper server with
sh zkServer.sh start (on paradent-4 again), which creates a
zookeeper_server.pid file where the ZooKeeper pid is written (I
know, it's obvious ... >_<).

Next I launch the Storm UI to get a visual of my Storm app (on
paradent-4). Up to this point, everything works fine. The logical
next step is to launch my supervisor on a different machine (here
paradent-39) with storm supervisor. It starts, but once again it
goes down 3 or 4 seconds later.

So I looked at supervisor.log, located in:

/home/bsoulas/incubator-storm-master/storm-dist/binary/target/apache-storm-0.9.3-ben/apache-storm-0.9.3-ben/logs


And here a tricky error appears:

2014-09-02 09:31:37 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
2014-09-02 09:31:37 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=paradent-4:2181 sessionTimeout=20000 watcher=org.apache.curator.ConnectionState@220df4c8
2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Opening socket connection to server paradent-4.rennes.grid5000.fr/172.16.97.4:2181. Will not attempt to authenticate using SASL (unknown error)
2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Socket connection established to paradent-4.rennes.grid5000.fr/172.16.97.4:2181, initiating session
2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Session establishment complete on server paradent-4.rennes.grid5000.fr/172.16.97.4:2181, sessionid = 0x14835a48ca90004, negotiated timeout = 20000
2014-09-02 09:31:37 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2014-09-02 09:31:37 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
2014-09-02 09:31:37 b.s.zookeeper [INFO] Zookeeper state update: :connected:none
2014-09-02 09:31:38 o.a.z.ZooKeeper [INFO] Session: 0x14835a48ca90004 closed
2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] EventThread shut down
2014-09-02 09:31:38 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
2014-09-02 09:31:38 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=paradent-4:2181/storm sessionTimeout=20000 watcher=org.apache.curator.ConnectionState@c6d625b
2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Opening socket connection to server paradent-4.rennes.grid5000.fr/172.16.97.4:2181. Will not attempt to authenticate using SASL (unknown error)
2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Socket connection established to paradent-4.rennes.grid5000.fr/172.16.97.4:2181, initiating session
2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Session establishment complete on server paradent-4.rennes.grid5000.fr/172.16.97.4:2181, sessionid = 0x14835a48ca90005, negotiated timeout = 20000
2014-09-02 09:31:38 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2014-09-02 09:31:38 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
2014-09-02 09:31:38 b.s.d.supervisor [INFO] Starting supervisor with id 280caffa-d6c5-4fd4-8282-7d8c1dec7e66 at host paradent-39.rennes.grid5000.fr

2014-09-02 09:31:39 b.s.event [ERROR] Error when processing event
java.io.FileNotFoundException: File '/home/bsoulas/storm-local/workers/fc350518-ded6-48f4-abf9-da73cbaf7c5c/heartbeats/1409146760275' does not exist
    at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:299) ~[commons-io-2.4.jar:2.4]
    at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1763) ~[commons-io-2.4.jar:2.4]
    at backtype.storm.utils.LocalState.snapshot(LocalState.java:45) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
    at backtype.storm.utils.LocalState.get(LocalState.java:56) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
    at backtype.storm.daemon.supervisor$read_worker_heartbeat.invoke(supervisor.clj:77) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
    at backtype.storm.daemon.supervisor$read_worker_heartbeats$iter__6381__6385$fn__6386.invoke(supervisor.clj:90) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
    at clojure.lang.LazySeq.sval(LazySeq.java:42) ~[clojure-1.5.1.jar:na]
    at clojure.lang.LazySeq.seq(LazySeq.java:60) ~[clojure-1.5.1.jar:na]
    at clojure.lang.Cons.next(Cons.java:39) ~[clojure-1.5.1.jar:na]
    at clojure.lang.LazySeq.next(LazySeq.java:92) ~[clojure-1.5.1.jar:na]
    at clojure.lang.RT.next(RT.java:598) ~[clojure-1.5.1.jar:na]
    at clojure.core$next.invoke(core.clj:64) ~[clojure-1.5.1.jar:na]
    at clojure.core$dorun.invoke(core.clj:2781) ~[clojure-1.5.1.jar:na]
    at clojure.core$doall.invoke(core.clj:2796) ~[clojure-1.5.1.jar:na]
    at backtype.storm.daemon.supervisor$read_worker_heartbeats.invoke(supervisor.clj:89) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
    at backtype.storm.daemon.supervisor$read_allocated_workers.invoke(supervisor.clj:106) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
    at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:209) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
    at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
    at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
    at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
    at clojure.core$partial$fn__4190.doInvoke(core.clj:2396) ~[clojure-1.5.1.jar:na]
    at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
    at backtype.storm.event$event_manager$fn__4687.invoke(event.clj:39) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
2014-09-02 09:31:39 b.s.util [INFO] Halting process: ("Error when processing an event")


I understand that a file is missing; my question is "why?". If I
check the permissions with ls -l at this path:

/home/bsoulas/storm-local/workers/fc350518-ded6-48f4-abf9-da73cbaf7c5c/

I get this:

drwxr-xr-x 2 bsoulas users 4096 Aug 27 15:39 heartbeats

So to me, permissions are not the problem. Can someone help me? I
am really stuck here :S
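
Two details in the trace are worth checking alongside permissions. The file name 1409146760275 reads as a millisecond Unix timestamp, and the supervisor halted while reading heartbeat state already present under storm-local/workers. The sketch below is a diagnostic under stated assumptions, not Storm's own tooling. It assumes that Storm's versioned local state writes a data file named <version> plus a marker file named <version>.version, and that a marker without its data file is what makes the read fail. That layout is an assumption suggested by the LocalState.snapshot frame in the trace and should be confirmed against the Storm source. The function names are made up for illustration.

```python
import os
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def version_to_utc(version_ms):
    """Interpret a state version number as milliseconds since the Unix epoch."""
    return EPOCH + timedelta(milliseconds=version_ms)


def orphan_markers(workers_dir):
    """Yield heartbeat marker files whose matching data file is missing."""
    for worker_id in sorted(os.listdir(workers_dir)):
        heartbeats = os.path.join(workers_dir, worker_id, "heartbeats")
        if not os.path.isdir(heartbeats):
            continue
        for name in sorted(os.listdir(heartbeats)):
            if not name.endswith(".version"):
                continue  # only marker files are of interest (assumed suffix)
            data_file = os.path.join(heartbeats, name[: -len(".version")])
            if not os.path.exists(data_file):
                yield os.path.join(heartbeats, name)


if __name__ == "__main__":
    print(version_to_utc(1409146760275).isoformat())
    workers = os.path.expanduser("~/storm-local/workers")
    if os.path.isdir(workers):
        for marker in orphan_markers(workers):
            print("marker without data file:", marker)
```

For the version in the error, the conversion gives 2014-08-27 13:39:20 UTC. Assuming the cluster clock runs at GMT+02:00, that is 15:39 local time, which matches the Aug 27 15:39 timestamp in the ls -l output. If that reading is right, the heartbeats directory dates from a run six days before the failing start on 2014-09-02, so it looks like leftover state from an earlier run and not something the current supervisor wrote.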

I sincerely hope I am being clear and precise enough ...

Kind regards.






2014-08-29 16:47 GMT+02:00 Harsha <st...@harsha.io>:


Hi Benjamin,
A Storm cluster needs a ZooKeeper quorum to function.
ExclamationTopology accepts command-line parameters to deploy on a
Storm cluster. If you don't pass any arguments, it will use
LocalCluster (a simulated local cluster) to deploy.
I recommend going through
http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html
for setting up ZooKeeper. Here is an excellent write-up on Storm
cluster setup along with ZooKeeper:
http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/
Hope that helps.
-Harsha

On Fri, Aug 29, 2014, at 05:34 AM, Benjamin SOULAS wrote:

Hello everyone, I have a problem deploying Storm on a cluster
(Grid 5000, if anyone knows it). I took incubator-storm-master
from the GitHub branch with the sources, and I succeeded in
building my own release (no code modification, just fixes for
Maven errors that were getting in the way...).

It works fine locally on my own laptop. I modified the
ExclamationTopology by adding 40 more bolts. I also modified this
topology to allow 50 workers in the configuration.

Now on a cluster, when I try to do the same thing, the supervisors
go down just 3 s after they start. Nimbus is OK, dev-zookeeper
too, Storm UI too.

I read somewhere on the Apache website that you need to set up a
real ZooKeeper (not the one bundled with Storm).

Please, does anyone know a good tutorial explaining how to run a
ZooKeeper server on a cluster for Storm?

I hope I am clear ...

Kind regards.

Benjamin SOULAS

