Hi everyone,

I found how to fix it. On my cluster we use NFS, and I had not changed my
storm.yaml to define *storm.local.dir*. If you don't define it, Storm falls
back to defaults.yaml, which sets *storm.local.dir: "storm-local"*.

The problem, when your cluster uses NFS, is that Storm then creates the
storm-local directory in your home directory. With NFS, as you probably know,
every node has access to the same home.

Inside "storm-local", Storm creates a directory named "supervisor". Every
time you launch a Supervisor on another node, it erases the state of the
previous one, so the previously launched Supervisor stops.
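To make the failure mode concrete, here is a minimal Python sketch (not actual Storm code; the directory layout and wipe-on-start behavior are a simplified analogy) of what happens when two supervisors share one storm-local directory over NFS:

```python
import os
import shutil
import tempfile

def start_supervisor(local_dir, supervisor_id):
    # Each "supervisor" wipes and re-creates the supervisor directory,
    # clobbering whatever state the previous one wrote there.
    sup_dir = os.path.join(local_dir, "supervisor")
    if os.path.exists(sup_dir):
        shutil.rmtree(sup_dir)  # erases the previous supervisor's state
    os.makedirs(sup_dir)
    with open(os.path.join(sup_dir, "id"), "w") as f:
        f.write(supervisor_id)

# Stands in for the NFS-shared home directory visible from every node.
shared_home = tempfile.mkdtemp()

start_supervisor(shared_home, "node-A")  # supervisor launched on one node
start_supervisor(shared_home, "node-B")  # a second node reuses the same path

with open(os.path.join(shared_home, "supervisor", "id")) as f:
    survivor = f.read()  # only the most recent supervisor's state remains
print(survivor)  # → node-B
```

Only the state of the last supervisor to start survives, which matches the symptom of the earlier supervisor being stopped.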

So the solution is to modify your storm.yaml and set storm.local.dir to a
path on node-local storage. In my case, on an NFS cluster, I used
*storm.local.dir: "/tmp/storm-local"*, because /tmp is not in the home
directory and therefore is not shared between nodes.

That way, every node launching a Supervisor has its own supervisor
directory.
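Concretely, the one-line change in storm.yaml looks like this (the /tmp path is just what worked for me; any node-local path will do):

```yaml
# conf/storm.yaml on every Storm node.
# Point storm.local.dir at node-local storage instead of the NFS-shared
# home directory, so each supervisor keeps its own state.
storm.local.dir: "/tmp/storm-local"
```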

I hope this is clear and helps someone.

Benjamin.

2014-09-02 19:53 GMT+02:00 Benjamin SOULAS <benjamin.soula...@gmail.com>:

> Hi Harsha,
>
> You're right, I didn't export STORM_HOME ...
>
> I will do it, maybe this is the problem.
>
> Thanks
>
>
> 2014-09-02 18:08 GMT+02:00 Harsha <st...@harsha.io>:
>
>>  Hi Benjamin,
>>          Correct me if I missed it: in your config I don't see
>> storm.local.dir defined. If it's not defined in the config, Storm will
>> create one in the storm installation dir, which seems to be
>>
>> /home/bsoulas/incubator-storm-master/storm-dist/binary/target/apache-storm-0.9.3-ben/apache-storm-0.9.3-ben/
>> Are you running the supervisor and nimbus as user "bsoulas"? When you
>> run "storm nimbus" or "storm supervisor", which storm command is being
>> invoked? Did you export
>> STORM_HOME=/home/bsoulas/incubator-storm-master/storm-dist/binary/target/apache-storm-0.9.3-ben
>> and also add it to PATH? I am checking whether you had a previous
>> installation of storm and are invoking the storm command from that
>> installation.
>>  Can you also check the zookeeper logs?
>> -Harsha
>>
>> On Tue, Sep 2, 2014, at 03:39 AM, Benjamin SOULAS wrote:
>>
>> Hi everyone,
>>
>> I followed your instructions for installing a zookeeper server: I
>> downloaded it from the website, extracted the tar file on a machine in my
>> cluster, and made these modifications in my zoo.cfg:
>>
>>
>>
>> # The number of milliseconds of each tick
>> tickTime=2000
>> # The number of ticks that the initial
>> # synchronization phase can take
>> initLimit=10
>> # The number of ticks that can pass between
>> # sending a request and getting an acknowledgement
>> syncLimit=5
>> # the directory where the snapshot is stored.
>> # do not use /tmp for storage, /tmp here is just
>> # example sakes.
>> dataDir=/home/bsoulas/zookeeper/zookeeper-3.4.6/data/
>> # the port at which the clients will connect
>> clientPort=2181
>> # the maximum number of client connections.
>> # increase this if you need to handle more clients
>> #maxClientCnxns=60
>> #
>> # Be sure to read the maintenance section of the
>> # administrator guide before turning on autopurge.
>> #
>> # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
>> #
>> # The number of snapshots to retain in dataDir
>> #autopurge.snapRetainCount=3
>> # Purge task interval in hours
>> # Set to "0" to disable auto purge feature
>> #autopurge.purgeInterval=1
>>
>>
>> In log4j.properties, I uncommented the line for the rolling log file:
>>
>>
>> # Example with rolling log file
>>
>> log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE
>>
>>
>> Then I edited my storm.yaml (located here in my case, because I built
>> from source):
>>
>>
>>
>> /home/bsoulas/incubator-storm-master/storm-dist/binary/target/apache-storm-0.9.3-ben/apache-storm-0.9.3-ben/conf
>>
>>
>> This file contains this configuration:
>>
>>
>> ########### These MUST be filled in for a storm configuration
>>  storm.zookeeper.servers:
>>      - "paradent-4"
>> #     - "paradent-47"
>> #     - "paradent-48"
>> #
>>  nimbus.host: "paradent-4"
>> #
>> # ##### These may optionally be filled in:
>> #
>> ## List of custom serializations
>> # topology.kryo.register:
>> #     - org.mycompany.MyType
>> #     - org.mycompany.MyType2: org.mycompany.MyType2Serializer
>> #
>> ## List of custom kryo decorators
>> # topology.kryo.decorators:
>> #     - org.mycompany.MyDecorator
>> #
>> ## Locations of the drpc servers
>> # drpc.servers:
>> #     - "server1"
>> #     - "server2"
>>
>> ## Metrics Consumers
>> # topology.metrics.consumer.register:
>> #   - class: "backtype.storm.metric.LoggingMetricsConsumer"
>> #     parallelism.hint: 1
>> #   - class: "org.mycompany.MyMetricsConsumer"
>> #     parallelism.hint: 1
>> #     argument:
>> #       - endpoint: "metrics-collector.mycompany.org"
>>
>>  dev.zookeeper.path: "paradent-4.rennes.grid5000.fr:~/home/bsoulas/zookeeper/zookeeper-3.4.6/"
>>
>>  storm.zookeeper.port: 2181
>>
>> To launch Storm on the cluster, I run *storm nimbus* (on a machine named
>> paradent-4), then start my zookeeper server with *sh zkServer.sh start*
>> (on paradent-4 again), which creates a *zookeeper_server.pid* file holding
>> the pid of the zookeeper process.
>>
>> After that I launch *storm ui* (on paradent-4) to get a visual of my
>> storm app. Up to this point, everything works fine. The next logical step
>> is to launch my supervisor on a different machine (here *paradent-39*)
>> with *storm supervisor*; it starts, but once again goes down 3 or 4
>> seconds later.
>>
>> So I looked at the supervisor.log, located at:
>>
>>
>>
>> /home/bsoulas/incubator-storm-master/storm-dist/binary/target/apache-storm-0.9.3-ben/apache-storm-0.9.3-ben/logs
>>
>>
>> And here a tricky error appears:
>>
>>
>> 2014-09-02 09:31:37 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
>> 2014-09-02 09:31:37 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=paradent-4:2181 sessionTimeout=20000 watcher=org.apache.curator.ConnectionState@220df4c8
>> 2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Opening socket connection to server paradent-4.rennes.grid5000.fr/172.16.97.4:2181. Will not attempt to authenticate using SASL (unknown error)
>> 2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Socket connection established to paradent-4.rennes.grid5000.fr/172.16.97.4:2181, initiating session
>> 2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Session establishment complete on server paradent-4.rennes.grid5000.fr/172.16.97.4:2181, sessionid = 0x14835a48ca90004, negotiated timeout = 20000
>> 2014-09-02 09:31:37 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
>> 2014-09-02 09:31:37 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
>> 2014-09-02 09:31:37 b.s.zookeeper [INFO] Zookeeper state update: :connected:none
>> 2014-09-02 09:31:38 o.a.z.ZooKeeper [INFO] Session: 0x14835a48ca90004 closed
>> 2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] EventThread shut down
>> 2014-09-02 09:31:38 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
>> 2014-09-02 09:31:38 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=paradent-4:2181/storm sessionTimeout=20000 watcher=org.apache.curator.ConnectionState@c6d625b
>> 2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Opening socket connection to server paradent-4.rennes.grid5000.fr/172.16.97.4:2181. Will not attempt to authenticate using SASL (unknown error)
>> 2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Socket connection established to paradent-4.rennes.grid5000.fr/172.16.97.4:2181, initiating session
>> 2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Session establishment complete on server paradent-4.rennes.grid5000.fr/172.16.97.4:2181, sessionid = 0x14835a48ca90005, negotiated timeout = 20000
>> 2014-09-02 09:31:38 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
>> 2014-09-02 09:31:38 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
>> 2014-09-02 09:31:38 b.s.d.supervisor [INFO] Starting supervisor with id 280caffa-d6c5-4fd4-8282-7d8c1dec7e66 at host paradent-39.rennes.grid5000.fr
>> 2014-09-02 09:31:39 b.s.event [ERROR] Error when processing event
>> java.io.FileNotFoundException: File '/home/bsoulas/storm-local/workers/fc350518-ded6-48f4-abf9-da73cbaf7c5c/heartbeats/1409146760275' does not exist
>> at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:299) ~[commons-io-2.4.jar:2.4]
>> at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1763) ~[commons-io-2.4.jar:2.4]
>> at backtype.storm.utils.LocalState.snapshot(LocalState.java:45) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
>> at backtype.storm.utils.LocalState.get(LocalState.java:56) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
>> at backtype.storm.daemon.supervisor$read_worker_heartbeat.invoke(supervisor.clj:77) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
>> at backtype.storm.daemon.supervisor$read_worker_heartbeats$iter__6381__6385$fn__6386.invoke(supervisor.clj:90) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
>> at clojure.lang.LazySeq.sval(LazySeq.java:42) ~[clojure-1.5.1.jar:na]
>> at clojure.lang.LazySeq.seq(LazySeq.java:60) ~[clojure-1.5.1.jar:na]
>> at clojure.lang.Cons.next(Cons.java:39) ~[clojure-1.5.1.jar:na]
>> at clojure.lang.LazySeq.next(LazySeq.java:92) ~[clojure-1.5.1.jar:na]
>> at clojure.lang.RT.next(RT.java:598) ~[clojure-1.5.1.jar:na]
>> at clojure.core$next.invoke(core.clj:64) ~[clojure-1.5.1.jar:na]
>> at clojure.core$dorun.invoke(core.clj:2781) ~[clojure-1.5.1.jar:na]
>> at clojure.core$doall.invoke(core.clj:2796) ~[clojure-1.5.1.jar:na]
>> at backtype.storm.daemon.supervisor$read_worker_heartbeats.invoke(supervisor.clj:89) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
>> at backtype.storm.daemon.supervisor$read_allocated_workers.invoke(supervisor.clj:106) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
>> at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:209) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
>> at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
>> at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
>> at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
>> at clojure.core$partial$fn__4190.doInvoke(core.clj:2396) ~[clojure-1.5.1.jar:na]
>> at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
>> at backtype.storm.event$event_manager$fn__4687.invoke(event.clj:39) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
>> at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
>> 2014-09-02 09:31:39 b.s.util [INFO] Halting process: ("Error when processing an event")
>>
>>
>> I understand that a file is missing; my question is why. If I check the
>> permissions with ls -l at this path:
>>
>>
>> /home/bsoulas/storm-local/workers/fc350518-ded6-48f4-abf9-da73cbaf7c5c/
>>
>> I get this:
>>
>>
>> drwxr-xr-x 2 bsoulas users 4096 Aug 27 15:39 heartbeats
>>
>> So for me the permissions are not the problem. Can someone help me? I am
>> really stuck here.
>>
>> I hope this is clear and precise enough.
>>
>> Kind regards.
>>
>>
>>
>>
>>
>>
>> 2014-08-29 16:47 GMT+02:00 Harsha <st...@harsha.io>:
>>
>>
>>
Hi Benjamin,
>>             A Storm cluster needs a zookeeper quorum to function.
>> ExclamationTopology accepts command-line params to deploy on a storm
>> cluster. If you don't pass any arguments it will use LocalCluster (a
>> simulated local cluster) to deploy.
>>  I recommend going through
>> http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html
>> for setting up zookeeper. Here is an excellent write-up on storm cluster
>> setup along with zookeeper:
>> http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/.
>>  Hope that helps.
>> -Harsha
>>
>>
>> On Fri, Aug 29, 2014, at 05:34 AM, Benjamin SOULAS wrote:
>>
>> Hello everyone, I have a problem deploying Storm on a cluster (Grid 5000,
>> if anyone knows it). I took incubator-storm-master with the sources from
>> the GitHub branch and succeeded in creating my own release (no code
>> modification, just fixes for Maven errors that were getting in the way).
>>
>> It works fine locally on my own laptop. I modified the
>> ExclamationTopology by adding 40 more bolts, and also changed its
>> configuration to allow 50 workers.
>>
>>  Now on a cluster, when I try to do the same thing, supervisors go down
>> just 3 seconds after starting. Nimbus is OK, dev-zookeeper too, storm ui
>> too.
>>
>>  I read somewhere on the Apache website that you need to run a real
>> zookeeper (not the one bundled with storm).
>>
>>  Does someone know a good tutorial explaining how to run a zookeeper
>> server on a cluster for storm?
>>
>>  I hope I am being clear.
>>
>>  Kind regards.
>>
>>  Benjamin SOULAS
>>
>>
>>
>>
>>
>>
>>
>
>
