*From:* Nathan Leung [mailto:ncle...@gmail.com]
*Sent:* Wednesday, September 24, 2014 8:09 AM
*To:* user
*Subject:* Re: Configuration changes and storm cluster
In my experience storm.yaml changes require daemon restarts, while
cluster.xml changes get picked up dynamically by running topologies.
On Wed, Sep 24, 2014 at 2:44 AM, Richards Peter hbkricha...@gmail.com
wrote:
Answers inline.
Regards,
Richards Peter.
On Tue, Sep 23, 2014 at 8:55 PM,
What does the code for your kryo serializer look like? Are you sure that
it is not returning null? Kryo will only be used to serialize if your
tuple is crossing worker boundaries; when you have 1 worker everything is
more or less passed by reference (through some queues and whatnot, but it
does
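For reference, a custom Kryo serializer and its registration typically look
like the sketch below (MyClass and MyClassSerializer are hypothetical
stand-ins; the key point is that read() must never return null):

public class MyClassSerializer extends Serializer<MyClass> {
    public void write(Kryo kryo, Output output, MyClass obj) {
        output.writeString(obj.getValue());
    }
    public MyClass read(Kryo kryo, Input input, Class<MyClass> type) {
        return new MyClass(input.readString()); // never return null here
    }
}
// conf is your backtype.storm.Config instance:
conf.registerSerialization(MyClass.class, MyClassSerializer.class);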
Yes, it works only with anchored tuples. If the tuple is unanchored there
is no way for the spout to know when it's been fully processed.
On Sep 10, 2014 4:11 AM, Spico Florin spicoflo...@gmail.com wrote:
Hello!
I would like to know if the setup for TOPOLOGY_MAX_SPOUT_PENDING will
be
The exception you're seeing is an issue that was fixed in the latest storm
release (0.9.2-incubating): https://issues.apache.org/jira/browse/STORM-187
If you are required to use an older version, one thing you can do is
replace netty with 0mq as discussed here:
You can also print GC details to log (the following example is verbose but
you can tailor it to your needs):
-Xloggc:/opt/storm/logs/gc-worker-%ID%.log -verbose:gc
-XX:GCLogFileSize=10m -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
-XX:+PrintGCDetails -XX:+PrintHeapAtGC
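If you'd rather scope those flags to one topology than the whole cluster, a
sketch (assuming your storm version supports topology.worker.childopts;
otherwise set worker.childopts in storm.yaml):

Config conf = new Config();
// example path only; point it at your own log directory
conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS,
    "-Xloggc:/opt/storm/logs/gc-worker-%ID%.log -verbose:gc -XX:+PrintGCDetails");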
Hi,
Does anyone know what might cause the execute latency of a bolt to be much
higher than the process latency? I've heard that code after the ack()
method is called can cause this, but ack() is literally the last part of my
execute() method. Also, sometimes this happens only on some instances
Most things your spout or bolt uses, especially anything using a network
connection, open file, etc, should be created in the prepare() method, and
not on construction.
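A minimal sketch of that pattern (Connection and ConnectionFactory are
hypothetical stand-ins for whatever non-serializable resource you hold):

public class MyBolt extends BaseRichBolt {
    private transient Connection conn; // not serialized at submit time
    private OutputCollector collector;

    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.conn = ConnectionFactory.open(); // created on the worker, not the client
    }
    // execute(), declareOutputFields(), etc. go here
}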
On Thu, Aug 7, 2014 at 10:43 AM, Spico Florin spicoflo...@gmail.com wrote:
Hello!
I have a bolt that is using a third
You would need to design a custom scheduler:
http://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/
On Wed, Aug 6, 2014 at 5:08 AM, Spico Florin spicoflo...@gmail.com wrote:
Hello!
I have a use case where I need two bolts to be colocated
either on
the reference to the pluggable
scheduler is pointing to a GitHub page that doesn't exist. Do you know
if there is updated documentation about this subject?
Thanks in advance.
Florin
On Wed, Aug 6, 2014 at 3:16 PM, Nathan Leung ncle...@gmail.com wrote:
You would need to design a custom
Are you emitting in a separate thread? If so, yes. If not, no.
On Jul 31, 2014 7:45 AM, 唐思成 jadetan...@qq.com wrote:
Recently, I came across a question. Suppose I have a bolt with an
ArrayList as its private field; every time this bolt receives a tuple, it
stores the incoming tuple into this
Complete latency can be higher if you have a lot of data pending from the
spout (your topology.max.spout.pending is not set), however given the low
number of tuples I wouldn't suspect this to be the case. What is the
layout of your topology? Is it circular in any way?
On Wed, Jul 30, 2014 at
The topology.max.spout.pending setting configures how many messages can be un-acked
from each spout before it stops sending messages. So in your example, each
spout task can have 10 thousand messages waiting to be acked before it
throttles itself and stops emitting. Of course if some of those messages
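For reference, the cap is set per topology, e.g. (sketch):

Config conf = new Config();
conf.setMaxSpoutPending(10000); // at most 10k un-acked tuples per spout task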
In your example, five spouts would get data and the other five would not.
On Jul 23, 2014 5:11 PM, Kashyap Mhaisekar kashya...@gmail.com wrote:
Hi,
Is the no. of executors for KafkaSpout dependent on the partitions for the
topic?
For E.g.,
Say kafka TopicA has 5 partitions.
If I have a
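In short, spout executors beyond the partition count sit idle, so matching
parallelism to partitions is the usual approach (a sketch with hypothetical
names):

// 5 partitions -> 5 spout executors; a 6th would receive no data
builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 5);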
Yes, requesting 55 workers when you only have 14 is not going to work so
well. Why did you set the number of workers so high for your topology?
On Jul 17, 2014 8:14 AM, chenlax lax...@hotmail.com wrote:
@Nathan Leung, I think I know why the isolation scheduler doesn't work in my demo.
In the topology I set
As Vladi says, you should initialize the connection in prepare(), not in
the constructor. Any objects that are created in the constructor are
serialized and sent to the nimbus, so you will get this exception if you
create anything that is not serializable in your spout/bolt constructors.
On
Your yaml doesn't appear to be properly formatted. Either that, or your
email client stripped some characters.
On Thu, Jul 10, 2014 at 4:16 AM, chenlax lax...@hotmail.com wrote:
and submitting another topology also can't get workers.
Thanks,
Lax
--
From:
As Jungtaek said, the issue is 0mq memory usage. 0mq library code is
accessed through jni so its memory usage is not governed by the jvm. By
default there is no high water mark so your memory usage can explode. If
you aren't using reliable message handling (emitting tuples with an id) and
setting
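For reference, the high water mark is capped through the topology config when
using the 0mq transport (a sketch; 0, the default, means unbounded):

Config conf = new Config();
conf.put(Config.ZMQ_HWM, 5000);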
Does the isolation scheduler suit your needs?
https://storm.incubator.apache.org/2013/01/11/storm082-released.html
On Jul 8, 2014 8:53 AM, chenlax lax...@hotmail.com wrote:
I mean assigning 2 supervisor machines to execute a topology, where the worker
number is more than 2, maybe 10 or more.
Thanks,
Lax
Your data is replicated because of the all grouping. I assume you have more
than the kafka and hdfs bolts. If you send via fields grouping (or shuffle
grouping, or local or shuffle grouping), and both are subscribed to your
deserializer, then one task in hdfs bolt and one task in kafka bolt will
get the
Are you using all grouping for the deserialization bolt too? My original
point is that you should not be using this grouping anywhere in your
topology. It will cause the duplication you are seeing unless your
parallelism is set to one for your bolts. See the section on groupings
here
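For example, switching the subscriptions from allGrouping to shuffleGrouping
sends each tuple to exactly one task per subscribed bolt (hypothetical names):

builder.setBolt("hdfs-bolt", hdfsBolt, 4).shuffleGrouping("deserializer");
builder.setBolt("kafka-bolt", kafkaBolt, 4).shuffleGrouping("deserializer");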
Each task has its own instance of the bolt class.
On Tue, Jul 1, 2014 at 7:31 PM, Pasquini, Reuben reuben.pasqu...@hp.com
wrote:
Hi,
Can the execute() method on a single instance of a Bolt execute
concurrently on multiple threads? Or does each task get its own instance
of a particular
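So execute() on any one instance is single-threaded; concurrency comes from
running many instances. A sketch of the executor/task split:

// 4 executors (threads), 8 tasks -> 8 bolt instances,
// two tasks run serially within each executor thread
builder.setBolt("my-bolt", new MyBolt(), 4).setNumTasks(8);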
If you kill a topology in the ui, you will notice that sometimes it takes
a while for it to clear and go away. If you try to reload the topology
during this time you will get the same exception. You should loop checking
the nimbus for this topology after you kill it, and only reload after you
A rough overview since I don't know if I can share code
1) create a thrift connection
2) get a Nimbus.Client object (I will call this 'client')
3) call client.getTopology(topology id) - returns StormTopology, I will
call this 'topology'
4) Iterate the Map<String, SpoutSpec> that is returned by
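A minimal sketch of those steps against the 0.9.x thrift API (the topology id
is a placeholder):

Map conf = Utils.readStormConfig();                                         // 1)
Nimbus.Client client = NimbusClient.getConfiguredClient(conf).getClient();  // 2)
StormTopology topology = client.getTopology("my-topology-id");              // 3)
for (Map.Entry<String, SpoutSpec> e : topology.get_spouts().entrySet()) {   // 4)
    System.out.println("spout: " + e.getKey());
}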
1) a worker can spawn any number of threads, so you can possibly run into
standard shared resource issues (CPU, network, disk, etc). RAM is not as
big of a problem since each worker gets a fixed amount.
2) a worker is spawned for a particular topology; it only executes
spout/bolt tasks for the
If you are GCing too much and failing a lot of tuples (which may be in part
due to GCs) it is quite possible that you are out of RAM and you should
increase the amount that is allocated for each worker.
On Sat, May 31, 2014 at 9:25 PM, Srinath C srinat...@gmail.com wrote:
Hi Shaikh,
You
to a specific bolt
(Tweet/retweet/replytweet). Here it will insert data into HBase and forward
the same tuple to user bolt for further processing.
4. User bolt will insert data into HBase database, and so on.
Thanks & Regards,
Riyaz
On Thu, May 29, 2014 at 3:57 PM, Nathan Leung ncle
Are you sure the bolt ran properly? If so, on which machine are you
looking for the file?
On Wed, May 28, 2014 at 4:30 PM, Dilpreet Singh dilpreet...@gmail.com wrote:
Hi,
I'm writing the output of a python bolt to a file by:
f = open('/tmp/clusters.txt', 'a')
Are you sure that you're not passing a null value in your tuple?
On May 21, 2014 7:25 AM, Irek Khasyanov qua...@gmail.com wrote:
Hello.
I have a strange problem with my topology; sometimes everything crashes with
exception:
java.lang.RuntimeException: java.lang.NullPointerException
at
You can also synchronize access to the OutputCollector so that only 1
thread is using it at a time.
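A sketch of that approach, with every emitting thread sharing the same lock:

synchronized (collector) {
    collector.emit(new Values(word, count));
}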
On Wed, May 21, 2014 at 12:39 PM, Irek Khasyanov qua...@gmail.com wrote:
Hm, yes, the first bolt is emitting from a different thread, I didn't realize
that this would be a problem. Thanks! I'll try to change
You can do something like this: -Xloggc:<Your Storm Install
Dir>/logs/gc-worker-%ID%.log
On Wed, May 14, 2014 at 2:01 PM, Sean Allen s...@monkeysnatchbanana.com wrote:
is anyone logging gc events for workers in their cluster?
outside of storm, the following jvm options are pretty standard
Hi Jing,
Was message.max.bytes changed in your Kafka server config to be higher than
the default value (1000000 bytes)?
-Nathan
On Mon, May 19, 2014 at 5:54 PM, Tao, Jing j...@webmd.net wrote:
I finally found the root cause. Turns out the spout was reading a
message that exceeded the max
value of the throughput measuring bolt in Storm UI is at around 0.12.
Will try out more configurations and see. Thank you very much for your
tips.
Any other tweaks I might try out?
Thanks,
Lasantha
On Tue, May 13, 2014 at 6:38 PM, Nathan Leung ncle...@gmail.com wrote:
For 20 spouts
Hi,
I am configuring spout sleep in my topology by adding the following items
to my configuration map but it does not appear to affect the behavior of
the spout:
config.put("topology.spout.wait.strategy",
"backtype.storm.spout.SleepSpoutWaitStrategy");
One example is if you configure each spout to scan different files or
directories.
On May 13, 2014 4:28 AM, Komal Thombare komal.thomb...@tcs.com wrote:
Hi all,
I am new to storm and working on Storm word count. I have confusion while
setting spout parallelism.
I am using
The number of spout and bolt executors can be changed at run time, but not
the number of tasks. The number of worker processes can also be changed.
You cannot change the layout of the topology by adding or removing spouts
or bolts. If you need to add data sources without downtime, one thing you
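For the executor and worker counts, the rebalance command changes them
without redeploying (example invocation; names are placeholders):

storm rebalance my-topology -n 4 -e my-spout=8 -e my-bolt=16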
a couple thoughts
1) IBM Streams is certainly more mature, as it's been in development for a
longer amount of time and storm is not even at release 1.0 yet. Though I
am not familiar with SPL, it would also make sense that it's faster to
implement, as it is a higher level abstraction.
2) Operator
The number of executors and workers can be changed but the number of tasks
is fixed at topology creation time.
On May 6, 2014 1:40 AM, M.Tarkeshwar Rao tarkeshwa...@gmail.com wrote:
Hi,
Can I increase the parallelism of the topology as per traffic?
I need inputs from you all. Any link?
You are creating your file writer with append set to true. Is it possible
your topology was run more than once?
On May 6, 2014 6:39 AM, Bilal Al Fartakh alfartaj.bi...@gmail.com wrote:
I'm using a bolt that receives tuples from another bolt (exclamation bolt)
and writes them to a file, the
declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word"));
}
2014-05-06 12:35 GMT+01:00 Nathan Leung ncle...@gmail.com:
You are creating your file writer with append set to true. Is it
possible your topology was run more than once?
On May 6, 2014 6:39
the call to
“nextTuple()” on spout and everything else will continue to work as is?
3. Initiate Shutdown...
- Thanks,
Prasun Ghosh
Apple Inc.
Information Security
On May 1, 2014, at 12:27 PM, Nathan Leung ncle...@gmail.com wrote:
You can deactivate the topology, which will shut off
It depends on your specific application, but if you are modifying specific
rows based on an index you can do fields grouping by that index so that
only one bolt will ever update a particular row.
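A sketch of that wiring (names are hypothetical):

// tuples with the same "rowKey" always reach the same task,
// so no two tasks ever update the same row
builder.setBolt("row-updater", new UpdateBolt(), 8)
       .fieldsGrouping("source-spout", new Fields("rowKey"));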
On Mon, Apr 28, 2014 at 6:40 PM, Raphael Hsieh raffihs...@gmail.com wrote:
How does Storm handle
http://storm-project.net/downloads.html
I don't know what's missing...
Thanks!
2014-04-02 15:05 GMT-03:00 Nathan Leung ncle...@gmail.com:
No, it creates an extra executor to deal with processing the ack
messages that are sent by the bolts after processing tuples. See
, Apr 3, 2014 at 10:34 AM, Nathan Leung ncle...@gmail.com wrote:
by default each task is executed by 1 executor, but if the number of
tasks is greater than the number of executors, then each executor (thread)
will execute more than one task. Note that when rebalancing a topology,
you can change
...@gmail.com wrote:
Thanks. Since inside an executor, multiple tasks are in fact for the same
spout or bolt, is this feature of multiple tasks only useful for some
special cases?
On Thu, Apr 3, 2014 at 10:52 AM, Nathan Leung ncle...@gmail.com wrote:
tasks are run serially by the executor
. Is that a bug in
displaying topology summary?
My cluster consists of 2 supervisors and each has 4 workers defined.
Thanks.
On Tue, Apr 1, 2014 at 1:43 PM, Nathan Leung ncle...@gmail.com wrote:
By default supervisor nodes can run up to 4 workers. This is
configurable in storm.yaml
to deal with the tuples?
Thanks,
Huiliang
On Wed, Apr 2, 2014 at 8:31 AM, Nathan Leung ncle...@gmail.com wrote:
the extra task/executor is the acker thread.
On Tue, Apr 1, 2014 at 9:23 PM, Huiliang Zhang zhl...@gmail.com wrote:
I just submitted ExclamationTopology for testing
, Mar 29, 2014 at 6:34 AM, Susheel Kumar Gadalay
skgada...@gmail.com wrote:
No, a single worker is dedicated to a single topology no matter how
many threads it spawns for different bolts/spouts.
A single worker cannot be shared across multiple topologies.
On 3/29/14, Nathan Leung ncle
From what I have seen, the second topology is run with 1 worker until you
kill the first topology or add more worker slots to your cluster.
On Sat, Mar 29, 2014 at 2:57 AM, Huiliang Zhang zhl...@gmail.com wrote:
Thanks. I am still not clear.
Do you mean that in a single worker process, there
In my experience storm is able to make good use of CPU resources, if the
application is written appropriately. You shouldn't require too much
executor parallelism if your application is CPU intensive. If your bolts
are doing things like remote DB/NoSQL accesses, then that changes things
and
constrain
myself to a maximum that equates to the number of cores.
D
*From:* Nathan Leung ncle...@gmail.com
*Sent:* Tuesday, 18 March 2014 18:38
*To:* user@storm.incubator.apache.org
In my experience storm is able to make good use of CPU resources, if the
application
You can do something like
BoltDeclarer bd = builder.setBolt("myBoltB", boltB, boltBparallelism);
for (int i = 1; i <= numInstances; ++i) {
    bd.shuffleGrouping("myBoltA" + i);
}
On Mon, Mar 10, 2014 at 11:52 AM, Susana González susan...@gmail.com wrote:
Hi,
I need help to go from a simple Storm
reply. So I assume you still need to upload a new jar for
each topology? How are you handling this?
On Tue, Mar 4, 2014 at 6:34 PM, Nathan Leung ncle...@gmail.com wrote:
It is possible, but I don't think it is out of the box. You would have
to write that layer yourself. For example, I've
Hi Richards,
Thanks for the tip. I was actually planning on doing that as well. I read
more on kryo and found that it does not support registering a super type or
interface, so it makes sense that there is no serializer registered for the
map interface.
-Nathan
On Mar 2, 2014 2:50 AM, Richards
Hi,
I noticed on the page about serialization (
https://github.com/nathanmarz/storm/wiki/Serialization) that storm only
registers HashMap and HashSet for serialization with Kryo. The reason I
noticed is that in some cases when I have a class that contains a
member of type Map, storm
It can work with ZMQ, but you MUST use the version specified (2.1.7).
Newer versions change the API which causes errors, which might be what you
are seeing. Is the version of libzmq you installed the same as the one you
are using in production?
On Fri, Jan 31, 2014 at 9:47 AM, Mark Greene
at 10:32 AM, Nathan Leung ncle...@gmail.com wrote:
Assuming everything else is setup, configure your storm.yaml and then just
run the storm jar command. Assuming it's not, these two links are good
references:
https://github.com/nathanmarz/storm/wiki/Setting-up-a-Storm-cluster
http
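Once storm.yaml points at your nimbus host, submission looks like this
(placeholder names):

storm jar target/my-topology.jar com.example.MyTopology my-topology-name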
I've benched storm at 1.8 million tuples per second on a big (24 core) box
using local or shuffle grouping between a spout and bolt. If you're only
seeing 10 events per second make sure you don't have any sleeps (whether in
your code or elsewhere e.g. a library or triggered due to lack of data in
Thanks,
-b
On 01/03/2014 03:06 PM, Nathan Leung wrote:
can you try storm 0.9.0.1 with zmq 2.1.7 and frozen jzmq? assuming that
it doesn't work, can you try with netty as your transport? It would seem
to me that something is wrong with your zmq library,
On Fri, Jan 3, 2014 at 2:58 PM
, Dec 31, 2013 at 3:06 PM, Nathan Leung ncle...@gmail.com wrote:
Do you have a file storm-starter.jar? This doesn't match the snapshot
naming that maven created for the jar-with-dependencies file. Also, did
you do mvn install?
-Nathan
On Dec 31, 2013 12:55 AM, researcher cs prog.researc
Yes
On Dec 31, 2013 2:58 PM, researcher cs prog.researc...@gmail.com wrote:
Nathan, do you mean using jar -tf to search for WordCountTopology in
storm-starter.jar, or what ... ?
On Tue, Dec 31, 2013 at 9:24 PM, Nathan Leung ncle...@gmail.com wrote:
If the jar file is in fact in the location
You are using the sentence as the message ID? The word count example
repeats sentences, and your message IDs need to be unique.
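One fix is to emit with a generated ID rather than the sentence itself
(sketch):

collector.emit(new Values(sentence), UUID.randomUUID().toString());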
On Mon, Dec 30, 2013 at 2:05 AM, Michal Singer mic...@leadspace.com wrote:
In my test I am using the word counter that was in the code samples of
storm starter.
Do you have enough worker slots in your cluster to run your topology? Are
there error messages in the worker logs?
On Thu, Dec 26, 2013 at 9:45 AM, Cheng Xuntao chengxun...@gmail.com wrote:
Hi, All,
I installed Graphite with Storm on my 10-node cluster. After I submitted
the fat jar, I