Re: Configuration changes and storm cluster

2014-09-25 Thread Nathan Leung
Gunderson *From:* Nathan Leung [mailto:ncle...@gmail.com] *Sent:* Wednesday, September 24, 2014 8:09 AM *To:* user *Subject:* Re: Configuration changes and storm cluster In my experience storm.yaml changes require daemon restarts, while cluster.xml changes get picked up dynamically by running

Re: Configuration changes and storm cluster

2014-09-24 Thread Nathan Leung
In my experience storm.yaml changes require daemon restarts, while cluster.xml changes get picked up dynamically by running topologies. On Wed, Sep 24, 2014 at 2:44 AM, Richards Peter hbkricha...@gmail.com wrote: Answers inline. Regards, Richards Peter. On Tue, Sep 23, 2014 at 8:55 PM,

Re: storm bolts receiving tuples with null values

2014-09-19 Thread Nathan Leung
What does the code for your kryo serializer look like? Are you sure that it is not returning null? Kryo will only be used to serialize if your tuple is crossing worker boundaries; when you have 1 worker everything is more or less passed by reference (through some queues and whatnot, but it does

Re: TOPOLOGY_MAX_SPOUT_PENDING working only when the spout emits anchored tuples?

2014-09-10 Thread Nathan Leung
Yes, it works only with anchored tuples. If the tuple is unanchored, there is no way for the spout to know when it's been fully processed. On Sep 10, 2014 4:11 AM, Spico Florin spicoflo...@gmail.com wrote: Hello! I would like to know if the setup for TOPOLOGY_MAX_SPOUT_PENDING will be
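A stdlib-only Java simulation may make the reason concrete. It is not Storm's API; the class and method names are hypothetical. Only emits that carry a message ID (what the thread calls anchored) are recorded as pending, so only those can ever be acked and counted against a max-pending limit. Emits without an ID leave nothing to track.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical simulation of spout-side pending tracking; not Storm's real classes.
class PendingTracker {
    private final Set<Object> pending = new HashSet<>();

    // An emit with a message ID is tracked until it is acked.
    // An emit without one (msgId == null) is fire-and-forget: nothing to track.
    void emit(Object msgId) {
        if (msgId != null) {
            pending.add(msgId);
        }
    }

    // Called when the tuple tree rooted at msgId has been fully processed.
    void ack(Object msgId) {
        pending.remove(msgId);
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingTracker t = new PendingTracker();
        t.emit("m1");  // with message ID: tracked
        t.emit("m2");  // with message ID: tracked
        t.emit(null);  // without message ID: never counted
        t.emit(null);
        System.out.println(t.pendingCount()); // 2
        t.ack("m1");
        System.out.println(t.pendingCount()); // 1
    }
}
```

Because unanchored emits never enter the pending set, a limit on its size cannot throttle them.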

Re: Remote Bolts unable to Ack messages

2014-09-10 Thread Nathan Leung
The exception you're seeing is an issue that was fixed in the latest storm release (0.9.2-incubating): https://issues.apache.org/jira/browse/STORM-187 If you are required to use an older version, one thing you can do is replace netty with 0mq as discussed here:

Re: When does nimbus rebalance a topology?

2014-09-09 Thread Nathan Leung
You can also print GC details to log (the following example is verbose but you can tailor it to your needs): -Xloggc:/opt/storm/logs/gc-worker-%ID%.log -verbose:gc -XX:GCLogFileSize=10m -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:+PrintGCDetails -XX:+PrintHeapAtGC

Execute latency much higher than process latency

2014-08-24 Thread Nathan Leung
Hi, Does anyone know what might cause the execute latency of a bolt to be much higher than the process latency? I've heard that code after the ack() method is called can cause this, but ack() is literally the last part of my execute() method. Also, sometimes this happens only on some instances

Re: Cannot create run the topology due to java.io.NotSerializableException: java.util.concurrent.CountDownLatch

2014-08-07 Thread Nathan Leung
Most things your spout or bolt uses, especially anything using a network connection, open file, etc., should be created in the prepare() method, and not on construction. On Thu, Aug 7, 2014 at 10:43 AM, Spico Florin spicoflo...@gmail.com wrote: Hello! I have a bolt that is using a third
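The failure in the subject line can be reproduced with plain Java serialization, which Storm uses for spout and bolt instances at submit time. In this sketch the classes are hypothetical stand-ins, and prepare() here is an ordinary method playing the role of Storm's prepare() (open() for a spout). A non-serializable CountDownLatch created in a field initializer breaks serialization, while a transient field filled in later does not.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for a bolt: the instance is serialized when the topology
// is submitted, so every non-transient field must be Serializable.
class LatchBolt implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: skipped by Java serialization; created on the worker instead.
    private transient CountDownLatch latch;

    // Plays the role of prepare() (open() for a spout): runs after deserialization.
    void prepare() {
        latch = new CountDownLatch(1);
    }

    CountDownLatch latch() {
        return latch;
    }

    // Serialize an object with plain Java serialization, as a submission would.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(serialize(new LatchBolt()).length > 0); // true
        try {
            serialize(new EagerBolt());
        } catch (NotSerializableException e) {
            // The message names the offending class (the CountDownLatch).
            System.out.println("cannot serialize: " + e.getMessage());
        }
    }
}

// Anti-pattern: the latch is created at construction time and is not transient.
class EagerBolt implements Serializable {
    private static final long serialVersionUID = 1L;
    private final CountDownLatch latch = new CountDownLatch(1);
}
```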

Re: Task colocation in the same JVM or same node

2014-08-06 Thread Nathan Leung
You would need to design a custom scheduler: http://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/ On Wed, Aug 6, 2014 at 5:08 AM, Spico Florin spicoflo...@gmail.com wrote: Hello! I have a use case where I need that two bolts should be colocated either on

Re: Task colocation in the same JVM or same node

2014-08-06 Thread Nathan Leung
the reference to the pluggable scheduler is pointing to a github page that doesn't exist. Do you know if there is updated documentation about this subject? Thanks in advance. Florin On Wed, Aug 6, 2014 at 3:16 PM, Nathan Leung ncle...@gmail.com wrote: You would need to design a custom

Re: The Parallelism of a bolt

2014-07-31 Thread Nathan Leung
Are you emitting in a separate thread? If so, yes. If not, no. On Jul 31, 2014 7:45 AM, 唐思成 jadetan...@qq.com wrote: Recently, I came across a question. Suppose I have a bolt with an ArrayList as its private field; every time this bolt receives a tuple, it stores the incoming tuple into this

Re: FW: why complete latency and failure rate is so high of my spout.

2014-07-30 Thread Nathan Leung
Complete latency can be higher if you have a lot of data pending from the spout (your topology.max.spout.pending is not set), however given the low number of tuples I wouldn't suspect this to be the case. What is the layout of your topology? Is it circular in any way? On Wed, Jul 30, 2014 at

Re: why complete latency and failure rate is so high of my spout.

2014-07-30 Thread Nathan Leung
topology.max.spout.pending configures how many messages can be un-acked from each spout before it stops sending messages. So in your example, each spout task can have 10 thousand messages waiting to be acked before it throttles itself and stops emitting. Of course if some of those messages
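The throttling described above can be sketched as a simulated spout task. This is a hypothetical model, not Storm's implementation; a limit of 3 stands in for the 10 thousand in the example, and it applies per spout task. While the number of un-acked tuples is at the limit, nextTuple() emits nothing, and each ack frees one slot.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical simulation of max-spout-pending throttling for one spout task.
class ThrottledSpout {
    private final int maxPending;
    private final Deque<Long> inFlight = new ArrayDeque<>();
    private long nextId = 0;

    ThrottledSpout(int maxPending) {
        this.maxPending = maxPending;
    }

    // Returns true if a tuple was emitted, false if the spout is throttled.
    boolean nextTuple() {
        if (inFlight.size() >= maxPending) {
            return false; // too many un-acked tuples: stop emitting
        }
        inFlight.addLast(nextId++);
        return true;
    }

    // Ack the oldest in-flight tuple (which one does not matter for the count).
    void ackOldest() {
        inFlight.pollFirst();
    }

    int pending() {
        return inFlight.size();
    }

    public static void main(String[] args) {
        ThrottledSpout spout = new ThrottledSpout(3);
        int emitted = 0;
        for (int i = 0; i < 10; i++) {
            if (spout.nextTuple()) emitted++;
        }
        System.out.println(emitted);           // 3: the limit was reached
        spout.ackOldest();
        System.out.println(spout.nextTuple()); // true: one slot was freed
    }
}
```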

Re: Parallelism for KafkaSpout

2014-07-23 Thread Nathan Leung
In your example, five spouts would get data and the other five would not. On Jul 23, 2014 5:11 PM, Kashyap Mhaisekar kashya...@gmail.com wrote: Hi, Is the no. of executors for KafkaSpout dependent on the partitions for the topic? For E.g., Say kafka TopicA has 5 partitions. If I have a
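The arithmetic behind this answer can be sketched under an assumed round-robin scheme in which spout task k reads partitions k, k + numTasks, k + 2 * numTasks, and so on. That scheme is an assumption for illustration; the storm-kafka spout's exact assignment logic may differ. With 5 partitions and 10 tasks, tasks 5 through 9 own nothing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical round-robin assignment of Kafka partitions to spout tasks.
class PartitionAssignment {
    // Partitions owned by task `taskIndex` out of `totalTasks`.
    static List<Integer> partitionsForTask(int numPartitions, int totalTasks, int taskIndex) {
        List<Integer> owned = new ArrayList<>();
        for (int p = taskIndex; p < numPartitions; p += totalTasks) {
            owned.add(p);
        }
        return owned;
    }

    // Number of tasks that receive no partition and therefore emit nothing.
    static int idleTasks(int numPartitions, int totalTasks) {
        int idle = 0;
        for (int t = 0; t < totalTasks; t++) {
            if (partitionsForTask(numPartitions, totalTasks, t).isEmpty()) {
                idle++;
            }
        }
        return idle;
    }

    public static void main(String[] args) {
        System.out.println(idleTasks(5, 10)); // 5
        System.out.println(idleTasks(5, 5));  // 0
    }
}
```

Under this assumption, parallelism beyond the partition count adds idle executors rather than throughput.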

RE: could assign Topology execute in some supervisor

2014-07-17 Thread Nathan Leung
Yes, requesting 55 workers when you only have 14 is not going to work so well. Why do you set the number of workers so high for your topology? On Jul 17, 2014 8:14 AM, chenlax lax...@hotmail.com wrote: @Nathan Leung, I think I know why the isolation scheduler doesn't work in my demo. In the topology I set

Re: Vertica Storm Error

2014-07-11 Thread Nathan Leung
As Vladi says, you should initialize the connection in prepare(), not in the constructor. Any objects that are created in the constructor are serialized and sent to the nimbus, so you will get this exception if you create anything that is not serializable in your spout/bolt constructors. On

Re: could assign Topology execute in some supervisor

2014-07-10 Thread Nathan Leung
Your YAML doesn't appear to be properly formatted. Either that, or your email client stripped some characters. On Thu, Jul 10, 2014 at 4:16 AM, chenlax lax...@hotmail.com wrote: And other submitted topologies also can't get workers. Thanks, Lax -- From:

Re: Storm topology consumes 100% of memory

2014-07-09 Thread Nathan Leung
As Jungtaek said, the issue is 0mq memory usage. 0mq library code is accessed through JNI, so its memory usage is not governed by the JVM. By default there is no high water mark, so your memory usage can explode. If you aren't using reliable message handling (emitting tuples with an id) and setting

RE: could assign Topology execute in some supervisor

2014-07-08 Thread Nathan Leung
Does the isolation scheduler suit your needs? https://storm.incubator.apache.org/2013/01/11/storm082-released.html On Jul 8, 2014 8:53 AM, chenlax lax...@hotmail.com wrote: I mean assigning 2 supervisor machines to execute a topology, with more than 2 workers, maybe 10 or more. Thanks, Lax

Re: FW: storm-kafka integration duplication of messages

2014-07-07 Thread Nathan Leung
Your data is replicated because of the all grouping. I assume you have more than the Kafka and HDFS bolts. If you send via fields grouping (or shuffle grouping, or local or shuffle grouping), and both are subscribed to your deserializer, then one task in the HDFS bolt and one task in the Kafka bolt will get the

RE: storm-kafka integration duplication of messages

2014-07-07 Thread Nathan Leung
Are you using all grouping for the deserialization bolt too? My original point is that you should not be using this grouping anywhere in your topology. It will cause the duplication you are seeing unless your parallelism is set to one for your bolts. See the section on groupings here
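The duplication can be counted with a small simulation. It is hypothetical: Storm's shuffle grouping balances tuples its own way, approximated here with a seeded random pick, but like the real grouping it sends each tuple to exactly one task. All grouping hands every tuple to every task of the subscribing bolt, so a writer bolt with 4 tasks writes each record 4 times.

```java
import java.util.Random;

// Hypothetical simulation of how many copies of each tuple a downstream bolt sees.
class GroupingSim {
    // All grouping: every task of the subscribing bolt receives every tuple.
    static int[] allGrouping(int tuples, int tasks) {
        int[] received = new int[tasks];
        for (int i = 0; i < tuples; i++) {
            for (int t = 0; t < tasks; t++) {
                received[t]++;
            }
        }
        return received;
    }

    // Shuffle grouping, approximated by a seeded random pick: one task per tuple.
    static int[] shuffleGrouping(int tuples, int tasks, long seed) {
        int[] received = new int[tasks];
        Random rnd = new Random(seed);
        for (int i = 0; i < tuples; i++) {
            received[rnd.nextInt(tasks)]++;
        }
        return received;
    }

    static int total(int[] received) {
        int sum = 0;
        for (int r : received) {
            sum += r;
        }
        return sum;
    }

    public static void main(String[] args) {
        // 100 tuples into a writer bolt with 4 tasks.
        System.out.println(total(allGrouping(100, 4)));         // 400: duplicated
        System.out.println(total(shuffleGrouping(100, 4, 42))); // 100
    }
}
```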

Re: Bolt execute() concurrency ?

2014-07-01 Thread Nathan Leung
Each task has its own instance of the bolt class. On Tue, Jul 1, 2014 at 7:31 PM, Pasquini, Reuben reuben.pasqu...@hp.com wrote: Hi, Can the execute() method on a single instance of a Bolt execute concurrently on multiple threads ? Or does each task get its own instance of a particular

Re: How to update a running Storm Topology

2014-06-18 Thread Nathan Leung
If you kill a topology in the UI, you will notice that sometimes it takes a while for it to clear and go away. If you try to reload the topology during this time you will get the same exception. You should loop checking the nimbus for this topology after you kill it, and only reload after you
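The wait loop described above can be sketched generically. In this hedged sketch the presence check is passed in as a hypothetical BooleanSupplier; in practice it might look the topology name up in the cluster summary that Nimbus returns. The timeout and poll interval are arbitrary assumptions.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper: wait for a killed topology to disappear before resubmitting.
class WaitForKill {
    // Polls `stillPresent` until it returns false or the timeout elapses.
    // Returns true if the topology disappeared in time.
    static boolean waitUntilGone(BooleanSupplier stillPresent, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (!stillPresent.getAsBoolean()) {
                return true; // safe to resubmit now
            }
            Thread.sleep(pollMs);
        }
        return !stillPresent.getAsBoolean(); // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake check: pretend Nimbus still lists the topology for the first 3 polls.
        int[] polls = {0};
        BooleanSupplier fake = () -> polls[0]++ < 3;
        System.out.println(waitUntilGone(fake, 5_000, 10)); // true
    }
}
```

Returning a boolean rather than throwing lets the caller decide whether a timeout should abort the redeploy or simply retry later.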

Re: using thrift api

2014-06-18 Thread Nathan Leung
A rough overview, since I don't know if I can share code: 1) create a thrift connection 2) get a Nimbus.Client object (I will call this 'client') 3) call client.getTopology(topology id) - returns StormTopology, I will call this 'topology' 4) Iterate Map<String, SpoutSpec> that is returned by

Re: Implications of running multiple topologies without isolation

2014-06-06 Thread Nathan Leung
1) a worker can spawn any number of threads, so you can possibly run into standard shared-resource issues (CPU, network, disk, etc.). RAM is not as big of a problem since each worker gets a fixed amount. 2) a worker is spawned for a particular topology; it only executes spout/bolt tasks for the

Re: Storm performance

2014-05-31 Thread Nathan Leung
If you are GCing too much and failing a lot of tuples (which may be in part due to GCs) it is quite possible that you are out of RAM and you should increase the amount that is allocated for each worker. On Sat, May 31, 2014 at 9:25 PM, Srinath C srinat...@gmail.com wrote: Hi Shaikh, You

Re: All tuples are going to same worker

2014-05-29 Thread Nathan Leung
to a specific bolt (Tweet/retweet/replytweet). Here it will insert data into HBase and forward the same tuple to user bolt for further processing. 4. User bolt will insert data into HBase database. and so on. Thanks Regards, Riyaz On Thu, May 29, 2014 at 3:57 PM, Nathan Leung ncle

Re: Python bolt writing to files

2014-05-28 Thread Nathan Leung
Are you sure the bolt ran properly? If so, on which machine are you looking for the file? On Wed, May 28, 2014 at 4:30 PM, Dilpreet Singh dilpreet...@gmail.com wrote: Hi, I'm writing the output of a python bolt to a file by: f = open('/tmp/clusters.txt', 'a')

Re: Sometimes topology crashed with internal exception

2014-05-21 Thread Nathan Leung
Are you sure that you're not passing a null value in your tuple? On May 21, 2014 7:25 AM, Irek Khasyanov qua...@gmail.com wrote: Hello. I have a strange problem with my topology; sometimes everything crashes with an exception: java.lang.RuntimeException: java.lang.NullPointerException at

Re: Sometimes topology crashed with internal exception

2014-05-21 Thread Nathan Leung
You can also synchronize access to the OutputCollector so that only 1 thread is using it at a time. On Wed, May 21, 2014 at 12:39 PM, Irek Khasyanov qua...@gmail.com wrote: Hm, yes, the first bolt is emitting from a different thread; I didn't realize that this would be a problem. Thanks! I'll try to change
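A minimal sketch of the synchronization idea, using hypothetical stand-ins rather than Storm's OutputCollector: a non-thread-safe emitter backed by an ArrayList is wrapped so that every emit goes through one lock, and several threads then emit concurrently without losing entries.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an output collector; not Storm's API.
interface Emitter {
    void emit(List<Object> values);
}

// Not thread-safe: concurrent ArrayList.add calls can lose or corrupt entries.
class UnsafeEmitter implements Emitter {
    final List<List<Object>> emitted = new ArrayList<>();

    public void emit(List<Object> values) {
        emitted.add(values);
    }
}

// Serializes all calls through one lock so only one thread emits at a time.
class SynchronizedEmitter implements Emitter {
    private final Emitter delegate;

    SynchronizedEmitter(Emitter delegate) {
        this.delegate = delegate;
    }

    public synchronized void emit(List<Object> values) {
        delegate.emit(values);
    }
}

class EmitDemo {
    // Starts `threads` threads that each emit `perThread` tuples, then waits for them.
    static void emitFromThreads(Emitter e, int threads, int perThread) throws InterruptedException {
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    e.emit(Collections.<Object>singletonList(i));
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            w.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        UnsafeEmitter raw = new UnsafeEmitter();
        emitFromThreads(new SynchronizedEmitter(raw), 4, 1000);
        System.out.println(raw.emitted.size()); // 4000
    }
}
```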

Re: logging gc event

2014-05-20 Thread Nathan Leung
You can do something like this: -Xloggc:<Your Storm Install Dir>/logs/gc-worker-%ID%.log On Wed, May 14, 2014 at 2:01 PM, Sean Allen s...@monkeysnatchbanana.com wrote: Is anyone logging gc events for workers in their cluster? Outside of storm, the following jvm options are pretty standard

Re: Kafka Spout 0.8-plus stops consuming messages after a while

2014-05-20 Thread Nathan Leung
Hi Jing, Was message.max.bytes changed in your Kafka server config to be higher than the default value (1000000 bytes)? -Nathan On Mon, May 19, 2014 at 5:54 PM, Tao, Jing j...@webmd.net wrote: I finally found the root cause. It turns out the spout was reading a message that exceeded the max

Re: Storm Scaling Issues

2014-05-16 Thread Nathan Leung
value of the throughput measuring bolt in Storm UI is at around ~0.12. Will try out more configurations and see. Thank you very much for your tips. Any other tweaks I might try out? Thanks, Lasantha On Tue, May 13, 2014 at 6:38 PM, Nathan Leung ncle...@gmail.com wrote: For 20 spouts

Spout sleep wait strategy question

2014-05-14 Thread Nathan Leung
Hi, I am configuring spout sleep in my topology by adding the following items to my configuration map but it does not appear to affect the behavior of the spout: config.put("topology.spout.wait.strategy", "backtype.storm.spout.SleepSpoutWaitStrategy");

Re: Setting spout parallelism

2014-05-13 Thread Nathan Leung
One example is if you configure each spout to scan different files or directories. On May 13, 2014 4:28 AM, Komal Thombare komal.thomb...@tcs.com wrote: Hi all, I am new to storm and working on Storm word count. I have confusion while setting spout parallelism. I am using

Re: How to change storm topology at runtime?

2014-05-13 Thread Nathan Leung
The number of spout and bolt executors can be changed at run time, but not the number of tasks. The number of worker processes can also be changed. You cannot change the layout of the topology by adding or removing spouts or bolts. If you need to add data sources without down time one thing you

Re: Interesting Comparison

2014-05-12 Thread Nathan Leung
A couple of thoughts: 1) IBM Streams is certainly more mature, as it's been in development for a longer amount of time and storm is not even at release 1.0 yet. Though I am not familiar with SPL, it would also make sense that it's faster to implement, as it is a higher-level abstraction. 2) Operator

Re: Can i increase the parallelism of the topology as per traffic

2014-05-06 Thread Nathan Leung
The number of executors and workers can be changed but the number of tasks is fixed at topology creation time. On May 6, 2014 1:40 AM, M.Tarkeshwar Rao tarkeshwa...@gmail.com wrote: Hi, Can i increase the parallelism of the topology as per traffic? I need inputs from you all. any link?

Re: duplicated result

2014-05-06 Thread Nathan Leung
You are creating your file writer with append set to true. Is it possible your topology was run more than once? On May 6, 2014 6:39 AM, Bilal Al Fartakh alfartaj.bi...@gmail.com wrote: I'm using a bolt that receives tuples from another bolt (exclamation bolt) and writes them to a file, the
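The effect of the append flag can be checked with plain java.io. In this sketch writeWords is a hypothetical helper standing in for the bolt's file output. Two runs with append set to true leave every line in the file twice, while append set to false truncates the file first.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class AppendDemo {
    // Hypothetical bolt output: writes each word on its own line.
    static void writeWords(Path file, List<String> words, boolean append) throws IOException {
        try (PrintWriter out = new PrintWriter(new FileWriter(file.toFile(), append))) {
            for (String w : words) {
                out.println(w);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("words", ".txt");
        List<String> words = Arrays.asList("nathan!!!", "mike!!!");
        writeWords(file, words, true);
        writeWords(file, words, true); // second run of the topology
        System.out.println(Files.readAllLines(file).size()); // 4: every line duplicated
        writeWords(file, words, false); // truncates before writing
        System.out.println(Files.readAllLines(file).size()); // 2
        Files.delete(file);
    }
}
```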

Re: duplicated result

2014-05-06 Thread Nathan Leung
)); } 2014-05-06 12:35 GMT+01:00 Nathan Leung ncle...@gmail.com: You are creating your file writer with append set to true. Is it possible your topology was run more than once? On May 6, 2014 6:39 AM, Bilal Al Fartakh alfartaj.bi...@gmail.com wrote: I'm using a bolt that receives tuples from

Re: duplicated result

2014-05-06 Thread Nathan Leung
declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declare(new Fields("word")); } 2014-05-06 12:35 GMT+01:00 Nathan Leung ncle...@gmail.com: You are creating your file writer with append set to true. Is it possible your topology was run more than once? On May 6, 2014 6:39

Re: Best practice for shutting down storm

2014-05-01 Thread Nathan Leung
the call to “nextTuple()” on spout and everything else will continue to work as is? 3. Initiate Shutdown... - Thanks, Prasun Ghosh Apple Inc. Information Security On May 1, 2014, at 12:27 PM, Nathan Leung ncle...@gmail.com wrote: You can deactivate the topology, which will shut off

Re: Multiple writers to datastore ?

2014-04-28 Thread Nathan Leung
It depends on your specific application, but if you are modifying specific rows based on an index you can do fields grouping by that index so that only one bolt will ever update a particular row. On Mon, Apr 28, 2014 at 6:40 PM, Raphael Hsieh raffihs...@gmail.com wrote: How does Storm handle
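Why fields grouping gives a single writer per row can be sketched with a hash-based router. The hash-modulo-task-count scheme is an assumption, similar in spirit to fields grouping but not Storm's exact code. The point is that the task index is a pure function of the key, so every update for "row-42" lands on the same task.

```java
import java.util.Objects;

// Hypothetical key-to-task routing in the spirit of fields grouping.
class FieldsGroupingSketch {
    // Same key -> same task index, for a fixed number of tasks.
    static int taskFor(Object key, int numTasks) {
        return Math.floorMod(Objects.hashCode(key), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4;
        int first = taskFor("row-42", numTasks);
        int again = taskFor("row-42", numTasks);
        System.out.println(first == again); // true: only one task ever writes row-42
    }
}
```

Because the number of tasks is fixed for the life of a topology, the mapping stays stable even when executors are rebalanced.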

Re: Basic storm question

2014-04-03 Thread Nathan Leung
/downloads.html http://storm-project.net/downloads.html* I don't know what's missing... Thanks! 2014-04-02 15:05 GMT-03:00 Nathan Leung ncle...@gmail.com: No, it creates an extra executor to deal with processing the ack messages that are sent by the bolts after processing tuples. See

Re: Basic storm question

2014-04-03 Thread Nathan Leung
, Apr 3, 2014 at 10:34 AM, Nathan Leung ncle...@gmail.com wrote: By default each task is executed by 1 executor, but if the number of tasks is greater than the number of executors, then each executor (thread) will execute more than one task. Note that when rebalancing a topology, you can change
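The task-to-executor relationship can be sketched as a split of a fixed set of task ids into executor groups. The contiguous, near-equal split is an assumption for illustration, and Storm's exact distribution may differ. A rebalance changes only the number of groups; the 10 tasks stay the same, and each executor runs its group's tasks serially.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical split of a fixed set of tasks across executors (threads).
class ExecutorTasks {
    // Split task ids 0..numTasks-1 into numExecutors contiguous, near-equal groups.
    // Assumes 0 < numExecutors <= numTasks.
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        List<List<Integer>> groups = new ArrayList<>();
        int base = numTasks / numExecutors;
        int extra = numTasks % numExecutors;
        int next = 0;
        for (int e = 0; e < numExecutors; e++) {
            int size = base + (e < extra ? 1 : 0); // first `extra` executors get one more
            List<Integer> group = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                group.add(next++);
            }
            groups.add(group);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(assign(10, 4)); // [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
        System.out.println(assign(10, 2)); // [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
    }
}
```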

Re: Basic storm question

2014-04-03 Thread Nathan Leung
...@gmail.com wrote: Thanks. Since inside an executor, multiple tasks are in fact for the same spout or bolt, is this feature of multiple tasks only useful for some special cases? On Thu, Apr 3, 2014 at 10:52 AM, Nathan Leung ncle...@gmail.com wrote: tasks are run serially by the executor

Re: Basic storm question

2014-04-02 Thread Nathan Leung
. Is that a bug in displaying topology summary? My cluster consists of 2 supervisors and each has 4 workers defined. Thanks. On Tue, Apr 1, 2014 at 1:43 PM, Nathan Leung ncle...@gmail.com wrote: By default supervisor nodes can run up to 4 workers. This is configurable in storm.yaml

Re: Basic storm question

2014-04-02 Thread Nathan Leung
to deal with the tuples? Thanks, Huiliang On Wed, Apr 2, 2014 at 8:31 AM, Nathan Leung ncle...@gmail.com wrote: the extra task/executor is the acker thread. On Tue, Apr 1, 2014 at 9:23 PM, Huiliang Zhang zhl...@gmail.com wrote: I just submitted ExclamationTopology for testing

Re: Basic storm question

2014-04-01 Thread Nathan Leung
, Mar 29, 2014 at 6:34 AM, Susheel Kumar Gadalay skgada...@gmail.com wrote: No, a single worker is dedicated to a single topology no matter how many threads it spawns for different bolts/spouts. A single worker cannot be shared across multiple topologies. On 3/29/14, Nathan Leung ncle

Re: Basic storm question

2014-04-01 Thread Nathan Leung
Kumar Gadalay skgada...@gmail.com wrote: No, a single worker is dedicated to a single topology no matter how many threads it spawns for different bolts/spouts. A single worker cannot be shared across multiple topologies. On 3/29/14, Nathan Leung ncle...@gmail.com wrote: From what I have seen

Re: Basic storm question

2014-03-29 Thread Nathan Leung
From what I have seen, the second topology is run with 1 worker until you kill the first topology or add more worker slots to your cluster. On Sat, Mar 29, 2014 at 2:57 AM, Huiliang Zhang zhl...@gmail.com wrote: Thanks. I am still not clear. Do you mean that in a single worker process, there

Re: Server load - Topology optimization

2014-03-18 Thread Nathan Leung
In my experience storm is able to make good use of CPU resources, if the application is written appropriately. You shouldn't require too much executor parallelism if your application is CPU intensive. If your bolts are doing things like remote DB/NoSQL accesses, then that changes things and

Re: Server load - Topology optimization

2014-03-18 Thread Nathan Leung
constrain myself to a maximum that equates to the number of cores. D *From:* Nathan Leung ncle...@gmail.com *Sent:* Tuesday, 18 March 2014 18:38 *To:* user@storm.incubator.apache.org In my experience storm is able to make good use of CPU resources, if the application

Re: How to define grouping in a Topology with an 'a priori' unknown number of streams to subscribe?

2014-03-10 Thread Nathan Leung
You can do something like: BoltDeclarer bd = builder.setBolt("myBoltB", boltB, boltBparallelism); for (int i = 1; i < numInstances; ++i) { bd.shuffleGrouping("myBoltA" + i); } On Mon, Mar 10, 2014 at 11:52 AM, Susana González susan...@gmail.com wrote: Hi, I need help to go from a simple Storm

Re: Dynamic Topologies

2014-03-05 Thread Nathan Leung
reply. So I assume you still need to upload a new jar for each topology? How are you handling this? On Tue, Mar 4, 2014 at 6:34 PM, Nathan Leung ncle...@gmail.com wrote: It is possible, but I don't think it is out of the box. You would have to write that layer yourself. For example, I've

Re: Serializing Maps other than HashMap

2014-03-05 Thread Nathan Leung
Hi Richards, Thanks for the tip. I was actually planning on doing that as well. I read more on kryo and found that it does not support registering a super type or interface, so it makes sense that there is no serializer registered for the map interface. -Nathan On Mar 2, 2014 2:50 AM, Richards

Serializing Maps other than HashMap

2014-03-01 Thread Nathan Leung
Hi, I noticed on the page about serialization (https://github.com/nathanmarz/storm/wiki/Serialization) that storm only registers HashMap and HashSet for serialization with Kryo. The reason I noticed is that in some cases when I have a class that contains a member of type Map, storm

Re: Topology dies immediately upon deployment when configured with two workers instead of one

2014-01-31 Thread Nathan Leung
It can work with ZMQ, but you MUST use the version specified (2.1.7). Newer versions change the API which causes errors, which might be what you are seeing. Is the version of libzmq you installed the same as the one you are using in production? On Fri, Jan 31, 2014 at 9:47 AM, Mark Greene

Re: 答复: how can i connect remote storm cluster and submit topology?

2014-01-12 Thread Nathan Leung
at 10:32 AM, Nathan Leung ncle...@gmail.com wrote: Assuming everything else is setup, configure your storm.yaml and then just run the storm jar command. Assuming it's not, these two links are good references: https://github.com/nathanmarz/storm/wiki/Setting-up-a-Storm-cluster http

Re: Storm Performance

2014-01-10 Thread Nathan Leung
I've benched storm at 1.8 million tuples per second on a big (24 core) box using local or shuffle grouping between a spout and bolt. If you're only seeing 10 events per second make sure you don't have any sleeps (whether in your code or elsewhere e.g. a library or triggered due to lack of data in

Re: Segfault in worker when submitting topology

2014-01-03 Thread Nathan Leung
Thanks, -b On 01/03/2014 03:06 PM, Nathan Leung wrote: Can you try storm 0.9.0.1 with zmq 2.1.7 and frozen jzmq? Assuming that it doesn't work, can you try with netty as your transport? It would seem to me that something is wrong with your zmq library, On Fri, Jan 3, 2014 at 2:58 PM

Re: Error in submitting Topology

2013-12-31 Thread Nathan Leung
, Dec 31, 2013 at 3:06 PM, Nathan Leung ncle...@gmail.com wrote: Do you have a file storm-starter.jar? This doesn't match the snapshot naming that Maven created for the jar-with-dependencies file. Also, did you run mvn install? -Nathan On Dec 31, 2013 12:55 AM, researcher cs prog.researc

Re: Error in submitting Topology

2013-12-31 Thread Nathan Leung
Yes On Dec 31, 2013 2:58 PM, researcher cs prog.researc...@gmail.com wrote: Nathan, do you mean using jar -tf to search for WordCountTopology in storm-starter.jar, or something else? On Tue, Dec 31, 2013 at 9:24 PM, Nathan Leung ncle...@gmail.com wrote: If the jar file is in fact in the location

Re: Guaranteeing message processing on storm fails

2013-12-30 Thread Nathan Leung
You are using the sentence as the message ID? The word count example repeats sentences, and your message IDs need to be unique. On Mon, Dec 30, 2013 at 2:05 AM, Michal Singer mic...@leadspace.com wrote: In my test I am using the word counter that was in the code samples of storm starter.
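Why duplicate IDs break reliability can be sketched with an ordinary map of pending tuples keyed by message ID. This is a hypothetical model of a spout's own replay bookkeeping, not Storm's internals. The word count spout repeats sentences, so using the sentence as the ID makes a later emit silently overwrite an earlier pending entry, while a fresh UUID per emit keeps them distinct.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical model of a spout's pending-tuple bookkeeping keyed by message ID.
class MessageIdDemo {
    // Emits each sentence with either the sentence itself or a UUID as its ID,
    // and returns how many distinct pending entries remain trackable.
    static int trackedEntries(List<String> sentences, boolean useUuid) {
        Map<Object, String> pending = new HashMap<>();
        for (String s : sentences) {
            Object msgId = useUuid ? UUID.randomUUID() : s;
            pending.put(msgId, s); // a repeated ID overwrites the earlier entry
        }
        return pending.size();
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
                "the cow jumped over the moon",
                "an apple a day keeps the doctor away",
                "the cow jumped over the moon");
        System.out.println(trackedEntries(sentences, false)); // 2: one emit was lost
        System.out.println(trackedEntries(sentences, true));  // 3
    }
}
```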

Re: Storm-Graphite works fine in local mode but not on clusters

2013-12-26 Thread Nathan Leung
Do you have enough worker slots in your cluster to run your topology? Are there error messages in the worker logs? On Thu, Dec 26, 2013 at 9:45 AM, Cheng Xuntao chengxun...@gmail.com wrote: Hi, All, I installed Graphite with Storm on my 10-node cluster. After I submitted the fat jar, I