Re: How long until fields grouping gets overwhelmed with data?

2016-08-10 Thread Nathan Leung
It's based on a modulo of a hash of the field. The fields grouping is stateless. On Aug 10, 2016 8:18 AM, "Navin Ipe" wrote: > Hi, > > For spouts to be able to continuously send a fields grouped tuple to the > same bolt, it would have to store a key value map something like this, > right? > > fi
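The hash-and-modulo routing described above can be sketched in plain Java. This is illustrative only, not Storm's actual source: Storm hashes the list of grouped field values and takes it modulo the consumer's task count, so no key-to-task map is ever stored.

```java
import java.util.List;

public class FieldsGroupingSketch {
    // Illustrative sketch: the same grouped values always hash to the
    // same task index, so the grouping needs no state at all.
    static int targetTask(List<Object> groupedValues, int numTasks) {
        // Math.floorMod keeps the result non-negative for negative hashes.
        return Math.floorMod(groupedValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int tasks = 4;
        int a = targetTask(List.of("user-42"), tasks);
        int b = targetTask(List.of("user-42"), tasks);
        System.out.println(a == b);              // true: deterministic routing
        System.out.println(a >= 0 && a < tasks); // true: valid task index
    }
}
```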

Re: Is dynamic update of topologies possible?

2016-07-07 Thread Nathan Leung
It depends how your computation changes too. If it's something like changing models you can load them dynamically in a bolt and that will change its computation without a topology reload. On Jul 7, 2016 7:24 PM, "Jungtaek Lim" wrote: > Yeah sure that's possible. I thought "on the fly" means updat

Re: "Coalescing" tuples?

2016-07-05 Thread Nathan Leung
I would push the incoming tuples into a stack, and have a background thread take the stack, process the top element, and discard the rest. On Jul 5, 2016 10:33 AM, "Marco Nicolini" wrote: > Hello guys, > > I'm evaluating using storm for a project at work. > > I've just scratched the surface of st
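The stack-based coalescing suggested above might look like this minimal sketch (class and method names are hypothetical): producers push tuples onto a stack, and a consumer periodically drains it, keeping only the newest element.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class Coalescer<T> {
    // Hypothetical sketch of the suggested pattern: incoming tuples are
    // pushed onto a stack; a background consumer swaps the stack out,
    // processes only the top (newest) element, and discards the rest.
    private final Deque<T> stack = new ArrayDeque<>();

    public synchronized void push(T item) {
        stack.push(item);
    }

    // Returns the newest item seen since the last drain, or null.
    public synchronized T drainNewest() {
        T newest = stack.peek(); // top of the stack = most recent push
        stack.clear();           // coalesce: throw away the older ones
        return newest;
    }

    public static void main(String[] args) {
        Coalescer<String> c = new Coalescer<>();
        c.push("v1");
        c.push("v2");
        c.push("v3");
        System.out.println(c.drainNewest()); // v3 -- only the latest survives
        System.out.println(c.drainNewest()); // null -- nothing new arrived
    }
}
```

In a real bolt the `drainNewest()` caller would be the background thread mentioned above, running on its own schedule.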

Re: storm usage and design question

2016-07-04 Thread Nathan Leung
Double check how you are pushing data into Kafka. You are probably pushing one line at a time. On Jul 4, 2016 12:30 PM, "Navin Ipe" wrote: > I haven't worked with Kafka, *so perhaps someone else here would be able > to help you with it. * > What I could suggest though, is to search for how to emi

Re: Re: Problem to write into HBase

2016-06-12 Thread Nathan Leung
This is most likely an issue with process limits. Check your process limits on your supervisor nodes with "ulimit -a" and increase to 64k or so if it's low (e.g. 1k). Even if you aren't creating threads, storm is, and a lot of other libraries such as database client libraries do too. On Jun 12,

Re: not all the servers in storm cluster processing messages

2016-06-09 Thread Nathan Leung
If you use localOrShuffleGrouping, the tuple will stay in process if possible. The solution, assuming your bolts are distributed, is to use shuffleGrouping to send tuples from spout->bolt so that the tuples are distributed to all of your workers, then use localOrShuffleGrouping from bolt->bolt to

Re: Another parallelism question

2016-06-09 Thread Nathan Leung
on setNumTask ? > > 2016-06-09 15:20 GMT+02:00 Nathan Leung : > >> You can create your topology with more tasks than executors, then when >> the rebalance happens you can add executors. However at the moment you >> cannot add more tasks to a running topology. >> >

Re: Another parallelism question

2016-06-09 Thread Nathan Leung
You can create your topology with more tasks than executors, then when the rebalance happens you can add executors. However at the moment you cannot add more tasks to a running topology. On Thu, Jun 9, 2016 at 8:58 AM, Adrien Carreira wrote: > I've just create a topology like this : > > builder

Re: Fwd: Apache Flux - multiple output streams in YAML

2016-05-17 Thread Nathan Leung
FluxShellSpout extends ShellSpout so it can use streams with ShellSpout's stream mechanism. You just define which stream to use in the streams section and set the stream in your ShellMsg. https://github.com/apache/storm/blob/1.0.x-branch/storm-core/src/jvm/org/apache/storm/spout/ShellSpout.java O

Re: Understanding parallelism in Storm

2016-05-16 Thread Nathan Leung
I would reread Michael Noll's blog post. In my opinion it's pretty clear. With regards to BoltParallelism and BoltTaskParallelism you have it backwards. Your BoltTaskParallelism is # tasks and BoltParallelism is # executors. This is made clear in the section "Configuring the parallelism of a t

Re: Understanding parallelism in Storm

2016-05-16 Thread Nathan Leung
> What could be the difference ? I've the same hint of parallelism, so I > don't understand why they is some difference beetween both cases. > > Also, I've read that multiple worker on same machine use network to > communicate... > > > > 2016-05-16 14:33 G

Re: Understanding parallelism in Storm

2016-05-16 Thread Nathan Leung
The number of tasks is the number of spout objects that get created, that each have their own distinct sets of tuples that are emitted, need to be acked, etc. The number of executors is the number of OS threads (potentially across more than 1 machine) that get created to service these spout object

Re: Split Kafka JSON String

2016-05-15 Thread Nathan Leung
Map your json to a pojo. Sorry auto correct got me. On May 15, 2016 2:04 PM, "Nathan Leung" wrote: > Easiest way is to map your job to a pojo and use Jackson or gson to > convert the json. > On May 15, 2016 1:48 PM, "Daniela S" wrote: > >> Hi >> >

Re: Split Kafka JSON String

2016-05-15 Thread Nathan Leung
Easiest way is to map your job to a pojo and use Jackson or gson to convert the json. On May 15, 2016 1:48 PM, "Daniela S" wrote: > Hi > > I am receiving Strings of JSON from Kafka. I would like to split the > string to get each field from the JSON object to store it into Redis. > How can I split

Re: Load Balancing in Storm

2016-05-13 Thread Nathan Leung
No you cannot. If you have more tasks than executors you can rebalance. You can also rebalance to increase the number of workers. But you need to have enough tasks when the topology is started to accommodate the new executors. On Fri, May 13, 2016 at 9:52 AM, Gharaibeh, Ammar wrote: > Hi > > I

Re: Getting Kafka Offset in Storm Bolt

2016-05-11 Thread Nathan Leung
; that chunk is uploaded, all of them can be acked. But isn't it overkill ? I > guess storm is not even meant to support that kind of a use case. > > On Wed, May 11, 2016 at 12:59 PM, Nathan Leung wrote: > >> You can micro batch kafka contents into a file that's replica

Re: Getting Kafka Offset in Storm Bolt

2016-05-11 Thread Nathan Leung
. > > On Wed, May 11, 2016 at 11:10 AM, Nathan Leung wrote: > >> Why not just ack the tuple once it's been written to a file. If your >> topology fails then the data will be re-read from Kafka. Kafka spout >> already does this for you. Then uploading files to S3 is t

Re: Getting Kafka Offset in Storm Bolt

2016-05-11 Thread Nathan Leung
Why not just ack the tuple once it's been written to a file. If your topology fails then the data will be re-read from Kafka. Kafka spout already does this for you. Then uploading files to S3 is the responsibility of another job. For example, a storm topology that monitors the output folder. M

Re: How to let a topology know that it's time to stop?

2016-05-08 Thread Nathan Leung
Alternative is to use a control message on a separate stream that goes to all bolt tasks using all grouping. On May 8, 2016 3:20 PM, "Matthias J. Sax" wrote: > To synchronize this, use an additional "shut down bolt" that used > parallelism of one. "shut down bolt" must be notified by all parallel

Re: Slots vs. Topology

2016-04-28 Thread Nathan Leung
you should > configure per supervisor. > > Hope this helps. > > > -Matthias > > > On 04/28/2016 04:01 PM, Nathan Leung wrote: > > I would recommend against this. Storm will automatically run multiple > > threads for you, especially if you have more than 1 exe

Re: Slots vs. Topology

2016-04-28 Thread Nathan Leung
I would recommend against this. Storm will automatically run multiple threads for you, especially if you have more than 1 executor / worker. Every time data transfers between workers, it must be serialized and deserialized. On the other hand, if you have larger workers and one goes down, your top

RE: Emit from a Java application and receive in a Bolt of another Java application?

2016-04-26 Thread Nathan Leung
Kafka is probably better than redis, and definitely better than sockets. Other queues like rabbitmq can work too. Sockets are a terrible choice, but I will leave why as a mental exercise for now :). If you must know I can respond again. On Apr 26, 2016 6:27 AM, "Cody Lee" wrote: 2 options come

Re: Does Storm use Netty to access a class reference across worker JVM's?

2016-04-22 Thread Nathan Leung
> Followed up on serialization, and it looks like everything can be > serialized: http://stackoverflow.com/a/16851174/453673 > Will verify the database connection serialization also during > implementation. > > On Thu, Apr 21, 2016 at 5:22 PM, Nathan Leung wrote: > >> mon

Re: Does Storm use Netty to access a class reference across worker JVM's?

2016-04-21 Thread Nathan Leung
mongoManager is serialized and sent to your spout. If it's not something that's easily serializable (e.g. a database connection) then you will need to initialize it in spout prepare() instead of the constructor. On Thu, Apr 21, 2016 at 7:34 AM, Navin Ipe wrote: > Thanks John, but that's odd...i
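The serialize-then-prepare lifecycle described above can be demonstrated with plain Java serialization. This is a sketch; `MongoSpoutSketch` and its stand-in resource are hypothetical names, and `open()` plays the role of Storm's `open()`/`prepare()` callback.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class MongoSpoutSketch implements Serializable {
    private final String connectionUri;    // plain config: serializes fine
    private transient Object mongoManager; // live resource: does NOT ship

    public MongoSpoutSketch(String uri) { this.connectionUri = uri; }

    // Storm calls open()/prepare() on the worker AFTER deserialization;
    // that is the safe place to build non-serializable resources.
    public void open() {
        mongoManager = new Object(); // stand-in for a real client/connection
    }

    public boolean hasManager() { return mongoManager != null; }

    // Mimics what Storm does when it ships a component to a worker.
    static MongoSpoutSketch roundTrip(MongoSpoutSketch obj) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(obj);
        ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        return (MongoSpoutSketch) in.readObject();
    }

    public static void main(String[] args) throws Exception {
        MongoSpoutSketch spout = new MongoSpoutSketch("mongodb://localhost");
        spout.open();
        MongoSpoutSketch shipped = roundTrip(spout);
        System.out.println(shipped.hasManager()); // false: transient field lost
        shipped.open();                           // prepare() restores it
        System.out.println(shipped.hasManager()); // true
    }
}
```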

Re: How to set number of slots in storm.yaml

2016-04-12 Thread Nathan Leung
I would stick with 4 and change it only if you have a good reason to do so. Too many worker processes can be detrimental. Each storm process is by nature highly threaded. There's not much benefit to making so many processes, unless you need really fine grained fault tolerance. See also: https:/

Re: Consequences of non-acking stateless bolt in stateful topology

2016-03-09 Thread Nathan Leung
Unless you have an abnormally high tuple timeout, they will time out and be replayed. Anyways, if your bolt is stateless how would it delay acking by an hour? On Wed, Mar 9, 2016 at 8:14 AM, Alexander T wrote: > Hello! > > Regarding https://issues.apache.org/jira/browse/STORM-1608, what would > happe

Re: Token awareness of storm bolts

2016-03-08 Thread Nathan Leung
No. Hash function can be applied to any object, assuming it has a suitable implementation defined, so it is impossible to define a "range". Also, that's not how hash functions work. On Mar 7, 2016 10:29 PM, "Gireesh Ramji" wrote: > In my topology, I have a bolt that subscribes from its predecesso

Re: ExclamationTopology workers executors vs tasks

2016-03-01 Thread Nathan Leung
Each worker process has an ack bolt. On Mar 1, 2016 6:34 AM, "patcharee" wrote: > Hi, > > I am new to Storm. I am running the storm starter example > ExclamationTopology on storm cluster (version 0.10). The code snippet is > below: > > TopologyBuilder builder = new TopologyBuilder(); > > builder.

Re: Updating a given bolt in a live storm cluster

2016-02-22 Thread Nathan Leung
not something related to Storm > configuration. I'm happy to learn in case you meant something else. > > On Tue, Feb 23, 2016 at 12:34 AM, Nathan Leung wrote: > >> This isn't part of core storm functionality. You can code your bolts >> such that they refresh conf

Re: Updating a given bolt in a live storm cluster

2016-02-22 Thread Nathan Leung
This isn't part of core storm functionality. You can code your bolts such that they refresh configuration periodically, but I'm not sure if that covers your use case. On Mon, Feb 22, 2016 at 4:39 PM, Matan Safriel wrote: > Hi, > > I apologize if this has been answered before, but unlike google

Re: Utilization imbalance in Storm Bolt Executors

2016-02-22 Thread Nathan Leung
How many spouts do you have? On Mon, Feb 22, 2016 at 10:09 AM, Muhammad Bilal wrote: > Hi, > > I am running the RollingCount Benchmark from this set of benchmarks. Here is the relevant > piece of code: > > spout = new FileReadSpout(BenchmarkUti

Re: localOrShuffleGrouping load balanced tuple distribution

2016-02-16 Thread Nathan Leung
we increase the number of workers to 5, then it will behave as > shuffleGrouping (distributing evenly among the bolts). > I got it! > Regards, > Florin > > On Mon, Feb 15, 2016 at 3:56 PM, Nathan Leung wrote: > >> It will use the local bolts. >> >> On Mon,

Re: localOrShuffleGrouping load balanced tuple distribution

2016-02-15 Thread Nathan Leung
It will use the local bolts. On Mon, Feb 15, 2016 at 8:32 AM, Spico Florin wrote: > > Hello! >Suppose that I have the following scenario > 1. One spout > 2. One bolt with hintParallelism set to 4 > 3. Bolt connected with the spout with localOrShuffleGrouping > 4. 2 slots available > 5. We us

Re: Need Help regarding topology with numWorker>1

2016-02-11 Thread Nathan Leung
Any situation where you require more CPU than 1 server can provide for you - there are tuning parameters (e.g. localOrShuffleGrouping) that you can use to reduce the amount of data sent over the network too. Any situation where you need to have tolerance in case of machine failure. On Thu, Feb

Re: how to execute one bolt after another when the input is taken from same spout.

2016-02-02 Thread Nathan Leung
aking input from the spout itself and the two bolts contains > different logic so i cannot emit tuple from the first bolt. > > On Mon, Feb 1, 2016 at 11:17 PM, Nathan Leung wrote: > >> You should wire the bolts one after the other, and the first will emit >> the tuple

Re: how to execute one bolt after another when the input is taken from same spout.

2016-02-01 Thread Nathan Leung
You should wire the bolts one after the other, and the first will emit the tuple to the second only when it has to. Don't use sleep, that's probably not correct anyways. On Jan 31, 2016 11:22 PM, "sujitha chinnu" wrote: > hai., > My requirement is to first execute one bolt and upon succe

Re: Multiple output streams: all go through the same connection?

2016-01-06 Thread Nathan Leung
Look at http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/. Pretty sure it's using one port for all of the communications. Also ZMQ is not a queue, especially not a brokered one, and storm uses netty by default in the latest versions. On Wed, Jan 6, 2016 at

Re: [Discussion] storm local-mode event object reuse bug

2015-12-01 Thread Nathan Leung
It is bypassed by design. As noted in https://storm.apache.org/apidocs/backtype/storm/task/OutputCollector.html, the emitted objects must be immutable. If you're intent on modifying them, be very careful. On Tue, Dec 1, 2015 at 4:28 AM, Stephen Powis wrote: > I believe anytime tuples are passe

Re: How to handle dynamic topologies

2015-11-05 Thread Nathan Leung
e that passing through > the output collector is faster than Kaka, what do you think ? > > Thank you, > > Crina > > 2015-11-05 12:36 GMT+01:00 Nathan Leung : > >> It's not possible to combine several topologies into one, but it should >> be possible to write dif

Re: How to handle dynamic topologies

2015-11-05 Thread Nathan Leung
It's not possible to combine several topologies into one, but it should be possible to write different tuple sinks such that you can configure each bolt to write to either the output collector or Kafka. Then it's just a matter of wiring and configuring your bolts differently. You can use something

Re: Emitting tuple to upstream node using direct grouping

2015-11-04 Thread Nathan Leung
You would need: Spout-1 --(direct-grouping)--> Bolt-1 --(direct-grouping)--> Bolt-2 --(direct grouping, non-default stream)--> Bolt 1 Task IDs are numbered from 0 (pretty sure it's 0, if not it's from 1) for each component. Therefore Spout 1 task IDs are 0 to (n-1), Bolt 1 task IDs are 0 to (m-1

Re: storm noob question - understanding ingestion

2015-11-02 Thread Nathan Leung
A spout can get its own task ID, and also the number of tasks running for the spout (see https://nathanmarz.github.io/storm/doc/backtype/storm/task/TopologyContext.html for details). You can use this data to assign sections of the file, if that's what you really want. Other options include sendin

Re: Strange storm problem

2015-11-01 Thread Nathan Leung
minor gc is quite > low, which happens once every few seconds. > > On Mon, Nov 2, 2015 at 12:29 AM, Nathan Leung wrote: > >> The box with no throughput might be in a gc loop. Check your heap >> utilization and maybe increase worker heap if necessary. Also consider >&g

Re: Strange storm problem

2015-11-01 Thread Nathan Leung
The box with no throughput might be in a gc loop. Check your heap utilization and maybe increase worker heap if necessary. Also consider decreasing the max spout pending, even without further details 20k seems high. On Nov 1, 2015 10:50 AM, "Harsha" wrote: > Do you have any calls to external data

Re: Flux question

2015-10-02 Thread Nathan Leung
With regards to the first question, the purpose of flux is to dynamically create a topology for submission to nimbus. It serves to replace the topology main() method. It is suitable for setting up configuration you use for connecting in prepare(), but it's not suitable for creating the connecti

Re: Storm Memory Consumption Issue

2015-09-12 Thread Nathan Leung
I would try making lots of bolt 2 in parallel and have each bolt 2 process the full list of rules. On Sep 12, 2015 10:37 AM, "Hong Wind" wrote: > Hi all again, > > Since no reply I guess the question might be not described clearly enough. > So let me try to elaborate it ... > > Performance requir

Re: Expiring session 0x14f4657e8dd001b, timeout of 20000ms exceeded

2015-08-21 Thread Nathan Leung
nd SWAP area is Approx. 28GB. And As > per basic calculation, I need 26GB memory. I have run this topology in > local cluster mode. > > > On Fri, Aug 21, 2015 at 12:18 AM, Nathan Leung wrote: > >> Prepare shouldn't cause a timeout. When I said gc I meant something l

Re: Expiring session 0x14f4657e8dd001b, timeout of 20000ms exceeded

2015-08-20 Thread Nathan Leung
n. So what are solutions for such problem. > > On Thu, Aug 20, 2015 at 4:19 PM, Nathan Leung wrote: > >> Do you have long running gc? I've seen this cause zk connection loss. >> On Aug 20, 2015 2:30 AM, "swapnil joshi" >> wrote: >> >>> Thanks

Re: Expiring session 0x14f4657e8dd001b, timeout of 20000ms exceeded

2015-08-20 Thread Nathan Leung
Do you have long running gc? I've seen this cause zk connection loss. On Aug 20, 2015 2:30 AM, "swapnil joshi" wrote: > Thanks!!! For giving me response. I had change configuration in storm.yml > file. But still I got following error > > *15/08/20 11:44:17 ERROR imps.CuratorFrameworkImpl: Backgro

Re: Multiple Streams vs Multiple Subscribers of the same stream

2015-08-07 Thread Nathan Leung
It's even worse, you have information for both bolts sent twice, instead of information for one bolt sent once, so assuming same message size and same frequency of messages for both bolts you are sending 4x data. Use option 2. On Aug 7, 2015 1:18 PM, "Kishore Senji" wrote: > I also think option

Re: Max Spout Pending - Question

2015-07-29 Thread Nathan Leung
he same message and call topology. > > Thanks > Kashyap > > On Wed, Jul 29, 2015 at 11:59 AM, Nathan Leung wrote: > >> 1 second is too short. Spout latency includes time spent in the output >> queue from the spout (increasing max spout pending potentially increases >

Re: Max Spout Pending - Question

2015-07-29 Thread Nathan Leung
> is spent waiting because of the Max Spout Topology property. To add to the > mysterry, i set the messasge timeout is at 1 sec. I dont see any failures > (fail() not called) but the spout latency is at 1.5 seconds. > > Regards, > Kashyap > > On Wed, Jul 29, 2015 at 10:35 AM,

Re: Max Spout Pending - Question

2015-07-29 Thread Nathan Leung
> thanks > kashyap > > On Tue, Jul 28, 2015 at 7:30 PM, Nathan Leung wrote: > >> The count is tracked from each spout task and does not include bolt fan >> out. If the setting is 100 and you have 8 spout tasks you can have 800 >> tuples from the spout in your system

Re: Max Spout Pending - Question

2015-07-28 Thread Nathan Leung
The count is tracked from each spout task and does not include bolt fan out. If the setting is 100 and you have 8 spout tasks you can have 800 tuples from the spout in your system. On Jul 28, 2015 6:25 PM, "Kashyap Mhaisekar" wrote: > Hi, > Does Max Spout Topology limitation apply to tuples emitt
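The per-task arithmetic above is worth making explicit; this tiny sketch just restates it in code (the numbers come from the example in the reply).

```java
public class MaxSpoutPendingMath {
    public static void main(String[] args) {
        // maxSpoutPending is enforced per spout *task*, not per topology,
        // and does not count tuples fanned out by downstream bolts.
        int maxSpoutPending = 100;
        int spoutTasks = 8;
        int maxInFlightFromSpouts = maxSpoutPending * spoutTasks;
        System.out.println(maxInFlightFromSpouts); // 800
    }
}
```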

Re: How to synchronize worker (prepare method)

2015-07-27 Thread Nathan Leung
Using task id is a good idea. I don't recall if 0 or 1 is the lowest, but it's probably good to use the lowest possible one for initialization, just in case you ever run a topology with only 1 bolt. On Mon, Jul 27, 2015 at 8:02 AM, Denis DEBARBIEUX wrote: > Dear all, > > I am writting a bolt.

Re: Storm scalability issue

2015-07-26 Thread Nathan Leung
What does the emit look like with the direct grouping? Are you changing the number of tasks you emit to? On Jul 26, 2015 6:39 PM, "Dimitris Sarlis" wrote: > Kashyap, > > I put logger before and after emit in each bolt. In spouts it's not so > easy because I'm using the predefined class KafkaSpo

Re: What happens when a message times out?

2015-07-20 Thread Nathan Leung
The computation (and the tuple) continue on. The fail() method will be called on the spout with the message id. On Mon, Jul 20, 2015 at 1:51 PM, Mark Tomko wrote: > Suppose we have a simple topology with a spout and a single bolt, and the > tuple timeout is set to some value. When a message exc

Re: Realtime computations using storm - questions on performance

2015-07-19 Thread Nathan Leung
> > Parallelism does not seem to be a problem as capacity is under 0.3-0.5 for > all bolts. > > Do you know of any other reasons based on experience? > > Thanks for the time > > Thanks > Kashyap > On Jul 19, 2015 02:29, "Nathan Leung" wrote: > >>

Re: Realtime computations using storm - questions on performance

2015-07-19 Thread Nathan Leung
P PKey -- >> ​ <https://www.dropbox.com/s/2kcxe59zsi9nrdt/pgpsig.txt> >> https://www.dropbox.com/s/ei2nqsen641daei/pgpsig.txt >> >> On Fri, Jul 17, 2015 at 2:12 AM, Nathan Leung wrote: >> >>> If your tuples are reliable (spout emit with message id) and an

Re: Realtime computations using storm - questions on performance

2015-07-16 Thread Nathan Leung
shyap Mhaisekar" wrote: > Nathan, > My max spout pending is set to 1. Now is my problem with latency or with > throughput. > > Thank you! > Kashyap > On Jul 16, 2015 5:46 PM, "Nathan Leung" wrote: > >> If your tuples are anchored max spout pending i

Re: Realtime computations using storm - questions on performance

2015-07-16 Thread Nathan Leung
y and not really controllable > via Max Spout Pending? > > Thanks > Kashyap > > On Thu, Jul 16, 2015 at 5:07 PM, Nathan Leung wrote: > >> Also I would argue that this is not important unless your application is >> especially latency sensitive or your queue is so long that it

Re: Realtime computations using storm - questions on performance

2015-07-16 Thread Nathan Leung
Also I would argue that this is not important unless your application is especially latency sensitive or your queue is so long that it is causing in flight tuples to timeout. On Jul 16, 2015 6:05 PM, "Nathan Leung" wrote: > Sorry for a brief response.. The number of tuples in flig

Re: Realtime computations using storm - questions on performance

2015-07-16 Thread Nathan Leung
15 at 3:52 PM, Nick R. Katsipoulakis < > nick.kat...@gmail.com> wrote: > >> Thank you all for the valuable info. >> >> Unfortunately, I have to use it for my (research) prototype therefore I >> have to go along with it. >> >> Thank you again, >&g

Re: Realtime computations using storm - questions on performance

2015-07-16 Thread Nathan Leung
l the data flow. Can you please explain to me why you would not >> recommend direct grouping? Is there any particular reason in the >> architecture of Storm? >> >> Thanks, >> Nick >> >> 2015-07-16 16:20 GMT-04:00 Nathan Leung : >> >>> I wo

Re: Realtime computations using storm - questions on performance

2015-07-16 Thread Nathan Leung
lease explain to me why you would not > recommend direct grouping? Is there any particular reason in the > architecture of Storm? > > Thanks, > Nick > > 2015-07-16 16:20 GMT-04:00 Nathan Leung : > >> I would not recommend direct grouping unless you have a good reason for &

Re: Realtime computations using storm - questions on performance

2015-07-16 Thread Nathan Leung
because of > clock drifting). > 2) (to Nathan) Is there a difference in speeds among different groupings? > For instance, is shuffle faster than direct grouping? > > Thanks, > Nick > > 2015-07-15 17:37 GMT-04:00 Nathan Leung : > >> Two things. Your math may be off de

Re: Realtime computations using storm - questions on performance

2015-07-15 Thread Nathan Leung
Two things. Your math may be off depending on parallelism. One emit from A becomes 100 emitted from C, and you are joining all of them. Second, try the default number of ackers (one per worker). All your ack traffic is going to a single task. Also you can try local or shuffle grouping if possible

Re: How fast can bolt access emitted data

2015-07-15 Thread Nathan Leung
I don't think you adequately addressed Seungtack's concern. I would run a simple topology as a test, with something like 2 workers, spout -> empty bolt -> empty bolt -> empty bolt all with parallelism 2, and all on shuffle grouping. If storm is behaving as poorly as you think it is, then you will

Re: Implications of using Storm without Kafka

2015-07-02 Thread Nathan Leung
It sounds like at least once semantics, but I agree that storm may not be the best solution. It depends on whether there is additional data processing after the server. If not, just have the server write directly to hbase. On Jul 2, 2015 7:30 PM, "Javier Gonzalez" wrote: > I'll second this... You

Re: Use case of localOrShuffleGrouping?

2015-07-01 Thread Nathan Leung
If your bolts are evenly spread and there is at least one on each worker using localOrShuffleGrouping can be a huge optimization. Consider this example. You have 10 bolt A tasks, 10 bolt B tasks, and 10 workers. Each worker has 1 of each task. Bolt B subscribes to bolt A. In shuffle grouping,
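The selection behavior behind that optimization can be sketched as follows. This is illustrative logic, not Storm's actual grouping implementation: if the downstream bolt has a task in the same worker process, the tuple goes there and skips serialization and the network entirely; otherwise it falls back to a plain shuffle.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class LocalOrShuffleSketch {
    // Illustrative: prefer tasks co-located in this worker; otherwise
    // shuffle across all downstream tasks.
    static int chooseTask(List<Integer> localTasks, List<Integer> allTasks) {
        List<Integer> candidates = localTasks.isEmpty() ? allTasks : localTasks;
        return candidates.get(ThreadLocalRandom.current().nextInt(candidates.size()));
    }

    public static void main(String[] args) {
        List<Integer> all = List.of(0, 1, 2, 3, 4);
        // With a local task available, the tuple never leaves the worker.
        System.out.println(chooseTask(List.of(3), all)); // 3
        // Without local tasks it degenerates to plain shuffle grouping.
        int t = chooseTask(List.of(), all);
        System.out.println(all.contains(t));             // true
    }
}
```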

Re: Worker thread memory

2015-06-25 Thread Nathan Leung
; > Nick > > 2015-06-25 11:22 GMT-04:00 Nathan Leung : > >> I'm not sure but if I had to wager a guess the former is set on the >> supervisor and will be applied to all topologies run on that supervisor, >> whereas the latter is set per topology. >> >> On

Re: Worker thread memory

2015-06-25 Thread Nathan Leung
I will try to debug and see what's going on. Also, what is the > difference between worker.childopts and topology.worker.childopts? > > Thanks, > Nick > > 2015-06-25 11:10 GMT-04:00 Nathan Leung : > >> The nimbus log will tell you which port the worker was started

Re: Worker thread memory

2015-06-25 Thread Nathan Leung
5+ b.s.event [INFO] Event manager interrupted > 2015-06-24T19:08:39.745+ b.s.event [INFO] Event manager interrupted > 2015-06-24T19:08:39.748+ o.a.s.z.ZooKeeper [INFO] Session: > 0x24e26a304b50025 closed > 2015-06-24T19:08:39.748+ o.a.s.z.ClientCnxn [INFO] EventThread shu

Re: Worker thread memory

2015-06-25 Thread Nathan Leung
Any problems in supervisor or nimbus logs? On Thu, Jun 25, 2015 at 10:49 AM, Nick R. Katsipoulakis < nick.kat...@gmail.com> wrote: > I am using m4.xlarge instances, each one with 4 workers per supervisor. > Yes, they are listed. > > Nick > > 2015-06-25 10:47 GMT-04:00 Nat

Re: Worker thread memory

2015-06-25 Thread Nathan Leung
be more precise, I submitted my topology > (with storm jar...) and I just waited for it to start executing, but > nothing. Any ideas of what might have been the reason? > > Thanks, > Nick > > 2015-06-25 10:39 GMT-04:00 Nathan Leung : > >> In general worker options need t

Re: Worker thread memory

2015-06-25 Thread Nathan Leung
In general worker options need to be set in the supervisor config files. On Thu, Jun 25, 2015 at 10:07 AM, Nick R. Katsipoulakis < nick.kat...@gmail.com> wrote: > Hello sy.pan > > Thank you for the link. I will try the suggestions. > > Cheers, > Nick > > 2015-06-24 22:35 GMT-04:00 sy.pan : > >> F

Re: Fwd: Reshare: Uneven distribution with shuffle grouping

2015-06-23 Thread Nathan Leung
Also to clarify, unless you change the sample frequency the counts in the ui are not precise. Note that they are all multiples of 20. On Jun 23, 2015 7:16 AM, "Matthias J. Sax" wrote: > I don't see any in-balance. The value of "Executed" is 440/460 for each > bolt. Thus each bolt processed about

Re: Singleton database connection

2015-06-17 Thread Nathan Leung
You cannot pass a connection through the builder, as you have noticed. You can use a lazy initialized singleton or some sort of connection pool. On Jun 17, 2015 5:05 PM, "Kushan Maskey" < kushan.mas...@mmillerassociates.com> wrote: > I am wondering how can I safely create a Singleton database conn
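A lazy-initialized singleton of the kind suggested above might look like this. The held resource is a stand-in `Object` here; in practice it would be your database connection or connection pool, created once per worker JVM on first use rather than serialized through the topology builder.

```java
public class ConnectionHolder {
    // Double-checked locking: cheap read on the fast path, one-time
    // synchronized construction on first use in each worker JVM.
    private static volatile Object connection;

    static Object get() {
        if (connection == null) {                 // first check, no lock
            synchronized (ConnectionHolder.class) {
                if (connection == null) {         // second check, with lock
                    connection = new Object();    // stand-in for building a real connection
                }
            }
        }
        return connection;
    }

    public static void main(String[] args) {
        // Every call in this JVM returns the same instance.
        System.out.println(ConnectionHolder.get() == ConnectionHolder.get()); // true
    }
}
```

The `volatile` qualifier matters: without it, the double-checked pattern is not safe under the Java memory model.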

Re: Has anybody successfully run storm 0.9+ in production under reasonable load?

2015-06-12 Thread Nathan Leung
nodes have each other listed in /etc/hosts. On Jun 12, 2015 8:59 AM, "Nathan Leung" wrote: > Make sure your topology is starting up in the allotted time, and if not > try increasing the startup timeout. > On Jun 12, 2015 2:46 AM, "Fang Chen" wrote: > >> Hi Erik

Re: Has anybody successfully run storm 0.9+ in production under reasonable load?

2015-06-12 Thread Nathan Leung
Make sure your topology is starting up in the allotted time, and if not try increasing the startup timeout. On Jun 12, 2015 2:46 AM, "Fang Chen" wrote: > Hi Erik > > Thanks for your reply! It's great to hear about real production usages. > For our use case, we are really puzzled by the outcome s

Re: The WordCountTopology example

2015-06-10 Thread Nathan Leung
You would need to detect that the file changed in the spout, and continue reading from the last offset. You would need to keep this state in the spout. If you wanted it to be fault tolerant you would need to save this state in a distributed filesystem or a database. On Jun 10, 2015 1:04 AM, "Rakesh
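The offset bookkeeping described above can be sketched with a plain file read. This is a minimal, non-fault-tolerant sketch: `offset` lives only in memory here, whereas the reply notes it would have to be persisted externally to survive a restart.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TailSketch {
    // Remember how far into the file we have read; each poll resumes
    // from that offset and returns only the newly appended bytes.
    private long offset = 0;

    String readNew(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            byte[] buf = new byte[(int) (raf.length() - offset)];
            raf.readFully(buf);
            offset = raf.length();
            return new String(buf);
        }
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("tail", ".log");
        Files.writeString(f, "line1\n");
        TailSketch t = new TailSketch();
        System.out.print(t.readNew(f));  // line1
        Files.writeString(f, "line2\n", StandardOpenOption.APPEND);
        System.out.print(t.readNew(f));  // line2 -- only the new bytes
    }
}
```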

Re: Best Spout implementation for Reading input Data From File

2015-06-07 Thread Nathan Leung
You should emit with a message id, which will prevent too many messages from being in flight simultaneously, which will alleviate your out of memory conditions. On Jun 7, 2015 5:05 AM, "Michail Toutoudakis" wrote: > What is the best spout implementation for reading input data from file? I > have

Re: long running bolts

2015-06-02 Thread Nathan Leung
You can emit the long running tuple unanchored but if you need to guarantee its processing then it should be sent to another topology or processing framework. In fact even if it's in the same topology unanchored I would create separate instances of the bolt for these long running tuples. If you d

Re: long running bolts

2015-06-02 Thread Nathan Leung
I agree with Mike, I would send the long running tuple out of your topology to be processed. The timeout must be long enough to handle your expected worst case, and if most messages are very fast then having a few long running tuples will mask any potential problems you have with your other data.

Re: Newbie Question: Can two different bolts subscribe to each other??

2015-05-29 Thread Nathan Leung
This is possible but if you need to do this on a per tuple basis I would consider doing it in the spout ack method. If you are doing batches I would consider using trident. On May 29, 2015 8:28 AM, "Michail Toutoudakis" wrote: > I would like to ask if it is possible two different bolts to subscri

Re: tuple size limitation?

2015-05-29 Thread Nathan Leung
The default (and in old releases ONLY) multi lang serializer is json, which is in fact slow. On May 29, 2015 8:04 AM, "Andrew Xor" wrote: > ​I think in the storm documentation it clearly says that not only you have > to serialize your objects but when using custom types it is better to > implemen

Re: Throughput : local mode is faster than cluster mode

2015-05-28 Thread Nathan Leung
It's hard to help without any details about what you did. I have not seen local mode outperform cluster mode by this factor, especially if it's all on one server, so more information would help determine what is going on.

Re: Is it supported to set different JVM options for different tasks(workers)?

2015-05-22 Thread Nathan Leung
I would rather write a custom scheduler and put certain topologies on specific hosts. On May 22, 2015 1:09 PM, "Mark Zang" wrote: > In a topology, some task might consumes more memories and some consumes > less. So is it supported to set task level / worker level JVM options to > set worker proce

Re: Re: StackOverFlow

2015-05-20 Thread Nathan Leung
hat referred to recursive reference? > Thank you very much. > > ------ > yang...@bupt.edu.cn > > *From:* Nathan Leung > *Date:* 2015-05-21 10:25 > *To:* yang...@bupt.edu.cn > *CC:* user > *Subject:* Re: Re: StackOverFlow > If the data structure is not immutable then you

Re: Re: StackOverFlow

2015-05-20 Thread Nathan Leung
thank you~ > > ------ > yang...@bupt.edu.cn > > *From:* Nathan Leung > *Date:* 2015-05-20 21:12 > *To:* user > *Subject:* Re: StackOverFlow > It looks like he's in a recursive loop in a java serialization routine > (ObjectOutputStream). This

Re: Decreasing Complete latency with growing number of executors

2015-05-20 Thread Nathan Leung
cess latency" should be > correlated. Am I right? > > > On Wed, May 20, 2015 at 2:10 PM, Nathan Leung wrote: > >> My point with increased throughput was that if you have items queued from >> the spout waiting to be processed, that counts towards the complete late

Re: StackOverFlow

2015-05-20 Thread Nathan Leung
aass > linkedin.com/in/jeffmaass > stackoverflow.com/users/373418/maassql > + > > > On Tue, May 19, 2015 at 10:23 PM, Nathan Leung wrote: > >> Looks like you have a reference loop in your data structure >> On May 19, 2015 11:07 P

Re: Decreasing Complete latency with growing number of executors

2015-05-20 Thread Nathan Leung
18/maassql >>> + >>> >>> >>> On Tue, May 19, 2015 at 9:47 AM, Dima Dragan >>> wrote: >>> >>>> Thanks Nathan for your answer, >>>> >>>> But I`m afraid that you understand me wrong : With in

Re: StackOverFlow

2015-05-19 Thread Nathan Leung
Looks like you have a reference loop in your data structure On May 19, 2015 11:07 PM, "yang...@bupt.edu.cn" wrote: > hi, > > I encountered the following error exception: > > java.lang.StackOverflowError at > java.io.ObjectStreamClass$FieldReflector.getPrimFieldValues(ObjectStreamClass.java:1930)
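A hedged standalone illustration of one common way to produce exactly this trace: default Java serialization walks the object graph recursively, so a sufficiently deep chain of references overflows the stack inside ObjectOutputStream. The class and depth below are illustrative; the depth needed in practice depends on the -Xss stack size:

```java
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class DeepGraphOverflow {
    static class Node implements Serializable {
        Node next;
    }

    public static void main(String[] args) throws Exception {
        // Build a hand-linked list deep enough to exhaust any default stack.
        Node head = new Node();
        Node cur = head;
        for (int i = 0; i < 1_000_000; i++) {
            cur.next = new Node();
            cur = cur.next;
        }
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            // Recurses several stack frames per node, as in the quoted trace.
            out.writeObject(head);
            System.out.println("serialized"); // not reached with a default stack
        } catch (StackOverflowError e) {
            System.out.println("StackOverflowError");
        }
    }
}
```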

Re: Decreasing Complete latency with growing number of executors

2015-05-19 Thread Nathan Leung
It depends on your application and the characteristics of the io. You increased executors by 32x and each executor's throughput dropped by 5x, so it makes sense that latency will drop. On May 19, 2015 9:54 AM, "Dima Dragan" wrote: > Hi everyone, > > I have found a strange behavior in topology met
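The arithmetic behind this reply can be sketched with Little's law: with the number of in-flight tuples bounded (e.g. by topology.max.spout.pending), complete latency is roughly in-flight count divided by aggregate throughput, and 32x executors each at 1/5 the per-executor rate is still a 32/5 = 6.4x aggregate gain. All numbers below are made up for illustration:

```java
public class LittlesLawSketch {
    public static void main(String[] args) {
        // Little's law: W = L / lambda (latency = in-flight / throughput).
        double inFlight = 1000;           // e.g. capped by max.spout.pending
        double before = inFlight / 100.0; // 100 tuples/s aggregate -> 10 s
        // 32x executors at 1/5 the per-executor rate -> 640 tuples/s aggregate.
        double after = inFlight / (100.0 * 32 / 5); // -> 1.5625 s
        System.out.println(before + " " + after); // prints 10.0 1.5625
    }
}
```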

Re: How to implement static initialization in storm for several topolgies

2015-05-15 Thread Nathan Leung
http server, > it's better to close it after all topologies will be deactivated). > > I thinking about creating separate init topology. Is the a normal way to > achieve this? > How can I guarantee that this bolt will be created on each worker node? > > Thank you. > >

Re: How to implement static initialization in storm for several topolgies

2015-05-15 Thread Nathan Leung
Sorry misread original email. If you need to share across topologies then you should consider something like memcache. On May 15, 2015 6:52 PM, wrote: > Lazy initialized singleton. Let the os release the memory when you're done > :). > On May 15, 2015 6:44 PM, "Dzmitry Viarzhbitski" < > dzmitry.vi

Re: How to implement static initialization in storm for several topolgies

2015-05-15 Thread Nathan Leung
Lazy initialized singleton. Let the os release the memory when you're done :). On May 15, 2015 6:44 PM, "Dzmitry Viarzhbitski" < dzmitry.viarzhbit...@gmail.com> wrote: > Let's I assume that I have several topologies in the same jar which are > using some cache. > I need to init cache (cache subscr
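A hedged sketch of that suggestion using the initialization-on-demand holder idiom (names are illustrative): one instance per JVM, i.e. per worker process, built lazily and thread-safely on first use by any bolt in that worker, and reclaimed when the worker exits:

```java
public class SharedCache {
    private SharedCache() {
        // Expensive cache construction (subscriptions, warm-up) would go here.
    }

    // The JVM guarantees Holder's static init runs exactly once, on first use.
    private static class Holder {
        static final SharedCache INSTANCE = new SharedCache();
    }

    public static SharedCache get() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        // Every caller in the process sees the same instance.
        System.out.println(SharedCache.get() == SharedCache.get()); // prints true
    }
}
```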

Re: Should I always make a copy of the data in Tuple during execute()?

2015-05-15 Thread Nathan Leung
ts between bolts? Many > thanks. I would really appreciate more info on this. > > On Thu, May 14, 2015 at 8:15 PM, Nathan Leung wrote: > >> It sounds like you reuse the same simple pojo object in bolt a. You >> should avoid doing this; even with your copy workaround it'

Re: Should I always make a copy of the data in Tuple during execute()?

2015-05-14 Thread Nathan Leung
It sounds like you reuse the same simple pojo object in bolt a. You should avoid doing this; even with your copy workaround it's possible to run into more subtle race conditions. On May 14, 2015 8:51 PM, "Banias H" wrote: > In prototyping an application, I use a simple pojo (see below) to send >
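A standalone illustration of the hazard (a queue stands in for Storm's transfer buffer, since tuples moving between executors inside one worker can be passed by reference rather than copied): the emitted Values hold references, so mutating a reused pojo after emitting clobbers the tuple already in flight.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class ReusedPojoHazard {
    static class Pojo { int value; }

    static Pojo newPojo(int v) {
        Pojo p = new Pojo();
        p.value = v;
        return p;
    }

    public static void main(String[] args) {
        Queue<Pojo> transfer = new ArrayDeque<>();

        // Anti-pattern: reuse one object across "emits".
        Pojo reused = new Pojo();
        reused.value = 1;
        transfer.add(reused); // "emit" tuple 1
        reused.value = 2;
        transfer.add(reused); // "emit" tuple 2 -- same object!
        // Downstream sees tuple 1's data clobbered:
        System.out.println(transfer.poll().value + " " + transfer.poll().value); // prints 2 2

        // The fix: allocate a fresh object per emit.
        transfer.add(newPojo(1));
        transfer.add(newPojo(2));
        System.out.println(transfer.poll().value + " " + transfer.poll().value); // prints 1 2
    }
}
```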
