How Acker are scheduled when using Pluggable Scheduler

2014-05-12 Thread Pratik Mehta
I am trying to use a custom scheduler, following the tutorial written by James Xu. I have one question regarding the acker bolt: if I want one acker bolt per worker, do I need to explicitly add the acker bolt to the list of executors that need to be assigned to a slot?

Storm Scaling Issues

2014-05-12 Thread Lasantha Fernando
Hi all, Is there any guide or hints on how to configure storm to scale better? I was running some tests with a custom scheduler and found that the throughput did not scale as expected. Any pointers on what I am doing wrong? Parallelism: 2, 4, 8, 16. Single Node (Avg): 166099, 161539.5, 193986, N/A. Two Node (Avg): 16

Re: how to debug tuple failures

2014-05-12 Thread Srinath C
I had encountered such an issue earlier on version 0.9.0.1 but did not get much help to resolve it. Try to tune the topology.max.spout.pending value
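
A minimal sketch of what setting topology.max.spout.pending looks like. A Storm topology configuration is a map of string keys to values, so the sketch uses a plain java.util.Map and runs without Storm on the classpath. The class name, the helper method, and the value 1000 are illustrative assumptions, not recommendations from the thread.

```java
import java.util.HashMap;
import java.util.Map;

class MaxSpoutPendingSketch {
    // Configuration key named in the reply above.
    static final String MAX_SPOUT_PENDING = "topology.max.spout.pending";

    // Hypothetical helper: builds a config map that caps the number of
    // emitted-but-not-yet-acked tuples allowed per spout task.
    static Map<String, Object> buildConf(int maxPending) {
        if (maxPending <= 0) {
            throw new IllegalArgumentException("maxPending must be positive");
        }
        Map<String, Object> conf = new HashMap<>();
        conf.put(MAX_SPOUT_PENDING, maxPending);
        return conf;
    }

    public static void main(String[] args) {
        // 1000 is an arbitrary starting point for experimentation.
        System.out.println(buildConf(1000));
    }
}
```

In a real topology the same key would go into the configuration object passed at submission time. The cap only takes effect for spouts that emit tuples with message ids, since only those tuples are tracked as pending.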

Re: Interesting Comparison

2014-05-12 Thread Marc Vaillant
To play devil's advocate, if you believe the stream performance gains, then the 40k will likely pay for itself in needing to deploy a fraction of the resources for the same throughput. On Mon, May 12, 2014 at 09:02:53AM -0400, John Welcher wrote: > Hi > > Streams also cost 40,000 US while Storm

how to debug tuple failures

2014-05-12 Thread Anshul Mittal
Hi, I am running a storm topology with the following configuration: *Storm : 0.8.2* *Kafka : 0.7.2* *Storm Kafka Spout : 0.8.0-wip4* I am observing that after some time, the topology enters a weird state and starts retrying a bunch of tuples. I want to see what exact failure causes the topology to

Re: Peeking into storm's internal buffers

2014-05-12 Thread Srinath C
Here is more info... My suspicion is that the queue backtype.storm.messaging.netty.Server#message_queue is not getting consumed. A heap dump of the worker process reveals that the size of that queue is around 40k. I have snippets of the worker log that I can share right now. Some thread stacks are

Re: Interesting Comparison

2014-05-12 Thread Jon Logan
The claims are certainly interesting...I haven't looked through it super detailed, but I would definitely keep in mind who is making the claims. Looking at it briefly, it looks like something is really wrong, looking at their scaling graphs. Without further information, I think it's hard to properl

Re: Interesting Comparison

2014-05-12 Thread Corey Nolet
Interesting that the paper was written by IBM people defending an IBM product. Not saying that it's biased or anything... Nathan, I agree that the windowing is better served as a layer on top. Personally, I appreciate that Storm deals with clustering, distributed state, fault-tolerance, and thread

Re: Interesting Comparison

2014-05-12 Thread Klausen Schaefersinho
Hi, my guess is that the 40k is per CPU or so... certainly not for an entire cluster. On Mon, May 12, 2014 at 4:46 PM, Marc Vaillant wrote: > To play devil's advocate, if you believe the stream performance gains, > then the 40k will likely pay for itself in needing to deploy a fraction > of the res

Re: why does storm not supply a mechanism for supplying topology necessary dependent jars other than the fat jar?

2014-05-12 Thread Cody A. Ray
I'm not positive, but I don't think storm isolates topologies (and its core stuff) with different classpath loaders. This is why you have to make a fat jar and shade all dependencies into your own namespace. You can only have one version of a jar per classpath loader. Or something like that. :) -C

Re: Interesting Comparison

2014-05-12 Thread John Welcher
Hi Streams also cost 40,000 US while Storm is free. John On Mon, May 12, 2014 at 3:49 AM, Klausen Schaefersinho < klaus.schaef...@gmail.com> wrote: > Hi, > > I found some interesting comparison of IBM Stream and Storm: > > https://www.ibmdw.net/streamsdev/2014/04/22/streams-apache-storm/ > > I

Re: Interesting Comparison

2014-05-12 Thread Ted Dunning
Anybody who has ever only paid 40K$ to IBM for anything should deserve a prize. That is just the entry fee. On Mon, May 12, 2014 at 7:46 AM, Marc Vaillant wrote: > To play devil's advocate, if you believe the stream performance gains, > then the 40k will likely pay for itself in needing to de

Re: If supervisor fail and will not be restarted

2014-05-12 Thread Srinath C
The currently running workers will not be affected, but if any of the worker processes crash, they will not get relaunched - basically the topology will remain dysfunctional until the supervisor is started again and it re-spawns the worker process. On Mon, May 12, 2014 at 1:40 PM, Chengwei Yang wro

unable to install/test incubator-storm/examples missing dependencies

2014-05-12 Thread Thomas Puthiaparambil
I get the following error [root@localhost storm-starter]# mvn compile exec:java -Dstorm.topology=storm.starter.WordCountTopology [INFO] Scanning for projects... [INFO] [INFO] Using the builder org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder with a thread count of 1

Re: If supervisor fail and will not be restarted

2014-05-12 Thread Chengwei Yang
On Mon, May 12, 2014 at 05:25:32PM +0530, Srinath C wrote: > The current running workers will not get affected but if any of the worker > processes crash, they will not get relaunched - basically the topology will > remain dysfunctional until the supervisor is started again and it re-spawns > the

Re: Can i increase the parallelism of the topology as per traffic

2014-05-12 Thread Nathan Leung
Not to be trite, but the help message for the command will tell you everything you need: $ storm help rebalance Syntax: [storm rebalance topology-name [-w wait-time-secs] [-n new-num-workers] [-e component=parallelism]*] Sometimes you may wish to spread out where the workers for a topology
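
A small sketch of how the optional flags in the syntax quoted above combine into a command line. It only assembles the argument list and does not invoke Storm. The class name, the helper method, the topology name `wordcount`, and the component names `split` and `count` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RebalanceCommandSketch {
    // Assembles: storm rebalance topology-name [-w secs] [-n workers] [-e component=parallelism]*
    // Passing null for waitSecs or numWorkers omits that flag.
    static List<String> rebalanceArgs(String topology, Integer waitSecs,
                                      Integer numWorkers, Map<String, Integer> parallelism) {
        List<String> args = new ArrayList<>();
        args.add("storm");
        args.add("rebalance");
        args.add(topology);
        if (waitSecs != null) {
            args.add("-w");
            args.add(String.valueOf(waitSecs));
        }
        if (numWorkers != null) {
            args.add("-n");
            args.add(String.valueOf(numWorkers));
        }
        // One -e flag per component whose parallelism should change.
        for (Map.Entry<String, Integer> e : parallelism.entrySet()) {
            args.add("-e");
            args.add(e.getKey() + "=" + e.getValue());
        }
        return args;
    }

    public static void main(String[] args) {
        Map<String, Integer> parallelism = new LinkedHashMap<>();
        parallelism.put("split", 8);   // hypothetical component names
        parallelism.put("count", 12);
        System.out.println(String.join(" ", rebalanceArgs("wordcount", 10, 4, parallelism)));
        // prints: storm rebalance wordcount -w 10 -n 4 -e split=8 -e count=12
    }
}
```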

Unit tests timing out on 0.9.0.1

2014-05-12 Thread M Mansur Ashraf
Hi, I am trying to run some tests that work fine on 0.9.0-wip15 but time out on 0.9.0.1. The test I am running can be found here: https://github.com/twitter/tormenta/blob/develop/tormenta-core/src/test/scala/com/twitter/tormenta/TopologyTest.scala Below is the exception that's being thrown with 0.9.0.1

Peeking into storm's internal buffers

2014-05-12 Thread Srinath C
Hi, I'm facing a strange issue running a topology on version 0.9.1-incubating with Netty as transport. The topology has two worker processes on the same worker machine. To summarize the behavior, on one of the worker processes: - one of the bolts is not getting executed: The bolt ha

Re: storm trident question

2014-05-12 Thread Ted Dunning
Spark streaming is a very different animal than Storm in that it does micro-batching rather than true streaming. This has positives and negatives. Average latency on record by record processing will appear to be abysmal compared to Storm. Throughput could well be much higher because of the inher

Re: Interesting Comparison

2014-05-12 Thread Nathan Leung
A couple of thoughts: 1) IBM Streams is certainly more mature, as it's been in development for a longer amount of time and Storm is not even at release 1.0 yet. Though I am not familiar with SPL, it would also make sense that it's faster to implement as it is a higher level abstraction. 2) Operator

Lost connection to zookeeper and lead to supervisor restart

2014-05-12 Thread Ryan Chan
This morning, the supervisors/nimbus connections to zookeeper are having problems (they are inside AWS VPC same subnet, not sure the root issue), from the supervisor log: http://pastebin.com/CVMTrfuQ The supervisord died at line: 2014-05-11 00:11:07 b.s.util [INFO] Halting process: ("Error when

Re: Peeking into storm's internal buffers

2014-05-12 Thread padma priya chitturi
Hi, A few questions on your issue: 1. Does the bolt never start executing once the topology is started, or does it get stuck after processing a few tuples? Can you give a clearer picture of this? 2. You said that the behavior is seen on one of the worker processes. S

Re: storm trident question

2014-05-12 Thread Cody A. Ray
I don't know of any head-to-head comparison, but I've never looked for one either. I'm not very familiar with spark in general, but maybe someone else on this list is? :) -Cody On May 11, 2014 2:46 PM, "Weide Zhang" wrote: > Hi Cody, > > Thanks for your reply. Do you know if there is any perform

Trident stream ordering

2014-05-12 Thread Kiran Kumar
Is it possible to get a stream ordered on a specific field using the Trident API alone?

Re: How to consume data from multiple Kafka topics?

2014-05-12 Thread Joe Stein
In Kafka you can use the WhiteList and BlackList topic filters (to regex topics that match or do not match respectively). The console consumer is a nice example of how to do this with the high level consumer https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/consumer/ConsoleConsumer
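
To illustrate the match / do-not-match semantics described above, here is a sketch using plain java.util.regex. It does not use Kafka's own filter classes; the class name, the method names, and the topic names are hypothetical, and whole-name matching is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class TopicFilterSketch {
    // Keeps the topics whose whole name matches the regex (whitelist semantics).
    static List<String> whitelist(String regex, List<String> topics) {
        Pattern p = Pattern.compile(regex);
        List<String> selected = new ArrayList<>();
        for (String t : topics) {
            if (p.matcher(t).matches()) {
                selected.add(t);
            }
        }
        return selected;
    }

    // Keeps the topics whose whole name does NOT match the regex (blacklist semantics).
    static List<String> blacklist(String regex, List<String> topics) {
        Pattern p = Pattern.compile(regex);
        List<String> selected = new ArrayList<>();
        for (String t : topics) {
            if (!p.matcher(t).matches()) {
                selected.add(t);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> topics = Arrays.asList("tenant-a", "tenant-b", "audit");
        System.out.println(whitelist("tenant-.*", topics)); // [tenant-a, tenant-b]
        System.out.println(blacklist("tenant-.*", topics)); // [audit]
    }
}
```

With dynamically created topics, a pattern such as `tenant-.*` would also select topics created later, provided the consumer re-evaluates the filter against the current topic list.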

Help Required: Exception doing LeftOuterJoining on Multiple Streams

2014-05-12 Thread Kiran Kumar
Below is the test code with which I am trying to do a left outer join on multiple streams. The issue I am getting is something like: RuntimeException: Expecting 4 lists instead getting 3 lists. FYI: This works fine for InnerJoin, but fails with the above exception when I am trying for Left Outer Join. =

Help Required: Exception doing LeftOuterJoining on Multiple Streams

2014-05-12 Thread Kiran Kumar
Below is the test code with which I am trying to do a left outer join on multiple streams. The issue I am getting is something like: RuntimeException: Expecting 4 lists instead getting 3 lists. FYI: This works fine for InnerJoin, but fails with the above exception when I am trying for Left Outer Join. ==

Re: why does storm not supply a mechanism for supplying topology necessary dependent jars other than the fat jar?

2014-05-12 Thread Xing Yong
+1 2014-05-11 15:35 GMT+08:00 Yaneeve Shekel : > Hi All, > > > > I am no python expert and am also a newbie to storm. > > > > I have gone over the > https://github.com/apache/incubator-storm/blob/master/bin/storm file in > order to see how to add jars to the classpath. Obviously, the preferred

Emitter.emitPartitionBatchNew not being called after Coordinator.isReady returns true

2014-05-12 Thread Simon Cooper
I've got a very very strange problem with one of my topologies. We've tried deploying to a clustered environment, and the trident topology we've got isn't running the Emitter when the Coordinator returns true from isReady(). At all. The logging message right at the start of the method is not bei

Resolution Required: with LeftOuterJoining Multiple Streams

2014-05-12 Thread Kiran Kumar
Please find attached the test code with which I am trying to do a left outer join on multiple streams. The issue I am getting is something like: RuntimeException: Expecting 4 lists instead getting 3 lists. FYI: This works fine for InnerJoin, but fails with the above exception when I am trying f

If supervisor fail and will not be restarted

2014-05-12 Thread Chengwei Yang
Hi List, The Storm fault-tolerance page says that the Storm supervisor is stateless and so can simply be restarted after it fails. I'm wondering what happens if the failed supervisor is not restarted. Is the only side effect that nimbus cannot ask the *failed* supervisor

How to consume data from multiple Kafka topics?

2014-05-12 Thread Amikam Snir
Hi all, Do you know of any spout code that consumes from multiple topics, for example using a wildcard against the topic name? The topics in my application are created dynamically. Should I just use a dedicated Kafka spout for each topic and somehow update the topology at run-time? Thanks in

Re: How to consume data from multiple Kafka topics?

2014-05-12 Thread David Miller
The Kafka "high level" consumer supports topic filtering for doing wildcards against topics. The storm-kafka spout uses the simple consumer, which doesn't have these features (but has others needed for reliable messaging and offset management). You could write your own high level consumer spout which co

Interesting Comparison

2014-05-12 Thread Klausen Schaefersinho
Hi, I found an interesting comparison of IBM Streams and Storm: https://www.ibmdw.net/streamsdev/2014/04/22/streams-apache-storm/ It also includes an interesting comparison between ZeroMQ and Netty performance. Cheers, Klaus

How to achieve multi-tenancy by kafka & storm?

2014-05-12 Thread Amikam Snir
Hi all, What are the best practices for building a multi-tenant app in the context of Kafka and Storm? For example: creating a topic for each tenant and consuming them with a multi-topic spout (using a wildcard). Thanks in advance, Amikam.