Re: lookup values in aggregated state in the same stream

2014-03-18 Thread Danijel Schiavuzzi
In Trident, you can use stateQuery() to query a TridentState. See the Word Count demo topology for an example of usage. Best regards, Danijel Schiavuzzi www.schiavuzzi.com On Thu, Mar 13, 2014 at 6:22 PM, Dirk Weissenborn < dirk.weissenb...@gmail.com> wrote: > Hey guys, > > wanted to ask if i

Storm/kafka/cassandra

2014-03-18 Thread Bilal Al Fartakh
Hi ! I'm a new user of storm , I merely know how storm and kafka work . I saw many tutorials but nothing seems like the one I want . I want to build a topology that could handle this case : - csv files and other formats are received continuously (like one each 10s or less ) . I want to extract th

Fwd: Storm Word Count Example Written in Python Error While Connecting Mysql DATABASE ?

2014-03-18 Thread yogesh panchal
Hi, I am new in storm, I am using https://github.com/AirSage/Petrel library to write storm topology in python, i have successfully run wordcount example written in python, but i now i want to insert output of "WordCountBolt" to mysql database. Following example i am trying but i am getting weird e

Re: Storm Word Count Example Written in Python Error While Connecting Mysql DATABASE ?

2014-03-18 Thread Milinda Pathirage
Hi Yogesh, Its better if you can share the error you got. It will be easy for someone to help you by looking at the error log. Thanks Milinda On Tue, Mar 18, 2014 at 11:50 AM, yogesh panchal wrote: > Hi, > > I am new in storm, I am using https://github.com/AirSage/Petrel library to > write stor

Server load - Topology optimization

2014-03-18 Thread David Crossland
Being very new to storm I'm not sure what to expect in some regards. Ive been playing about with the number of workers/executors/tasks trying to improve throughput on my cluster. I have a 3 nodes, two 4 core and a 2 core node (I can't increase the 3rd node to a medium until the customer gets mo

Re: Server load - Topology optimization

2014-03-18 Thread David Crossland
Could my issue relate to memory allocated to the JVM? Most of the setting are pretty much the defaults. Are there any other settings that could be throttling the topology? I'd like to be able to identify the issue without all this constant "stabbing in the dark"... ? D From: David Crossland<

Re: Server load - Topology optimization

2014-03-18 Thread Nathan Leung
In my experience storm is able to make good use of CPU resources, if the application is written appropriately. You shouldn't require too much executor parallelism if your application is CPU intensive. If your bolts are doing things like remote DB/NoSQL accesses, then that changes things and paral

Conflicting ASM dependencies

2014-03-18 Thread Adrian Petrescu
Hey guys, I've got a conflict between one of my transitive dependencies, and Storm 0.9.1. Basically, a Cassandra JPA library I'm using (Kundera) depends on cglib 2.2.2 which requires asm 3.3.1. Unfortunately Storm has org.ow2.asm:asm:4.0 on its classpath, which causes various runtime errors. It s

nimbus.thrift.max_buffer_size

2014-03-18 Thread Adam Lewis
Upon upgrading from 0.9.0.1 to 0.9.1-incubating, I'm exceeding the new thrift max buffer size (nicely logged on the server side, although the client just gets a broken pipe stack trace form thrift) with an approx 6 MB message(!). Increasing the configured limit solves the problem, but I would have

Re: Server load - Topology optimization

2014-03-18 Thread David Crossland
A bit more information then There are 4 components Spout - This is reading from an azure service bus topic/subscription. A connection is created in the open() method of the spout, nextTuple does a peek on the message, and invokes the following code; StringWriter writer = new S

Reliability and Coordination

2014-03-18 Thread Angelo Genovese
Hello all, I'm working on a topology which takes a url from a kestrel queue, performs a series of varied steps each of which produces a set of results. I'm now attempting to select the "best" result from them by making the topology coordinated. I have a bolt which receives all the results for a g

Re: nimbus.thrift.max_buffer_size

2014-03-18 Thread P. Taylor Goetz
It uploads the file in small (1024*5 bytes) chunks. Does this happen every time (i.e. reproducible)? What is the size of your topology jar? Can you post the server side message (I want to see the length it output). - Taylor On Mar 18, 2014, at 3:40 PM, Adam Lewis wrote: > Upon upgrading from

Re: nimbus.thrift.max_buffer_size

2014-03-18 Thread Adam Lewis
It isn't the jar file, but something about the topology itself; I have a submitter program that submits four topologies all from the same jar. Upon submitting the first topology, the jar is uploaded and topology starts, then the submitter submits two more topologies whilst "reusing" the uploaded j

User Interface

2014-03-18 Thread Klausen Schaefersinho
Hi, I have been prototyping an alternative UI for monitoring a storm cluster. The main driver was, that I would like to have a more condensed view of what is going on in my cluster and that I would also like to monitor some use case specific metrics, e.g. the training accuracy of a classifier or s

Re: Server load - Topology optimization

2014-03-18 Thread Nathan Leung
It could be bolt 3. What is the latency like between your worker and your redis server? Increasing the number of threads for bolt 3 will likely increase your throughput. Bolt 1 and 2 are probably CPU bound, but bolt 3 is probably restricted by your network access. Also I've found that localOrSh

Re: Server load - Topology optimization

2014-03-18 Thread Link Wang
Do you filter out message in the nextTuple method and with no tuple sent at one time it was called? There's a internal mechanism that storm will sleep 1ms to call nextTuple if this calling of nextTuple sent nothing. 2014年3月19日 上午7:14于 David Crossland 写道: > > Perhaps these screenshots might shed

Re: Server load - Topology optimization

2014-03-18 Thread Michael Rose
Well, I see you have 30 spouts instances and 3 bolt instances. Doesn't seem like it's a huge bottleneck, but it is having to send over the wire much of the time. Something to keep in mind for the future. I'd be most suspicious of your spouts. All spouts on a single worker are run single-threaded (

Re: Server load - Topology optimization

2014-03-18 Thread Sean Zhong
The Spout is suspicous. >From the screenshots, you are using no-acked spout, there may also be GC issue there. Here is the suggestion: Check whether you are making connection in the context of nextTuple. If that is true, it means a large latency. You can check the spout latency by enabling acked

Re: nimbus.thrift.max_buffer_size

2014-03-18 Thread P. Taylor Goetz
Cool. I'm going back to the public list to share the knowledge. If your trident topology is compiling down to 35 bolts, then it sound like you have a lot of partitioning operations. Is there any way you can reduce that? That's going to introduce a decent amount of network transfer. And I would

Tuple message id uniqueness

2014-03-18 Thread Srinath C
Hi, I was unable to figure out if the messageId of a tuple emitted from a spout should be globally unique? Or does storm identify a tuple with a combination of spout name, spout task Id and messageId? Thanks, Srinath.

Distribution of tasks across the storm cluster

2014-03-18 Thread Srinath C
Hi, Can anyone point me to some notes on how storm decides to distribute the tasks among its workers. The behavior am seeing is that all tasks of a particular type are being grouped into one worker process. To add more details to my use-case, I have a spout that is sourcing tuples from a rabb

Re: Tuple message id uniqueness

2014-03-18 Thread Nathan Marz
No, it doesn't have to be. You're in full control of it. Internally Storm generates its own tuple ID and maintains a map from that globally unique tuple id to your spout id. The spout id is simply used in the ack/fail methods of the spout (so that you know what was acked/failed) On Tue, Mar 18, 2

Re: Storm Word Count Example Written in Python Error While Connecting Mysql DATABASE ?

2014-03-18 Thread yogesh panchal
Hi milinda, i am getting following error while running topology, 84329 [Thread-21-split] INFO backtype.storm.task.ShellBolt - Launched subprocess with pid 7356 84331 [Thread-21-split] INFO backtype.storm.daemon.executor - Prepared bolt split:(3) 84335 [Thread-23-spout] INFO backtype.storm.spou

Re: Storm Word Count Example Written in Python Error While Connecting Mysql DATABASE ?

2014-03-18 Thread Sean Zhong
Seems to be a python import error On Wed, Mar 19, 2014 at 1:32 PM, yogesh panchal wrote: > ImportError

Tuple ACK issue

2014-03-18 Thread Srinath C
Hi, I'm facing an issue with acknowledgement of tuples. My topology has a spout (BaseRichSpout) that picks a message from RabbitMQ and emits it with the deliveryTag as the message Id. The tuple is then received by a bolt (BaseRichBolt) which processes the tuple. There are other spouts and bolts