Re: Storm Message Flow Question

2015-06-07 Thread Seungtack Baek
It surely did! Thanks for such a precise answer! Thanks, Baek. On Jun 8, 2015, at 12:43 AM, Vineet Mishra wrote: > Any Storm Streaming job runs in its own space and doesn't interact with other topology. Your tuple distribution will be across the topology within the number of workers on

Re: Storm Message Flow Question

2015-06-07 Thread Dima Dragan
For your case, if messages have the same field value, they will be sent to only one executor in the whole topology. Best regards, Dmytro Dragan On Jun 8, 2015 08:31, "Seungtack Baek" wrote: > Thanks a lot for such a timely response. So, even if each bolt tasks resides in different worker (differ
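To make the "one executor in the whole topology" point concrete, here is a simplified Java model of fields grouping. It assumes, as a sketch rather than Storm's actual source, that the target task is the hash of the grouping field values modulo the number of consumer tasks across the topology; the class and method names are hypothetical. Because the choice depends only on the values and the topology-wide task count, the worker that emitted the tuple plays no part.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of fields grouping, NOT Storm's source code.
// Assumption: target task = hash(grouping values) mod (number of consumer
// tasks in the whole topology), independent of which worker emitted it.
class FieldsGroupingSketch {
    // Picks a task index in [0, numTasks) from the grouping field values.
    static int chooseTask(List<?> groupingValues, int numTasks) {
        // Math.floorMod keeps the index non-negative even for negative hashes.
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 32; // e.g. 32 bolt tasks spread over 4 workers
        for (String key : Arrays.asList("user-1", "user-2", "user-42")) {
            System.out.println(key + " -> task " + chooseTask(Arrays.asList(key), numTasks));
        }
        // The same field value always maps to the same task.
        int first = chooseTask(Arrays.asList("user-42"), numTasks);
        int again = chooseTask(Arrays.asList("user-42"), numTasks);
        System.out.println("stable: " + (first == again));
    }
}
```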

Re: Storm Message Flow Question

2015-06-07 Thread Vineet Mishra
Any Storm Streaming job runs in its own space and doesn't interact with other topologies. Your tuples will be distributed across the topology, over the number of workers and the number of bolts defined, so for instance if you have shuffle grouping enabled and specific data of your interest 0 1 - K

Re: Storm Message Flow Question

2015-06-07 Thread Vineet Mishra
For unique tuple access across the bolts, use shuffle grouping (otherwise, for some specific use cases, refer to the links in my last mail). It will distribute the data uniformly across all the bolts without heavily loading any one bolt; it basically works on the hashing principle, assign the tuple

Re: Storm Message Flow Question

2015-06-07 Thread Seungtack Baek
@Vineet, Thanks a lot for "another" timely response! Actually, I had read that section, but it still wasn't clear (to me, and I guess to me only) whether fields grouping applied to the whole cluster (or topology) or only within the same worker. Maybe I am just not too familiar with the "zoo". Thanks

Re: Storm Message Flow Question

2015-06-07 Thread Vineet Mishra
Hi Seung, You can refer to the Stream Groupings section at the following link: https://storm.apache.org/documentation/Concepts.html It will give you a better understanding of tuple distribution in Storm; for a clearer understanding, here is the pictorial representation of the sa

Re: Storm Message Flow Question

2015-06-07 Thread Seungtack Baek
Thanks a lot for such a timely response. So, even if each bolt task resides in a different worker (a different server in our use case), the messages go to all 32 tasks, right? Also, this leads me to another question (I think the answer is yes). Given field grouping guarantees that messages with s
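The 32-task, 4-worker case can be simulated in plain Java. This is a toy model: it places task t on worker t % 4 (a hypothetical placement; Storm's scheduler decides the real one) and routes keys with the same hash-modulo rule assumed above for fields grouping. Every task, and therefore every worker, ends up receiving tuples.

```java
import java.util.Arrays;

// Toy model: 32 bolt tasks spread over 4 workers.
// Assumption (hypothetical): task t lives on worker t % 4; Storm's scheduler
// decides the real placement. Routing only looks at the global task index.
class TasksAcrossWorkersSketch {
    static final int NUM_TASKS = 32;
    static final int NUM_WORKERS = 4;

    // Fields-grouping style routing over the topology-wide task list.
    static int route(String key) {
        return Math.floorMod(Arrays.asList(key).hashCode(), NUM_TASKS);
    }

    // Counts how many of the keys "key-0" .. "key-(numKeys-1)" land on each task.
    static int[] tuplesPerTask(int numKeys) {
        int[] counts = new int[NUM_TASKS];
        for (int i = 0; i < numKeys; i++) {
            counts[route("key-" + i)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] perTask = tuplesPerTask(10_000);
        int[] perWorker = new int[NUM_WORKERS];
        for (int t = 0; t < NUM_TASKS; t++) {
            perWorker[t % NUM_WORKERS] += perTask[t]; // aggregate by hosting worker
        }
        System.out.println("per task:   " + Arrays.toString(perTask));
        System.out.println("per worker: " + Arrays.toString(perWorker));
    }
}
```

Running `main` prints per-task and per-worker counts; the exact numbers depend on the key set, but none of the 32 tasks is left empty for these 10,000 keys.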

Re: Storm Message Flow Question

2015-06-07 Thread Dima Dragan
Hi, Seungtack! Distribution of messages depends only on the grouping (in the case of "shuffle grouping", tuples are randomly distributed across all of the bolt's tasks in such a way that each bolt is guaranteed to get an equal number of tuples). Best regards, Dmytro Dragan On Jun 8, 2015 07:12, "Seu
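A simplified model of the equal-share behavior described above: walk a randomly shuffled list of task indices and reshuffle after each full pass. This is an illustration under that assumption, not Storm's implementation, and the names are hypothetical. Unlike fields grouping, no hashing of tuple values is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Simplified model of shuffle grouping (not Storm's source): hand out task
// indices from a random permutation and reshuffle once it is used up, so each
// task gets exactly one tuple per full pass.
class ShuffleGroupingSketch {
    private final List<Integer> order = new ArrayList<>();
    private final Random random;
    private int position = 0;

    ShuffleGroupingSketch(int numTasks, long seed) {
        for (int t = 0; t < numTasks; t++) order.add(t);
        random = new Random(seed);
        Collections.shuffle(order, random);
    }

    // Returns the task index for the next tuple.
    int nextTask() {
        if (position == order.size()) { // pass complete: new random order
            Collections.shuffle(order, random);
            position = 0;
        }
        return order.get(position++);
    }

    public static void main(String[] args) {
        ShuffleGroupingSketch grouping = new ShuffleGroupingSketch(32, 7L);
        int[] counts = new int[32];
        for (int i = 0; i < 32 * 100; i++) counts[grouping.nextTask()]++;
        // 100 full passes, so each of the 32 tasks receives exactly 100 tuples.
        System.out.println(Arrays.toString(counts));
    }
}
```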

Re: Storm Message Flow Question

2015-06-07 Thread Seungtack Baek
Hi, I have read in the documentation that if you have more spout tasks than Kafka partitions, the excess tasks will remain idle for the entire lifecycle of the topology. Now, let's consider 4 spout tasks, 32 bolt tasks (of one class) in 4 workers (on 4 nodes) and 2 partitions in Kafka. Then 2 tas
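The idle-task arithmetic can be sketched with a round-robin assignment where spout task i reads partitions i, i + numTasks, i + 2 * numTasks, and so on. We believe this mirrors how storm-kafka spreads partitions over spout tasks, but treat the class below as a hypothetical illustration rather than the library's code. With 4 spout tasks and 2 partitions, tasks 2 and 3 get nothing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of round-robin Kafka partition assignment:
// partition p goes to spout task p % numTasks.
class KafkaPartitionAssignmentSketch {
    static List<Integer> partitionsForTask(int taskIndex, int numTasks, int numPartitions) {
        List<Integer> assigned = new ArrayList<>();
        for (int p = taskIndex; p < numPartitions; p += numTasks) {
            assigned.add(p);
        }
        return assigned;
    }

    public static void main(String[] args) {
        int spoutTasks = 4;
        int partitions = 2;
        for (int task = 0; task < spoutTasks; task++) {
            List<Integer> mine = partitionsForTask(task, spoutTasks, partitions);
            // Tasks with an empty list never read anything: they stay idle.
            System.out.println("spout task " + task + " -> "
                    + (mine.isEmpty() ? "idle" : "partitions " + mine));
        }
    }
}
```

The idle spout tasks do not leave any bolts idle: the two active spout tasks still emit into the stream, and the grouping spreads those tuples over all 32 bolt tasks.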

Re: Best Spout implementation for Reading input Data From File

2015-06-07 Thread Nathan Leung
You should emit with a message ID, which will prevent too many messages from being in flight simultaneously and will alleviate your out-of-memory conditions. On Jun 7, 2015 5:05 AM, "Michail Toutoudakis" wrote: > What is the best spout implementation for reading input data from file? I have
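Strictly, the throttling comes from the `topology.max.spout.pending` setting (`Config.setMaxSpoutPending` in the Java API), which only counts tuples emitted with a message ID; a tuple emitted without an ID is not tracked at all. The toy class below, with hypothetical names, models that bookkeeping in plain Java: emits are refused once the cap is reached and resume as acks arrive.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (hypothetical names) of capping in-flight tuples: each emit gets a
// message ID and stays pending until acked; at the cap, new emits are refused.
class PendingCapSketch {
    private final int maxPending;
    private final Map<Long, String> pending = new HashMap<>();
    private long nextId = 0;

    PendingCapSketch(int maxPending) { this.maxPending = maxPending; }

    // Returns the message ID used, or -1 if the cap is reached (retry later).
    long tryEmit(String payload) {
        if (pending.size() >= maxPending) return -1;
        long id = nextId++;
        pending.put(id, payload); // held until acked, so it could be replayed
        return id;
    }

    void ack(long msgId) { pending.remove(msgId); }

    int inFlight() { return pending.size(); }

    public static void main(String[] args) {
        PendingCapSketch spout = new PendingCapSketch(1000);
        int emitted = 0;
        for (int i = 0; i < 5000; i++) {
            if (spout.tryEmit("line-" + i) >= 0) emitted++;
        }
        System.out.println("emitted " + emitted + ", in flight " + spout.inFlight());
        for (long id = 0; id < 500; id++) spout.ack(id);
        System.out.println("after 500 acks, in flight " + spout.inFlight());
    }
}
```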

Re: What is the best way of implementing a file reader spout in local mode

2015-06-07 Thread Enno Shioji
Somewhere in your code you are starting way too many threads (many thousands). I don't see that in the code you posted, so it must be in one of the classes you haven't posted. Are you using multithreading anywhere? Are you instantiating services that spawn threads (like network clients)? If
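A frequent source of runaway thread counts is creating a thread pool or network client per tuple. The hypothetical bolt-like class below (plain Java, not Storm's bolt interface) sketches the usual fix under that assumption: create the pool once in `prepare()`, reuse it in `execute()`, and shut it down in `cleanup()`, so the thread count stays bounded however many tuples arrive.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical bolt-like class (not Storm's API): thread-spawning resources are
// created once per task, never per tuple.
class SharedPoolBoltSketch {
    private ExecutorService pool;
    final AtomicInteger processed = new AtomicInteger();

    void prepare() {
        pool = Executors.newFixedThreadPool(4); // at most 4 threads, created once
    }

    void execute(String tuple) {
        // Reuse the shared pool instead of starting a new thread per tuple.
        pool.submit(() -> { processed.incrementAndGet(); });
    }

    void cleanup() {
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS); // let queued work finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        SharedPoolBoltSketch bolt = new SharedPoolBoltSketch();
        bolt.prepare();
        for (int i = 0; i < 10_000; i++) bolt.execute("line-" + i);
        bolt.cleanup();
        System.out.println("processed: " + bolt.processed.get());
    }
}
```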

What is the best way of implementing a file reader spout in local mode

2015-06-07 Thread Michail Toutoudakis
I am trying to read some data from a text file and process it. I am currently using a Scanner. In the beginning everything works fine for the first 1 values, and then it looks like no other input lines are sent to the bolt that implements the algorithm. Finally, after a few minutes of running, I get

Best Spout implementation for Reading input Data From File

2015-06-07 Thread Michail Toutoudakis
What is the best spout implementation for reading input data from a file? I have implemented a spout for reading input data from a file using a Scanner, which seems to perform better than a buffered file reader. However, I still lose some values, not many this time, about 1%, but the problem is that af
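One common pattern for a file-reading spout is to read a single line per `nextTuple()` call, emit it with a message ID, and keep it until it is acked, so that a failed line can be replayed instead of silently lost. The sketch below models this in plain Java; the `Emitter` interface is a hypothetical stand-in for `SpoutOutputCollector`, and using the line number as the message ID is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of a file-reading spout (not Storm's ISpout API).
class LineReaderSpoutSketch {
    // Hypothetical stand-in for emitting a tuple with a message ID.
    interface Emitter { void emit(String line, long msgId); }

    private final BufferedReader reader;
    private final Emitter emitter;
    private final Map<Long, String> pending = new HashMap<>();
    private long lineNumber = 0;
    private boolean exhausted = false;

    LineReaderSpoutSketch(Reader source, Emitter emitter) {
        this.reader = new BufferedReader(source);
        this.emitter = emitter;
    }

    // Emits at most one line per call, so each call returns quickly.
    void nextTuple() {
        if (exhausted) return;
        try {
            String line = reader.readLine();
            if (line == null) { exhausted = true; return; }
            long id = lineNumber++;
            pending.put(id, line); // remember until acked
            emitter.emit(line, id);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void ack(long msgId) { pending.remove(msgId); }

    // Replays a failed line with the same ID; already-acked IDs are ignored.
    void fail(long msgId) {
        String line = pending.get(msgId);
        if (line != null) emitter.emit(line, msgId);
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        LineReaderSpoutSketch spout = new LineReaderSpoutSketch(
                new StringReader("a\nb\nc\n"), (line, id) -> out.add(id + ":" + line));
        for (int i = 0; i < 5; i++) spout.nextTuple();
        spout.fail(1);
        System.out.println(out); // [0:a, 1:b, 2:c, 1:b]
    }
}
```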