Re: Correlation between data streams/operators and threads

2017-11-22 Thread Nico Kruber
Hi Shailesh, your JobManager log suggests that this same JVM instance actually contains a TaskManager as well (sorry for not noticing earlier). Also this time, there is nothing regarding the BlobServer/BlobCache, but it looks like the task manager may think the jobmanager is down. Can you try wi

Re: Correlation between data streams/operators and threads

2017-11-21 Thread Shailesh Jain
Understood. Thanks a lot! I'll try out the keyBy approach first. Shailesh On Tue, Nov 21, 2017 at 1:53 PM, Piotr Nowojski wrote: > So as long as the parallelism of my kafka source and sink operators is 1, > all the subsequent operators (multiple filters to create multiple streams, > and then

Re: Correlation between data streams/operators and threads

2017-11-21 Thread Piotr Nowojski
> So as long as the parallelism of my kafka source and sink operators is 1, all > the subsequent operators (multiple filters to create multiple streams, and > then individual CEP and Process operators per stream) will be executed in the > same task slot? Yes, unless you specify different resou

Re: Correlation between data streams/operators and threads

2017-11-20 Thread Shailesh Jain
Thanks for your time in helping me here. So as long as the parallelism of my kafka source and sink operators is 1, all the subsequent operators (multiple filters to create multiple streams, and then individual CEP and Process operators per stream) will be executed in the same task slot? I cannot

Re: Correlation between data streams/operators and threads

2017-11-17 Thread Piotr Nowojski
Sorry for not responding but I was away. Regarding 1. One source operator, followed by multiple tasks with parallelism 1 (as visible on your screen shot) that share resource group will collapse to one task slot - only one TaskManager will execute all of your job. Because all of your events ar

Re: Correlation between data streams/operators and threads

2017-11-17 Thread Nico Kruber
regarding 3. a) The taskmanager logs are missing, are there any? b) Also, the JobManager logs say you have 4 slots available in total - is this enough for your 5 devices scenario? c) The JobManager log, however, does not really reveal what it is currently doing, can you set the log level to DEBUG

Re: Correlation between data streams/operators and threads

2017-11-16 Thread Shailesh Jain
Bump. On Wed, Nov 15, 2017 at 12:34 AM, Shailesh Jain wrote: > 1. Single data source because I have one kafka topic where all events get > published. But I am creating multiple data streams by applying a series of > filter operations on the single input stream, to generate device specific > data

Re: Correlation between data streams/operators and threads

2017-11-14 Thread Shailesh Jain
1. Single data source because I have one kafka topic where all events get published. But I am creating multiple data streams by applying a series of filter operations on the single input stream, to generate device specific data stream, and then assigning the watermarks on that stream. Will this not

Re: Correlation between data streams/operators and threads

2017-11-14 Thread Piotr Nowojski
1. It seems like you have one single data source, not one per device. That might make a difference. Single data source followed by comap might create one single operator chain. If you want to go this way, please use my suggested solution c), since you will have troubles with handling watermarks

Re: Correlation between data streams/operators and threads

2017-11-14 Thread Shailesh Jain
1. Okay, I understand. My code is similar to what you demonstrated. I have attached a snap of my job plan visualization. 3. Have attached the logs and exception raised (15min - configured akka timeout) after submitting the job. Thanks, Shailesh On Tue, Nov 14, 2017 at 2:46 PM, Piotr Nowojski w

Re: Correlation between data streams/operators and threads

2017-11-14 Thread Piotr Nowojski
Hi, 1. I’m not sure what is your code. However I have tested it and here is the example with multiple streams in one job: https://gist.github.com/pnowojski/63fb1c56f2938091769d8de6f513567f As expected it created 5 source threa

Re: Correlation between data streams/operators and threads

2017-11-13 Thread Shailesh Jain
Hi Piotrek, I tried out option 'a' mentioned above, but instead of separate jobs, I'm creating separate streams per device. Following is the test deployment configuration as a local cluster (8GB ram, 2.5 GHz i5, ubuntu machine): akka.client.timeout 15 min jobmanager.heap.mb 1024 jobmanager.rpc.ad

Re: Correlation between data streams/operators and threads

2017-11-13 Thread Piotr Nowojski
Sure, let us know if you have other questions or encounter some issues. Thanks, Piotrek > On 13 Nov 2017, at 14:49, Shailesh Jain wrote: > > Thanks, Piotr. I'll try it out and will get back in case of any further > questions. > > Shailesh > > On Fri, Nov 10, 2017 at 5:52 PM, Piotr Nowojski

Re: Correlation between data streams/operators and threads

2017-11-13 Thread Shailesh Jain
Thanks, Piotr. I'll try it out and will get back in case of any further questions. Shailesh On Fri, Nov 10, 2017 at 5:52 PM, Piotr Nowojski wrote: > 1. It’s a little bit more complicated then that. Each operator chain/task > will be executed in separate thread (parallelism > Multiplies that).

Re: Correlation between data streams/operators and threads

2017-11-10 Thread Piotr Nowojski
1. It’s a little bit more complicated then that. Each operator chain/task will be executed in separate thread (parallelism Multiplies that). You can check in web ui how was your job split into tasks. 3. Yes that’s true, this is an issue. To preserve the individual watermarks/latencies (assumin

Re: Correlation between data streams/operators and threads

2017-11-09 Thread Shailesh Jain
On 1. - is it tied specifically to the number of source operators or to the number of Datastream objects created. I mean does the answer change if I read all the data from a single Kafka topic, get a Datastream of all events, and the apply N filters to create N individual streams? On 3. - the prob

Re: Correlation between data streams/operators and threads

2017-11-09 Thread Piotr Nowojski
Hi, 1. https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/parallel.html Number of threads executing would be roughly speaking equal to of the number of input data streams multiplied by the parallelism.

Correlation between data streams/operators and threads

2017-11-09 Thread Shailesh Jain
Hi, I'm trying to understand the runtime aspect of Flink when dealing with multiple data streams and multiple operators per data stream. Use case: N data streams in a single flink job (each data stream representing 1 device - with different time latencies), and each of these data streams gets spl