Re: Scaling Beam pipeline on Data Flow - Join bounded and non-bounded source

2019-03-20 Thread Juan Carlos Garcia
Your autoscaling algorithm is THROUGHPUT_BASED; it kicks in only when the pipeline is not able to keep up with the incoming source. How big is your bounded source, and how much pressure (messages per second) is your unbounded source receiving? Maulik Gandhi wrote on Tue., 19. M

joda-time dependency version

2019-03-20 Thread rahul patwari
Hi, Is there a plan to upgrade the joda-time dependency to version 2.9.3 or the latest version? Thanks, Rahul

Re: joda-time dependency version

2019-03-20 Thread Ismaël Mejía
Hello, The long term goal would be to get rid of joda-time but that won't happen until Beam 3. Any 'particular' reason or motivation to push the upgrade? Regards, Ismaël On Wed, Mar 20, 2019 at 11:53 AM rahul patwari wrote: > > Hi, > > Is there a plan to upgrade the dependency version of joda-t

Re: Scaling Beam pipeline on Data Flow - Join bounded and non-bounded source

2019-03-20 Thread Maulik Gandhi
Now that I understand this more clearly, I think there are a couple of things going on. I will try to re-explain what I am trying to achieve.
- I am reading from 2 sources
  - Bounded (AVRO from GCS)
  - Unbounded (AVRO from PubSub)
- I want to prime the Beam pipeline state with data from GCS (bounded

Re: Scaling Beam pipeline on Data Flow - Join bounded and non-bounded source

2019-03-20 Thread Maulik Gandhi
How big is your bounded source?
- 16.01 GiB total data from AVRO files, but it can be anywhere from tens to hundreds of GBs.
How much pressure (messages per second) is your unbounded source receiving?
- Initially no pressure, to prime the Beam state, but later there will be data flowing through PubSub.
I also add

Re: Scaling Beam pipeline on Data Flow - Join bounded and non-bounded source

2019-03-20 Thread Juan Carlos Garcia
I would recommend going to the Compute Engine service and checking the VM where the pipeline is running; from there you might get more insight into whether a bottleneck in your pipeline (CPU, I/O, network) is preventing it from processing faster. Maulik Gandhi wrote on Wed., 20 March 2019, 20:15:

Re: Scaling Beam pipeline on Data Flow - Join bounded and non-bounded source

2019-03-20 Thread Reza Rokni
Hi, How many keys do you have flowing through that global window? If the key space is wide, is there any chance you have a few very hot keys? Cheers Rez On Thu, 21 Mar 2019, 04:04 Juan Carlos Garcia, wrote: > I would recommend going to the compute engine service and check the vm > where the pipelin
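
One way to answer the hot-key question is to look at a sample of the keys offline. The following is only a minimal sketch in plain Python, not Beam code and not something from the thread: the function name `hot_keys`, the 20% threshold, and the sample data are all hypothetical, chosen just to show what "a few very hot keys" looks like in numbers.

```python
from collections import Counter


def hot_keys(keys, threshold=0.2):
    """Return {key: share} for keys whose share of all elements exceeds threshold.

    `threshold` is a hypothetical cut-off for illustration; pick one that
    matches how much skew a single worker can tolerate in your pipeline.
    """
    counts = Counter(keys)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: c / total for k, c in counts.items() if c / total > threshold}


# Hypothetical sample: key "a" carries most of the traffic.
sample = ["a"] * 70 + ["b"] * 20 + ["c"] * 10
print(hot_keys(sample))
```

If a handful of keys dominate such a sample, all of their elements in a global window are processed under the same key, which can limit how much adding workers helps.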

Re: joda-time dependency version

2019-03-20 Thread rahul patwari
Hi Ismael, We are using Beam with Spark Runner and Spark 2.4 has joda-time 2.9.3 as a dependency. So, we have used joda-time 2.9.3 in our shaded artifact set. As Beam has joda-time 2.4 as a dependency, I was wondering whether it would break anything in Beam. Will joda-time be replaced with java t