Dataflow HA and DR

2019-04-15 Thread asharma . gd
hi We have a few simple Dataflow streaming jobs running. The requirement is to build an HA/DR solution. a) Is it a good idea to spin up multiple Dataflow jobs in different regions listening to the same 'shared' Pub/Sub subscription? b) If not, can you please share some best practices? Thanks
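Two jobs pulling from one shared subscription will split traffic rather than duplicate it, but during a regional failover the same message can be redelivered to both regions, so an active-active design needs idempotent writes downstream. A minimal sketch of id-based deduplication (the class and method here are hypothetical illustrations, not a Beam or Pub/Sub API):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: apply a message's side effect only once, keyed by its
// Pub/Sub messageId, so redeliveries after a regional failover are harmless.
public class IdempotentSink {
    private final Set<String> applied = ConcurrentHashMap.newKeySet();

    /** Returns true if the message was applied, false if it was a duplicate. */
    public boolean applyOnce(String messageId, Runnable sideEffect) {
        if (!applied.add(messageId)) {
            return false; // duplicate delivery, e.g. seen by both regions
        }
        sideEffect.run();
        return true;
    }
}
```

In a real system the seen-id set would live in external storage (or the sink itself would be naturally idempotent, e.g. keyed upserts); an in-memory set is only for illustration.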

watermark

2018-09-14 Thread asharma . gd
In my Dataflow job, I read a message from Pub/Sub. In the job GUI, I see the data watermark's value shown as 'Max watermark'. a) What is the meaning of 'Max watermark'? b) Does it impact in any way how groupBy is evaluated? c) Is there a way to explicitly set the value of the watermark to 'Max watermark'? Thanks Ani
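The watermark is the system's estimate of the oldest event timestamp still expected; "Max watermark" conventionally means it has advanced to end-of-time, i.e. the source reports no more data, so every window is considered complete. It is derived from the sources, not something you set directly. A toy model of that behavior (my own sketch, assuming min-of-inputs propagation, not Dataflow internals):

```java
import java.util.List;

// Toy model: a stage's output watermark is the minimum of its inputs'
// watermarks. "Max watermark" = Long.MAX_VALUE, meaning no more data is
// expected, so any window-based GroupByKey can fire for all windows.
public class WatermarkModel {
    public static final long MAX_WATERMARK = Long.MAX_VALUE;

    public static long outputWatermark(List<Long> inputWatermarks) {
        return inputWatermarks.stream()
            .mapToLong(Long::longValue)
            .min()
            .orElse(MAX_WATERMARK); // no inputs => nothing more is expected
    }

    public static boolean windowIsComplete(long windowEndMs, long watermark) {
        return watermark >= windowEndMs;
    }
}
```

This is also the answer to (b): a GroupByKey on a windowed collection fires a window once the watermark passes the window's end, so a max watermark means all groupings are emitted.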

cores and partitions in DataFlow

2018-09-13 Thread asharma . gd
Spark has two levels of parallelism: a) across different workers; b) within the same executor, where multiple cores can work on different partitions. I know that in Apache Beam with Dataflow as the runner, partitioning is abstracted away. But does Dataflow use multiple cores to process different partitions at the same time
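Dataflow does use multiple cores per worker: each worker JVM runs several work-item threads, each processing a different bundle of elements concurrently. A rough single-JVM analogy using a plain thread pool (bundle contents and the sum operation are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Analogy for one Dataflow worker: a pool of threads, one per core, each
// handling a different "bundle" (partition) of the input at the same time.
public class BundleProcessor {
    public static List<Integer> sumBundles(List<List<Integer>> bundles) {
        ExecutorService pool = Executors.newFixedThreadPool(
            Math.max(1, Runtime.getRuntime().availableProcessors()));
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (List<Integer> b : bundles) {
                futures.add(pool.submit(
                    () -> b.stream().mapToInt(Integer::intValue).sum()));
            }
            List<Integer> sums = new ArrayList<>();
            for (Future<Integer> f : futures) {
                sums.add(f.get()); // collect per-bundle results in order
            }
            return sums;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```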

Re: windowing -> groupby

2018-09-13 Thread asharma . gd
On 2018/09/13 22:30:07, Lukasz Cwik wrote: > You can even change windowing strategies between group bys with Window.into. > > On Thu, Sep 13, 2018 at 3:29 PM Lukasz Cwik wrote: > > > Multiple group by are supported. > > > > On Thu, Sep 13, 2018 at 2:36 PM asharma...@gmail.com > > wrote: >

windowing -> groupby

2018-09-13 Thread asharma . gd
Hi, from the documentation, groupBy is applied on a key-and-window basis. If my source is Pub/Sub (unbounded), does Beam support applying multiple groupBy transformations, with all of the applied groupBy transformations executing in a single window? Or is only one groupBy operation supported for an unbounded source
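As the reply above confirms, multiple GroupByKeys are supported because each one groups by (key, window); you can even change the windowing between them with Window.into. A toy model of the per-window grouping (fixed windows, with the window start folded into the output key for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model: grouping on an unbounded stream happens per (key, window).
// Windows here are fixed event-time buckets of windowSizeMs; the output key
// is "key@windowStart". Running group() twice in a row is fine -- which is
// why Beam supports multiple GroupByKey transforms on an unbounded source.
public class WindowedGroupBy {
    public static Map<String, List<Long>> group(List<String> keys,
                                                List<Long> timestamps,
                                                long windowSizeMs) {
        Map<String, List<Long>> out = new TreeMap<>();
        for (int i = 0; i < keys.size(); i++) {
            long windowStart = (timestamps.get(i) / windowSizeMs) * windowSizeMs;
            out.computeIfAbsent(keys.get(i) + "@" + windowStart,
                                k -> new ArrayList<>())
               .add(timestamps.get(i));
        }
        return out;
    }
}
```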

Re: Out of Memory errors

2018-09-06 Thread asharma . gd
On 2018/09/06 05:31:45, Jean-Baptiste Onofré wrote: > Hi, > > AFAIU you are using dataflow runner right ? > > Regards > JB > > On 06/09/2018 00:03, asharma...@gmail.com wrote: > > Hi > > > > I am doing some processing and creating local list inside a Pardo function. > > Total data size is

Out of Memory errors

2018-09-05 Thread asharma . gd
Hi I am doing some processing and creating a local list inside a ParDo function. The total data size is 2 GB, which is executing inside a single instance. I am running this code on a highmem-64 machine and it gives this error: "Shutting down JVM after 8 consecutive periods of measured GC thrashing." Memory
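That error means the JVM is spending most of its time in garbage collection because the heap is nearly full, and buffering 2 GB in one in-memory list inside a ParDo is the usual cause. The fix is to fold each element into a small running aggregate as it arrives (in Beam, the analogous tool is Combine/CombineFn) instead of materializing the whole list. The pattern, in a stdlib-only sketch:

```java
import java.util.stream.LongStream;

// Pattern sketch: fold each record into a running aggregate instead of
// collecting all records into a List first. Memory stays O(1) in the input
// size, so a 2 GB input no longer needs a 2 GB heap allocation.
public class StreamingAggregate {
    public static long sumWithoutBuffering(LongStream records) {
        return records.reduce(0L, Long::sum);
    }
}
```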

Streaming mode Dataflow - how to make Autoscaling kick in for FileIO.match operations

2018-08-29 Thread asharma . gd
Excerpt from the autoscaling documentation for streaming mode: "Currently, PubsubIO is the only source that supports autoscaling on streaming pipelines. All SDK-provided sinks are supported. In this Beta release, autoscaling works smoothest when reading from Cloud Pub/Sub subscriptions tied to topics published with

Re: staging location errors while kicking a dataflow job

2018-08-29 Thread asharma . gd
On 2018/08/29 00:37:30, Lukasz Cwik wrote: > It seems like you specified gs:/ and not gs:// > > Typo? > > On Mon, Aug 27, 2018 at 2:02 PM Sameer Abhyankar > wrote: > > > See this thread to see if it is related to the way the executable jar is > > being created: > > > > https://lists.apache

staging location errors while kicking a dataflow job

2018-08-27 Thread asharma . gd
hi I am creating a Dataflow job from a configuration file, and I have hard-coded the gs:// staging location in it. I compile an executable jar for my pipeline, copy the executable jar to a Cloud Shell environment, and execute the jar. But my hard-coded staging location is not picked up and
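One thing to check: the Dataflow runner reads options such as --stagingLocation from the launch command line (via PipelineOptionsFactory.fromArgs), so a value baked into the jar is used only if it is actually set on the options object and no flag overrides it. A hedged stdlib sketch of that precedence (not the Beam API itself; the flag name matches Beam's, the rest is illustrative):

```java
// Sketch of the usual precedence: a hard-coded default is used only when no
// --stagingLocation flag is supplied on the command line at launch time.
public class OptionsSketch {
    public static String stagingLocation(String[] args, String hardCodedDefault) {
        for (String a : args) {
            if (a.startsWith("--stagingLocation=")) {
                return a.substring("--stagingLocation=".length());
            }
        }
        return hardCodedDefault;
    }
}
```

If the hard-coded value is being ignored, the simplest workaround is to pass --stagingLocation=gs://... explicitly when running the jar from Cloud Shell.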

Re: invoking Beam pipeline from Cloud Function

2018-08-24 Thread asharma . gd
On 2018/08/24 13:58:22, asharma...@gmail.com wrote: > Hi > > Is there any example available about how to invoke a Beam pipeline (in Java) > from a Cloud Function. > > Thanks > Aniruddh > To add more detail: the request is not to invoke a Dataflow template but the jar itself.
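A Cloud Function cannot easily execute a jar directly; the common workaround is to stage the pipeline as a Dataflow template and have the function call the projects.templates:launch REST endpoint. Building that request with only the JDK's HttpClient (project name, template path, and body are placeholders, and a real call also needs an OAuth access token):

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: launch a staged Dataflow template via the REST API
// (projects.templates:launch). "my-project" and the gcsPath are placeholders.
public class LaunchTemplate {
    public static HttpRequest buildLaunchRequest(String project,
                                                 String templateGcsPath,
                                                 String jsonBody) {
        return HttpRequest.newBuilder()
            .uri(URI.create("https://dataflow.googleapis.com/v1b3/projects/"
                + project + "/templates:launch?gcsPath=" + templateGcsPath))
            .header("Content-Type", "application/json")
            // .header("Authorization", "Bearer " + accessToken) // required in practice
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
            .build();
    }
}
```

If the jar itself must run, the function would instead have to trigger some environment that can execute it (e.g. a VM or build service); the function runtime itself is not suited to that.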

invoking Beam pipeline from Cloud Function

2018-08-24 Thread asharma . gd
Hi Is there any example available of how to invoke a Beam pipeline (in Java) from a Cloud Function? Thanks Aniruddh

Re: preserve order of records while writing in a file

2018-08-21 Thread asharma . gd
On 2018/08/21 16:20:13, Lukasz Cwik wrote: > I would agree with Eugene. A simple application that does this is probably > what you're looking for. > > There are ways to make this work with parallel processing systems but it's > quite a hassle and only worthwhile if your computation is very expensive

preserve order of records while writing in a file

2018-08-21 Thread asharma . gd
Hi I have to process a big file and call several ParDos to do some transformations. Records in the file don't have any unique key. Let's say file 'testfile' has 1 million records. After processing, I want to generate only one output file, the same as my input 'testfile', and I also have a requirement
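As the replies note, a plain sequential application is usually the simplest answer. If the file must go through parallel transforms anyway, the standard trick is to tag each record with its original line number, transform in parallel, then sort by that index before writing the single output file. A stdlib sketch of the pattern (upper-casing stands in for the real ParDo logic):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Tag each record with its input position, transform in parallel, then
// restore the original order by sorting on the index before writing.
public class OrderPreserving {
    public static List<String> transformKeepingOrder(List<String> records) {
        return IntStream.range(0, records.size())
            .mapToObj(i -> Map.entry(i, records.get(i))) // attach index as key
            .parallel()
            .map(e -> Map.entry(e.getKey(), e.getValue().toUpperCase())) // the "ParDo"
            .sorted(Map.Entry.comparingByKey())          // restore input order
            .map(Map.Entry::getValue)
            .collect(Collectors.toList());
    }
}
```

In Beam terms this corresponds to keying records by position, transforming, then doing a single final sorted write, which is the "hassle" the reply refers to.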

File IO - Customer Encrypted keys

2018-07-31 Thread asharma . gd
What is the best suggested way to read a file encrypted with a customer-supplied key that is also wrapped in KMS?
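For a customer-supplied encryption key (CSEK), Cloud Storage expects the AES-256 key on each read request via the x-goog-encryption-* headers; when the key is wrapped in Cloud KMS, it must first be unwrapped with a KMS decrypt call. A JDK-only sketch of building such a read request (bucket, object, and key are placeholders, and a real call also needs an Authorization header):

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.security.MessageDigest;
import java.util.Base64;

// Sketch: read a CSEK-protected GCS object by supplying the (already
// unwrapped) AES-256 key in the x-goog-encryption-* request headers.
public class CsekRead {
    public static HttpRequest buildReadRequest(String bucket, String object,
                                               byte[] aes256Key) {
        try {
            String keyB64 = Base64.getEncoder().encodeToString(aes256Key);
            byte[] sha = MessageDigest.getInstance("SHA-256").digest(aes256Key);
            return HttpRequest.newBuilder()
                .uri(URI.create("https://storage.googleapis.com/storage/v1/b/"
                    + bucket + "/o/" + object + "?alt=media"))
                .header("x-goog-encryption-algorithm", "AES256")
                .header("x-goog-encryption-key", keyB64)
                .header("x-goog-encryption-key-sha256",
                        Base64.getEncoder().encodeToString(sha))
                .GET()
                .build();
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new RuntimeException(e); // SHA-256 is always available
        }
    }
}
```

Note that in a Beam pipeline FileIO/TextIO does not pass these headers itself, so a CSEK-protected object generally needs a custom read step like this (or a key attached at the bucket/object level via KMS instead of CSEK).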