Re: Push Kafka data to S3 based on window size in bytes

2019-02-14 Thread Csaba Kassai
Hi, you can probably solve this with timers and state, but I am not sure that this the simplest way. Some guidance how timers and state work: https://beam.apache.org/blog/2017/02/13/stateful-processing.html https://beam.apache.org/blog/2017/08/28/timely-processing.html Regards, Csabi On Thu

Re: After Downloaded some File , How use next step TextIO.read() ?

2019-02-06 Thread Csaba Kassai
t; } > > c.output(genericRecord); // NO serialized > GenericRecord :'( > > } > > })) > > .apply("Write Avro formatted data", AvroIO > .writeGenericRecords(SC

Re: After Downloaded some File , How use next step TextIO.read() ?

2019-02-01 Thread Csaba Kassai
Hi, you can output the path where you saved the files on GCS in your first DoFn and use the TextIO.readAll() method. Also it is better to initialise the FTPClient in the @Setup method instead of every time you process and element in the @ProcessElement method. Something like this: PCollection inpu

Re: Joining streams

2018-10-19 Thread Csaba Kassai
Hi Alexander, I think what you are looking for in the Beam model is the PCollectionList and the Flatten

Re: Controlling namespace when writing to DataStore

2018-09-02 Thread Csaba Kassai
Hi, you can set the namespace in the key of the Datastore entity. Like this: Key.Builder keyBuilder = DatastoreHelper.makeKey(...); keyBuilder.getPartitionIdBuilder().setNamespace(namespace); https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/gcp/datastore/DatastoreV

Re: messageId from PubSubIO

2018-01-17 Thread Csaba Kassai
re. > > Here is a pointer to the contribution guide for more details: > https://beam.apache.org/contribute/contribution-guide/ > > On Wed, Jan 17, 2018 at 4:45 AM, Csaba Kassai > wrote: > >> Hi, >> >> is it possible the get somehow the messageId field of a Pu

messageId from PubSubIO

2018-01-17 Thread Csaba Kassai
Hi, is it possible the get somehow the messageId field of a Pub/Sub message in a DoFn after using the PubSubIO Beam source to read the messages? I need the default id which was assigned by the Pub/Sub service. I want to log it for debugging purposes. Using a custom attribute for the unique id an

Re: Is anyone using Beam for geo use cases?

2017-10-20 Thread Csaba Kassai
Hi Jacob, we are doing the opposite direction: we enrich data with geo coordinates from textual address using Google Maps API with Cloud Dataflow. Are you interested in this use-case? Csabi On Thu, 19 Oct 2017 at 21:00 Jacob Marble wrote: > Is anyone using Beam to solve geo problems? > > For e

Re: Missing getOptions on Pipeline class

2017-07-26 Thread Csaba Kassai
are always available. > > Let me know if this helps. > > On Wed, Jul 26, 2017 at 8:52 AM Csaba Kassai > wrote: > >> Hi, >> >> we are currently migrating our pipelines written with the 1.9.x >> (pre-beam) Dataflow Java SDK to the 2.0.0 version which is ba

Missing getOptions on Pipeline class

2017-07-26 Thread Csaba Kassai
Hi, we are currently migrating our pipelines written with the 1.9.x (pre-beam) Dataflow Java SDK to the 2.0.0 version which is based on the 2.0.0 Beam SDK. One change which cases a lot of headache is that getOptions method was removed from the Pipeline class. We used this method a lot during const

Re: Enriching stream messages based on external data

2017-06-01 Thread Csaba Kassai
Hi Gwilym, try to extract the DoFn into a separate static inner class or into a separate file as a top level class, instead of declaring as an anonymous inner class. In java the anonymous inner class has an implicit reference to the outer enclosing class, and I suspect that the serialiser is not a

Re: exclusive window per event

2017-04-02 Thread Csaba Kassai
Hi Antony, there is a small custom windowing example in this github repo which can be useful for you: https://github.com/Doctusoft/ds-dataflow-examples The code is not documented yet, so let me know if you have any question about it. Regards, Csabi On Fri, 31 Mar 2017 at 18:04 Robert Bradshaw