Re: Recommended way to generate a TableRow from Json without using TableRowJsonCoder context OUTER?

2020-07-09 Thread Kaymak, Tobias
Thank you Lars and thank you Luke! On Wed, Jul 8, 2020 at 9:33 PM Luke Cwik wrote: > The deprecated method is not going to be removed anytime soon so I > wouldn't worry about it being removed. > > If you really want to use non-deprecated methods, then the > TableRowJsonCoder uses the StringUtf8C
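
Since the reply above is cut off, here is one hedged way to turn a JSON string into a TableRow without the deprecated decode(..., Context.OUTER) call. It assumes the Google API client's JacksonFactory (already on the classpath wherever BigQueryIO is used) and an invented sample payload; it is a sketch of an alternative, not necessarily the exact approach suggested in the truncated reply.

    import com.google.api.client.json.jackson2.JacksonFactory;
    import com.google.api.services.bigquery.model.TableRow;
    import java.io.IOException;

    public class JsonToTableRow {
      // Parse a JSON object into a TableRow without TableRowJsonCoder.
      // TableRow extends GenericJson, so any JsonFactory from the Google
      // API client library can deserialize into it directly.
      public static TableRow parse(String json) throws IOException {
        return JacksonFactory.getDefaultInstance().fromString(json, TableRow.class);
      }

      public static void main(String[] args) throws IOException {
        // Hypothetical payload for illustration only.
        TableRow row = parse("{\"user_id\": \"abc\", \"visits\": 3}");
        System.out.println(row.get("user_id"));
      }
    }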

Re: Beam supports Flink Async IO operator

2020-07-09 Thread Kaymak, Tobias
Hi Eleanore, Maybe batched RPC is what you are looking for? https://beam.apache.org/blog/timely-processing/ On Wed, Jul 8, 2020 at 6:20 PM Eleanore Jin wrote: > Thanks Luke and Max for the information. > > We have the use case that inside a DoFn, we will need to call external > services to trig
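
The timely-processing post linked above buffers elements in per-key state and flushes them with a timer, so the external service is called once per batch instead of once per element. A minimal sketch of that pattern, assuming a keyed input and a placeholder callExternalService hook:

    import org.apache.beam.sdk.coders.StringUtf8Coder;
    import org.apache.beam.sdk.state.BagState;
    import org.apache.beam.sdk.state.StateSpec;
    import org.apache.beam.sdk.state.StateSpecs;
    import org.apache.beam.sdk.state.TimeDomain;
    import org.apache.beam.sdk.state.Timer;
    import org.apache.beam.sdk.state.TimerSpec;
    import org.apache.beam.sdk.state.TimerSpecs;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.values.KV;
    import org.joda.time.Duration;

    // Buffers elements per key and flushes them in batches from a timer,
    // so the external service sees one RPC per batch, not per element.
    class BatchedRpcFn extends DoFn<KV<String, String>, String> {

      @StateId("buffer")
      private final StateSpec<BagState<String>> bufferSpec =
          StateSpecs.bag(StringUtf8Coder.of());

      @TimerId("flush")
      private final TimerSpec flushSpec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

      @ProcessElement
      public void process(
          @Element KV<String, String> element,
          @StateId("buffer") BagState<String> buffer,
          @TimerId("flush") Timer flushTimer) {
        buffer.add(element.getValue());
        // Flush 10 seconds (placeholder delay) after the most recent element
        // for this key; each new element pushes the timer forward.
        flushTimer.offset(Duration.standardSeconds(10)).setRelative();
      }

      @OnTimer("flush")
      public void onFlush(
          @StateId("buffer") BagState<String> buffer, OutputReceiver<String> out) {
        // Send the whole buffered batch in a single call, then clear state.
        for (String response : callExternalService(buffer.read())) {
          out.output(response);
        }
        buffer.clear();
      }

      private Iterable<String> callExternalService(Iterable<String> batch) {
        // Hypothetical batched RPC; replace with the real client call.
        return batch;
      }
    }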

Re: TableRow class is not the same after serialization

2020-07-09 Thread Jeff Klukas
I would expect the following line to fail: List<TableRow> h = ((List<TableRow>) bigQueryRow.get("hits")); The top-level bigQueryRow will be a TableRow, but `bigQueryRow.get("hits")` is only guaranteed to be an instance of some class that implements `Map`. So that line needs to become: List<Map<String, Object>> h = ((Lis
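
A small sketch of the point being made, with the failing cast and the Map-based version side by side (the "hits" field name comes from the thread; the helper names are made up):

    import com.google.api.services.bigquery.model.TableRow;
    import java.util.List;
    import java.util.Map;

    class HitsAccess {
      // Compiles, but because of type erasure the failure only shows up when
      // an element is used as a TableRow: if the coder produced some other
      // Map implementation, this throws ClassCastException at runtime.
      @SuppressWarnings("unchecked")
      static TableRow firstHitUnsafe(TableRow bigQueryRow) {
        List<TableRow> hits = (List<TableRow>) bigQueryRow.get("hits");
        return hits.get(0); // may throw ClassCastException here
      }

      // Safe: only assumes the Map interface, which every deserialized
      // nested record is guaranteed to implement.
      @SuppressWarnings("unchecked")
      static Map<String, Object> firstHitSafe(TableRow bigQueryRow) {
        List<Map<String, Object>> hits =
            (List<Map<String, Object>>) bigQueryRow.get("hits");
        return hits.get(0);
      }
    }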

Re: TableRow class is not the same after serialization

2020-07-09 Thread Kirill Zhdanovich
Thanks for explaining. Is it documented somewhere that TableRow contains Map? I don't construct it; I fetch it from the Google Analytics export to a BigQuery table. On Thu, 9 Jul 2020 at 16:40, Jeff Klukas wrote: > I would expect the following line to fail: > > List h = ((List) bigQueryRow.get("h

Re: TableRow class is not the same after serialization

2020-07-09 Thread Jeff Klukas
It looks like the fact that your pipeline in production produces nested TableRows is an artifact of the following decision within BigQueryIO logic: https://github.com/apache/beam/blob/1a260cd70117b77e33442f0be8f213363a2db5fc/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/g

Re: TableRow class is not the same after serialization

2020-07-09 Thread Kirill Zhdanovich
So I guess I need to switch to Map instead of TableRow? On Thu, 9 Jul 2020 at 17:13, Jeff Klukas wrote: > It looks like the fact that your pipeline in production produces nested > TableRows is an artifact of the following decision within BigQueryIO logic: > > > https://github.com/apache/beam/blo

Re: TableRow class is not the same after serialization

2020-07-09 Thread Jeff Klukas
On Thu, Jul 9, 2020 at 10:18 AM Kirill Zhdanovich wrote: > So I guess I need to switch to Map instead of TableRow? > Yes, I would definitely recommend that you switch to Map. That's the most basic interface, and every deserialization of a top-level TableRow object must provide objects matching t

Re: TableRow class is not the same after serialization

2020-07-09 Thread Kirill Zhdanovich
Cool! Thanks a lot for your explanation and your time, Jeff, very much appreciated. On Thu, 9 Jul 2020 at 17:27, Jeff Klukas wrote: > On Thu, Jul 9, 2020 at 10:18 AM Kirill Zhdanovich > wrote: > >> So I guess I need to switch to Map instead of TableRow? >> > > Yes, I would definitely recommend

Re: ReadFromKafka: UnsupportedOperationException: The ActiveBundle does not have a registered bundle checkpoint handler

2020-07-09 Thread Maximilian Michels
This used to be working but it appears @FinalizeBundle (which KafkaIO requires) was simply ignored for portable (Python) pipelines. It looks relatively easy to fix. -Max On 07.07.20 03:37, Luke Cwik wrote: The KafkaIO implementation relies on checkpointing to be able to update the last commit
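
For reference, the Java SDK exposes bundle finalization to user code as a DoFn.BundleFinalizer parameter; the registered callback runs only after the runner has durably committed the bundle's output, which is the hook KafkaIO needs before committing offsets. A rough illustration of the mechanism (the commitOffsetsUpTo helper is a placeholder, not KafkaIO's actual code):

    import org.apache.beam.sdk.transforms.DoFn;
    import org.joda.time.Duration;
    import org.joda.time.Instant;

    // Illustrates the bundle-finalization hook: the callback fires only after
    // the bundle's results are durably committed, which is the point at which
    // it is safe to acknowledge/commit consumed records upstream.
    class CommitOnFinalizeFn extends DoFn<String, String> {
      @ProcessElement
      public void process(
          @Element String element,
          OutputReceiver<String> out,
          BundleFinalizer bundleFinalizer) {
        out.output(element);
        bundleFinalizer.afterBundleCommit(
            Instant.now().plus(Duration.standardMinutes(5)), // callback expiry
            () -> commitOffsetsUpTo(element));                // placeholder commit
      }

      private void commitOffsetsUpTo(String element) {
        // Hypothetical: acknowledge the external system here.
      }
    }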

Re: ReadFromKafka: UnsupportedOperationException: The ActiveBundle does not have a registered bundle checkpoint handler

2020-07-09 Thread Piotr Filipiuk
Thank you for looking into this. I upvoted BEAM-6868. Is there anything else I can do to have that feature prioritized, other than trying to contribute myself? Regarding DirectRunner, as I mentioned above, I can see in the worker_handlers.py lo

RE: KinesisIO checkpointing

2020-07-09 Thread Sunny, Mani Kolbe
We did the same and started using maxReadTime, running the application on a recurring 5-minute schedule. It works fine end to end without any error. But the problem is that it always starts reading from the beginning of the Kinesis stream when it stops and restarts. When I did some investiga
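
For context, a sketch of the kind of bounded Kinesis read being described; the stream name, credentials, and region are placeholders, and in a fresh run with no restored checkpoint the read starts wherever withInitialPositionInStream points:

    import com.amazonaws.regions.Regions;
    import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.kinesis.KinesisIO;
    import org.apache.beam.sdk.io.kinesis.KinesisRecord;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.values.PCollection;
    import org.joda.time.Duration;

    public class ScheduledKinesisRead {
      public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // withMaxReadTime wraps the unbounded source in a bounded read that
        // stops after ~5 minutes, matching the recurring-schedule setup. With
        // TRIM_HORIZON and no restored checkpoint, each run starts at the
        // beginning of the stream.
        PCollection<KinesisRecord> records =
            pipeline.apply(
                "ReadKinesis",
                KinesisIO.read()
                    .withStreamName("my-stream") // placeholder
                    .withAWSClientsProvider("ACCESS_KEY", "SECRET_KEY", Regions.EU_WEST_1) // placeholders
                    .withInitialPositionInStream(InitialPositionInStream.TRIM_HORIZON)
                    .withMaxReadTime(Duration.standardMinutes(5)));

        pipeline.run().waitUntilFinish();
      }
    }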

Re: Unable to read value from state/Unable to fetch data due to token mismatch for key

2020-07-09 Thread Mohil Khare
Thanks Reuven for your reply. Good to know that it is benign. Regards Mohil On Wed, Jul 8, 2020 at 10:19 PM Reuven Lax wrote: > This error should be benign. It often means that ownership of the work > item was moved to a different worker (possibly caused by autoscaling or > other source of work

Re: KinesisIO checkpointing

2020-07-09 Thread Luke Cwik
The BoundedReadFromUnboundedReader does checkpoint the underlying UnboundedSource; is that checkpoint logic not working? Do you have KinesisIO configured to always read from a specific point? On Thu, Jul 9, 2020 at 9:56 AM Sunny, Mani Kolbe wrote: > We did the same and started using maxReadTime

Re: KinesisIO checkpointing

2020-07-09 Thread Mani Kolbe
Is it required to set JobName and checkpointDir options for checkpointing to work? On Thu, 9 Jul, 2020, 9:25 PM Luke Cwik, wrote: > The BoundedReadFromUnboundedReader does checkpoint the underlying > UnboundedSource, is that checkpoint logic not working? > Do you have KinesisIO configured to a