Scio v0.7.0

2019-01-18 Thread Filipe Regadas
Hi all, We just released Scio 0.7.0. This release is based on the latest Beam 2.9.0 and includes many improvements, bug fixes, and some breaking changes. For a smooth transition to this new version, make sure you have a look at the v0.7.0 Migration Guide

Re: KafkaIO not committing offsets when using withMaxReadTime

2019-01-18 Thread Raghu Angadi
On Thu, Jan 10, 2019 at 7:57 AM Alexey Romanenko wrote:
> Don’t you think that we could have some race condition there since, according to initial issue description, sometimes offset was committed and sometimes not?

Yeah, there is a timing issue. 'finalizeCheckpoint()' does not wait until
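The race Raghu describes can be sketched with a toy simulation (plain Python, not Beam's actual KafkaIO code; all names here are hypothetical): whether the offset ends up committed depends on whether the asynchronous commit gets a chance to run before the reader shuts down.

```python
# Toy simulation of the race between asynchronous offset commits and
# reader shutdown. This is NOT Beam's KafkaIO implementation; it only
# illustrates why offsets are sometimes committed and sometimes not
# when the pipeline stops right after the last read (withMaxReadTime).

class FakeReader:
    def __init__(self):
        self.committed_offset = None
        self.closed = False
        self.pending = []          # commits scheduled but not yet executed

    def finalize_checkpoint(self, offset):
        # Models finalizeCheckpoint(): it only *schedules* the commit,
        # it does not wait for the commit to complete.
        self.pending.append(offset)

    def run_pending_commits(self):
        # Models the consumer thread getting a chance to run.
        if not self.closed:
            for offset in self.pending:
                self.committed_offset = offset
        self.pending.clear()

    def close(self):
        self.closed = True


def stop_pipeline(reader, commits_run_before_close):
    """Return the offset actually committed for one shutdown ordering."""
    reader.finalize_checkpoint(42)
    if commits_run_before_close:
        reader.run_pending_commits()   # commit thread wins the race
        reader.close()
    else:
        reader.close()                 # shutdown wins; commit is lost
        reader.run_pending_commits()
    return reader.committed_offset


print(stop_pipeline(FakeReader(), True))    # 42   -> offset committed
print(stop_pipeline(FakeReader(), False))   # None -> offset lost
```

The two orderings are both legal, which is exactly why the original report saw offsets committed only sometimes.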

Re: KafkaIO not committing offsets when using withMaxReadTime

2019-01-18 Thread Alexey Romanenko
Hi Jozef, I’m not aware of anyone working on this. In the meantime, I created a Jira for it: https://issues.apache.org/jira/browse/BEAM-6466 Feel free to contribute if you wish.

> On 17 Jan 2019, at 09:10, Jozef Vilcek wrote:
>
> Hello,
> w

Re: Classloader memory leak on job restart (FlinkRunner)

2019-01-18 Thread Maximilian Michels
Hi Daniel, I did some more debugging. I think the fix we proposed only cures the symptoms. The cause is that your job uses Jackson, which is also a dependency of Flink. So your job ends up using Flink's version of Jackson, which then installs classes from your job in Jackson's cache. Now, thi
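The leak pattern Max describes, a long-lived cache in a parent-loaded library holding on to classes from a job's classloader, has a close analog in any runtime: a cache owned by long-lived code keeps per-job objects reachable forever. A minimal plain-Python sketch of that mechanism (hypothetical names, not Flink or Jackson code):

```python
import gc
import weakref

# Stands in for a cache inside a parent-loaded library, e.g. the Jackson
# type cache living in Flink's classloader. Hypothetical, for illustration.
LIBRARY_CACHE = {}

class JobClass:
    """Stands in for a class/object owned by the job's user-code classloader."""

def run_job():
    obj = JobClass()
    LIBRARY_CACHE["job-1"] = obj      # the library caches a job-owned object
    return weakref.ref(obj)

ref = run_job()
gc.collect()
print(ref() is None)                  # False: the cache keeps the job object alive

# The fix amounts to clearing such caches on job shutdown
# (in Flink's case, in a cleanup hook like dispose()):
LIBRARY_CACHE.clear()
gc.collect()
print(ref() is None)                  # True: job objects can now be collected
```

In the JVM the retained object is the whole user-code classloader, so every restart leaks another copy of the job's classes until metaspace fills up.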

INFO:oauth2client.client:Attempting refresh to obtain initial access_token

2019-01-18 Thread OrielResearch Eila Arich-Landkof
Hello all, My pipeline looks like the following:

(p
 | "read TXT " >> beam.io.ReadFromText('gs://path/to/file.txt', skip_header_lines=False)
 | "transpose file " >> beam.ParDo(transposeCSVFn(List1, List2)))

The following error is being fired for the read: INFO:oauth2client.client:Attempting refr
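For context, a transpose over in-memory rows looks like the sketch below. The transposeCSVFn DoFn and its List1/List2 arguments are Eila's own code, which we can't see; this plain-Python function only illustrates the transpose operation itself:

```python
def transpose(rows):
    """Turn a list of equal-length rows into a list of columns."""
    return [list(col) for col in zip(*rows)]

rows = [["id", "a", "b"],
        ["s1", "1", "2"],
        ["s2", "3", "4"]]
print(transpose(rows))
# [['id', 's1', 's2'], ['a', '1', '3'], ['b', '2', '4']]
```

Note that a transpose needs all rows at once, while a ParDo sees one element at a time, so a real pipeline would have to group the lines together first.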

Re: Spark

2019-01-18 Thread Alexey Romanenko
Hi Matt, I just wanted to mention that you can also use Apache Livy [1] to launch Spark jobs (or Beam pipelines built with SparkRunner support) on Spark using just a REST API [2]. And of course, you need to manually create a "fat" jar and put it somewhere Spark can find it. [1]
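The Livy submission Alexey mentions boils down to a POST to Livy's /batches endpoint carrying the fat-jar location and the main class. A sketch using only the Python standard library; the Livy URL, jar path, and class name are placeholders, not anything from Matt's setup:

```python
import json
import urllib.request

def livy_batch_request(livy_url, jar_path, main_class, args=()):
    """Build (but do not send) a Livy POST /batches request for a fat jar."""
    payload = {
        "file": jar_path,          # must be reachable by Spark (e.g. on HDFS)
        "className": main_class,
        "args": list(args),
    }
    return urllib.request.Request(
        livy_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = livy_batch_request(
    "http://livy:8998",                         # placeholder Livy server
    "hdfs:///jars/my-beam-pipeline-fat.jar",    # placeholder jar path
    "com.example.MyBeamPipeline",               # placeholder main class
    ["--runner=SparkRunner"],
)
print(req.full_url)    # http://livy:8998/batches
# Actually submitting it would be: urllib.request.urlopen(req)
```

Livy then tracks the batch, so the client never needs spark-submit or a Spark installation of its own.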

Re: Spark

2019-01-18 Thread Juan Carlos Garcia
Hi Matt, With Flink you will be able to launch your pipeline just by invoking the main method of your main class; however, it will run as a standalone process and you will not have the advantage of distributed computation. On Fri., 18 Jan 2019, 09:37 Matt Casters wrote: > Thanks for the rep

Re: Classloader memory leak on job restart (FlinkRunner)

2019-01-18 Thread Daniel Harper
Thanks for raising this and the PR! In our production streaming job we’re using Kinesis, so good shout on the UnboundedSourceWrapper.

On 17/01/2019, 21:08, "Maximilian Michels" wrote:
> I'm glad that solved your GC problem. I think dispose() is a good place, it is meant for cleanup.
>
> In your

Re: Spark

2019-01-18 Thread Matt Casters
Thanks for the reply JC, I really appreciate it. I really can't force our users to use antiquated stuff like scripts, let alone command line things, but I'll simply use SparkLauncher and your comment about the main class doing Pipeline.run() on the Master is something I can work with... somewhat.
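SparkLauncher is a Java API, but what it ultimately assembles and executes is a spark-submit invocation against the master. A plain-Python sketch of building such a command line (the master URL, class name, and jar path are placeholders, not Matt's actual code):

```python
def spark_submit_command(master, main_class, fat_jar, pipeline_args=()):
    """Assemble a spark-submit command line for a Beam fat jar."""
    return [
        "spark-submit",
        "--master", master,
        "--class", main_class,   # the class whose main() calls Pipeline.run()
        fat_jar,
        *pipeline_args,          # forwarded to the pipeline's main()
    ]

cmd = spark_submit_command(
    "spark://master:7077",            # placeholder master URL
    "com.example.MyBeamPipeline",     # placeholder main class
    "/path/to/pipeline-fat.jar",      # placeholder fat-jar path
    ["--runner=SparkRunner"],
)
print(" ".join(cmd))
# Launching it would be: subprocess.run(cmd, check=True)
```

SparkLauncher does the equivalent from Java while also letting the caller monitor the application's state, which is why it suits embedding in a GUI better than shell scripts.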