Re: Caused by: java.lang.Exception: The user defined 'open()' method caused an exception: java.lang.NoClassDefFoundError: Could not initialize class com.google.common.io.BaseEncoding

2019-07-19 Thread Ryan Skraba
Hello! These are the "fun" problems to track down. I believe the GoogleCredentials class (0.12.0 in Beam, if that's where it's coming from) brings in an unvendored/unshaded dependency on guava-20.x. BaseEncoding was introduced in guava-14.x. Someplace in your job, there's probably an older vers
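
One way to check for the kind of conflict described above is to scan the job's dependency jars and list which ones bundle a given class. The following is a minimal sketch, not a definitive diagnosis: the helper names (`class_entry`, `find_jars_with_class`) and the `lib` directory are hypothetical, and it assumes the jars are available on local disk. A shaded or relocated copy of Guava would sit under a different package name and would not be matched.

```python
import zipfile
from pathlib import Path
from typing import List


def class_entry(class_name: str) -> str:
    """Convert a fully qualified Java class name to its path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def find_jars_with_class(root: str, class_name: str) -> List[str]:
    """Return sorted paths of all .jar files under `root` that contain `class_name`.

    Hypothetical helper for illustration; it only inspects archive entry names.
    """
    entry = class_entry(class_name)
    matches = []
    for jar in Path(root).rglob("*.jar"):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    matches.append(str(jar))
        except zipfile.BadZipFile:
            # Skip files that end in .jar but are not valid archives.
            continue
    return sorted(matches)


if __name__ == "__main__":
    # "lib" is an assumed directory holding the job's dependency jars.
    for path in find_jars_with_class("lib", "com.google.common.io.BaseEncoding"):
        print(path)
```

Running the same scan for a class that exists in every Guava release, such as `com.google.common.base.Preconditions`, lists all unshaded Guava copies. A jar that shows up in that list but not in the `BaseEncoding` list is a candidate for the older version.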

How to deal with failed Checkpoint? What is current behavior for subsequent checkpoints?

2019-07-19 Thread Ken Barr
Reading the two statements below, I conclude that CheckpointMark.finalizeCheckpoint() will be called in order, unless there is a failure. What happens in a failure? What happens to subsequent checkpoints in the case of a checkpoint failure? How do I prevent event re-ordering in the case of a check

Re: Beam release 2.5.0 tag SNAPSHOT version

2019-07-19 Thread Kenneth Knowles
Good catch. The release 2.5.0 was built with gradle, so that pom is left over. The gradle release plugin does not edit poms, so it did not change that. Instead, the poms are generated, and you can find them on Maven Central, like https://repo1.maven.org/maven2/org/apache/beam/beam-runners-direct-java/

Re: [python SDK] Returning Pub/Sub message_id and timestamp

2019-07-19 Thread Valentyn Tymofieiev
As of today, Beam Python streaming does not support writing to GCS yet, which explains https://stackoverflow.com/questions/54745869/how-to-create-a-dataflow-pipeline-from-pub-sub-to-gcs-in-python . You are right - id_label and timestamp_attribute do not work on the Direct runner yet as per https:/

Re: [python SDK] Returning Pub/Sub message_id and timestamp

2019-07-19 Thread Pablo Estrada
Beam 2.14.0 will include support for writing files in the fileio module (the support will include GCS, local files, HDFS). It will also support streaming. The transform is still marked as experimental, and is likely to receive improvements - but you can check it out for your pipelines, and see if i

Re: [python SDK] Returning Pub/Sub message_id and timestamp

2019-07-19 Thread Valentyn Tymofieiev
Also, see https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/game/leader_board.py which involves both PubSub and BigQuery IOs. On Fri, Jul 19, 2019 at 12:31 PM Pablo Estrada wrote: > Beam 2.14.0 will include support for writing files in the fileio module > (the