Re: Webinar: Unlocking the Power of Apache Beam with Apache Flink

2020-05-28 Thread Maximilian Michels
Thanks to everyone who joined and asked questions. Really enjoyed this new format! -Max On 28.05.20 08:09, Marta Paes Moreira wrote: > Thanks for sharing, Aizhamal - it was a great webinar! > > Marta > > On Wed, 27 May 2020 at 23:17, Aizhamal Nurmamat kyzy wrote:

Issue while submitting python beam pipeline on flink - local

2020-05-28 Thread Ashish Raghav
Hello Guys, I am trying to run a python beam pipeline on flink. I am trying to run apache_beam.examples.wordcount_minimal, but with PipelineOptions as "--runner=PortableRunner", "--job_endpoint=192.168.99.100:8099", "--environment_type=LOOPBACK". I have an apachebeam/flink1.9_job_server running o
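For reference, the submission described above can be sketched as a plain options list (a minimal sketch; the job endpoint address is the one quoted in the thread and should be replaced with your own job server's host:port):

```python
# Sketch of the PipelineOptions described in this thread. The endpoint
# address is the example from the message; substitute your own job server.
portable_options = [
    "--runner=PortableRunner",
    "--job_endpoint=192.168.99.100:8099",
    "--environment_type=LOOPBACK",
]

# With apache_beam installed, the wordcount example would then be run as:
#   python -m apache_beam.examples.wordcount_minimal \
#       --runner=PortableRunner \
#       --job_endpoint=192.168.99.100:8099 \
#       --environment_type=LOOPBACK
```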

[ANNOUNCE] Beam 2.21.0 Released

2020-05-28 Thread Kyle Weaver
The Apache Beam team is pleased to announce the release of version 2.21.0. Apache Beam is an open source unified programming model to define and execute data processing pipelines, including ETL, batch and stream (continuous) processing. See https://beam.apache.org You can download the release her

Re: Issue while submitting python beam pipeline on flink - local

2020-05-28 Thread Kyle Weaver
Hi Ashish, can you check to make sure apachebeam/flink1.9_job_server is also on version 2.21.0? On Thu, May 28, 2020 at 7:13 AM Ashish Raghav wrote: > Hello Guys , > > > > I am trying to run a python beam pipeline on flink. I am trying to run > apache_beam.examples.wordcount_minimal but with Pip

RE: Issue while submitting python beam pipeline on flink - local

2020-05-28 Thread Ashish Raghav
Hi Kyle, The latest version available on Docker Hub is 2.20.0. From: Kyle Weaver Sent: 28 May 2020 16:51 To: user@beam.apache.org Subject: Re: Issue while submitting python beam pipeline on flink - local

Re: Issue while submitting python beam pipeline on flink - local

2020-05-28 Thread Kyle Weaver
2.21.0 should be available now: https://hub.docker.com/layers/apache/beam_flink1.9_job_server/2.21.0/images/sha256-eeac6dd4571794a8f985e9967fa0c1522aa56a28b5b0a0a34490a600065f096d?context=explore On Thu, May 28, 2020 at 7:27 AM Ashish Raghav wrote: > Hi Kyle, > > > > The Latest Version available

RE: Issue while submitting python beam pipeline on flink - local

2020-05-28 Thread Ashish Raghav
Thanks, will pull and rebuild the container. Will get back after testing. From: Kyle Weaver Sent: 28 May 2020 17:02 To: user@beam.apache.org Subject: Re: Issue while submitting python beam pipeline on flink - local

Re: Issue while submitting python beam pipeline on flink - local

2020-05-28 Thread Kyle Weaver
Oh, I see the issue. The documentation you were looking at might contain an outdated reference to the apachebeam repo on Docker hub. We have migrated all Docker images to the apache top-level repository. So instead of apachebeam/flink1.9_job_server, you should use apache/beam_flink1.9_job_server.
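The renamed image can be pulled and started directly (a sketch; the 2.21.0 tag matches the release announced in this digest, and the port mapping assumes the job server's default 8099):

```shell
# Images have moved to the apache/ top-level repository on Docker Hub.
docker pull apache/beam_flink1.9_job_server:2.21.0
# Expose the default job endpoint port for the PortableRunner.
docker run --rm -p 8099:8099 apache/beam_flink1.9_job_server:2.21.0
```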

Re: Flink Runner with HDFS

2020-05-28 Thread Maximilian Michels
The configuration looks good but the HDFS file system implementation is not intended to be used directly. Instead of: > lines = p | 'ReadMyFile' >> beam.Create(hdfs_client.open(input_file_hdfs)) Use: > lines = p | 'ReadMyFile' >> beam.io.ReadFromText(input_file_hdfs) Best, Max On 28.05.20 0
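Max's suggestion, written out in full (a sketch assuming apache_beam with HDFS support is installed; the hdfs:// path and step name are illustrative, not from the original pipeline):

```python
import apache_beam as beam

# Illustrative HDFS path; substitute your own input file.
input_file_hdfs = "hdfs://namenode:8020/user/me/input.txt"

with beam.Pipeline() as p:
    # Not this -- opening the file client-side reads it outside Beam's
    # filesystem layer:
    #   lines = p | 'ReadMyFile' >> beam.Create(hdfs_client.open(input_file_hdfs))
    # Instead, let Beam resolve and read the path on the workers:
    lines = p | 'ReadMyFile' >> beam.io.ReadFromText(input_file_hdfs)
```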

RE: Issue while submitting python beam pipeline on flink - local

2020-05-28 Thread Ashish Raghav
Ok. Will test with the latest version. From: Kyle Weaver Sent: 28 May 2020 17:03 To: user@beam.apache.org Subject: Re: Issue while submitting python beam pipeline on flink - local

Issue while submitting python beam pipeline on flink cluster - local

2020-05-28 Thread Ashish Raghav
Hi Guys, I have another issue when I submit the python beam pipeline (the wordcount example provided by the Apache Beam team) directly on a flink cluster running locally. PipelineOptions are: "--runner=FlinkRunner", "--flink_version=1.9", "--environment_type=LOOPBACK", "--flink_master=192.168.99.100:8081"
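For comparison with the PortableRunner thread above, the FlinkRunner submission uses a slightly different options set (a sketch; the master address is the one quoted in the message):

```python
# Sketch of the FlinkRunner options from this thread: the client talks
# to the Flink REST endpoint directly instead of a separate job server.
flink_options = [
    "--runner=FlinkRunner",
    "--flink_version=1.9",
    "--flink_master=192.168.99.100:8081",
    "--environment_type=LOOPBACK",
]
```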

Re: Issue while submitting python beam pipeline on flink cluster - local

2020-05-28 Thread Maximilian Michels
Potentially a Windows issue. Do you have a Unix environment for testing? On 28.05.20 13:35, Ashish Raghav wrote: > Hi Guys, > >   > > I have another issue when I submit the python beam pipeline ( wordcount > example provided by apache beam team) directly on flink cluster running > local. > >  

RE: Issue while submitting python beam pipeline on flink - local

2020-05-28 Thread Ashish Raghav
Now it fails with this error: WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter. INFO:root:Using Python SDK docker image: apache/beam_python3.7_sdk:2.21.0. If the image is not available at local, we will try to pull from hub.docker.com INFO:apache_beam

RE: Issue while submitting python beam pipeline on flink cluster - local

2020-05-28 Thread Ashish Raghav
I can set that up. Can I test on Linux? Ashish Raghav | DE Core Compete | ashish.rag...@corecompete.com Accelerating Cloud Analytics -Original Message- From: Maximilian Michels Sent: 28 May 2020 17:34 To: user@beam.apache.org Subject: Re: Issue while submitting python beam pipeline on

Re: Issue while submitting python beam pipeline on flink - local

2020-05-28 Thread Maximilian Michels
You are using the LOOPBACK environment, which requires that the Flink cluster can connect back to your local machine. Since the loopback environment by default binds to localhost, that will not be possible. I'd suggest using the default Docker environment. On 28.05.20 14:06, Ashish Raghav wrote:

RE: Issue while submitting python beam pipeline on flink - local

2020-05-28 Thread Ashish Raghav
Ok, will try that. -Original Message- From: Maximilian Michels Sent: 28 May 2020 18:20 To: user@beam.apache.org Subject: Re: Issue while submitting python beam pipeline on flink - local

RE: Issue while submitting python beam pipeline on flink - local

2020-05-28 Thread Ashish Raghav
I am running the container using Docker Desktop. Now I ran with the default setup, pointing to localhost:8099, but it times out. I am also running a Flink cluster and I can connect to the UI through localhost:8081.

Re: Flink Runner with HDFS

2020-05-28 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
Hi Max, I incorporated your suggestion and I still get the same error message. -Buvana On 5/28/20, 7:35 AM, "Maximilian Michels" wrote: The configuration looks good but the HDFS file system implementation is not intended to be used directly. Instead of: > lines = p | 'ReadMy

Re: PaneInfo showing UNKNOWN State

2020-05-28 Thread Jay
I was trying to simulate the PaneInfo in Python to check for parity with the Java SDK. I was able to get PaneInfo after introducing a CombinePerKey. I am not sure why the GBK operation is not returning the correct information. On Wed, 27 May 2020 at 00:54, Robert Bradshaw wrote: > To clarify, PaneInfo
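For anyone reproducing this, a pane's info can be inspected in the Python SDK via a DoFn parameter (a minimal sketch requiring apache_beam; the class and element names are illustrative):

```python
import apache_beam as beam

# Sketch: inspect the PaneInfo of each element, e.g. downstream of a
# GroupByKey or CombinePerKey.
class LogPane(beam.DoFn):
    def process(self, element, pane_info=beam.DoFn.PaneInfoParam):
        # pane_info.timing reports the pane timing (early/on-time/late),
        # which is where the UNKNOWN value in this thread shows up.
        yield (element, pane_info.timing)
```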

Re: PaneInfo showing UNKNOWN State

2020-05-28 Thread Jay
Also, note that I observed similar behaviour with DataflowRunner. On Thu, 28 May 2020 at 21:24, Jay wrote: > I was trying to simulate the PaneInfo in Python to check for parity with > the Java SDK. I was able to get PaneInfo after introduction a CombinePerKey. > I am not sure why GBK operation i

Re: Issue while submitting python beam pipeline on flink - local

2020-05-28 Thread Kyle Weaver
> You are using the LOOPBACK environment which requires that the Flink > cluster can connect back to your local machine. Since the loopback > environment by defaults binds to localhost that should not be possible. On the Flink runner page, we recommend using --net=host to avoid the kinds of networ
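Combining Max's and Kyle's advice could look like this (a sketch; the endpoint value is an assumption for a locally running job server, and --net=host only works on Linux):

```python
# Sketch: switch from LOOPBACK to the default Docker environment, so
# the Flink cluster does not need to reach back into the client process.
docker_env_options = [
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",
    "--environment_type=DOCKER",
]
# Job server started with host networking (Linux only), as the Flink
# runner page recommends:
#   docker run --net=host apache/beam_flink1.9_job_server:2.21.0
```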

Cross-Language pipeline fails with PROCESS SDK Harness

2020-05-28 Thread Alexey Romanenko
Hello, I’m trying to run a Cross-Language pipeline (Beam 2.21, Java pipeline with an external Python transform) with a PROCESS SDK Harness and Spark Portable Runner but it fails. To do that I have a running Spark Runner Job Server (Spark local) and standalone Expansion Service (Python) which co

Re: Cross-Language pipeline fails with PROCESS SDK Harness

2020-05-28 Thread Kyle Weaver
What source are you using? On Thu, May 28, 2020 at 1:24 PM Alexey Romanenko wrote: > Hello, > > I’m trying to run a Cross-Language pipeline (Beam 2.21, Java pipeline with > an external Python transform) with a PROCESS SDK Harness and Spark Portable > Runner but it fails. > To do that I have a ru

Pipeline Processing Time

2020-05-28 Thread Talat Uyarer
Hi, I have a pipeline which has 5 steps. What is the best way to measure processing time for my pipeline? Thanks

Re: Pipeline Processing Time

2020-05-28 Thread Kyle Weaver
Which runner are you using? On Thu, May 28, 2020 at 1:43 PM Talat Uyarer wrote: > Hi, > > I have a pipeline which has 5 steps. What is the best way to measure > processing time for my pipeline? > > Thnaks >

Re: Pipeline Processing Time

2020-05-28 Thread Talat Uyarer
I am using the Dataflow Runner. The pipeline reads from KafkaIO and sends to an HTTP endpoint. I could not find any metadata field on the element to set the first read time. On Thu, May 28, 2020 at 10:44 AM Kyle Weaver wrote: > Which runner are you using? > > On Thu, May 28, 2020 at 1:43 PM Talat Uyarer > wrote: > >> H

Re: Cross-Language pipeline fails with PROCESS SDK Harness

2020-05-28 Thread Chamikara Jayalath
This might have to do with https://github.com/apache/beam/pull/11670. +Lukasz Cwik was there a subsequent fix that was not included in the release ? On Thu, May 28, 2020 at 10:29 AM Kyle Weaver wrote: > What source are you using? > > On Thu, May 28, 2020 at 1:24 PM Alexey Romanenko > wrote: >

writing new IO with Maven dependencies

2020-05-28 Thread Ken Barr
I am currently developing an IO that I would like to eventually submit to the Apache Beam project. The IO itself is Apache 2.0 licensed. Does every chained dependency I use need to be open source? If yes, how is this usually proven? Is it enough that only Maven dependencies are used?

Re: writing new IO with Maven dependencies

2020-05-28 Thread Luke Cwik
+dev On Thu, May 28, 2020 at 11:55 AM Ken Barr wrote: > I am currently developing an IO that I would like to eventually submit to > Apache Beam project. The IO itself is Apache2.0 licensed. > Does every chained dependency I use need to be opensource? > The transitive dependency tree must have

Re: Cross-Language pipeline fails with PROCESS SDK Harness

2020-05-28 Thread Luke Cwik
I haven't had any changes since #11670 in this space so nothing is missing from the release. Also, the GreedyPipelineFuser has not been updated to support XLang as it has some baked-in assumptions around flatten[1] and likely other issues. It has worked because the simple examples we have run have

Re: Pipeline Processing Time

2020-05-28 Thread Luke Cwik
What do you mean by processing time? Are you trying to track how long it takes for a single element to be ingested into the pipeline until it is output somewhere? Do you have a bounded pipeline and want to know how long all the processing takes? Do you care about how much CPU time is being consume

Re: Pipeline Processing Time

2020-05-28 Thread Talat Uyarer
Yes, I am trying to track how long it takes for a single element to be ingested into the pipeline until it is output somewhere. My pipeline is unbounded. I am using KafkaIO. I did not think about CPU time; if there is a way to track it too, it would be useful to improve my metrics. On Thu, May 28,

Re: Cross-Language pipeline fails with PROCESS SDK Harness

2020-05-28 Thread Alexey Romanenko
For testing purposes, it’s just `Create.of("Name1", "Name2", ...)` > On 28 May 2020, at 19:29, Kyle Weaver wrote: > > What source are you using? > > On Thu, May 28, 2020 at 1:24 PM Alexey Romanenko > wrote: > Hello, > > I’m trying to run a Cross-Language pipel

Re: Cross-Language pipeline fails with PROCESS SDK Harness

2020-05-28 Thread Kyle Weaver
Can you try removing the cross-language component(s) from the pipeline and see if it still has the same error? On Thu, May 28, 2020 at 4:15 PM Alexey Romanenko wrote: > For testing purposes, it’s just “Create.of(“Name1”, “Name2”, ...)" > > On 28 May 2020, at 19:29, Kyle Weaver wrote: > > What s

Re: Pipeline Processing Time

2020-05-28 Thread Luke Cwik
Dataflow provides msec counters for each transform that executes. You should be able to get them from stackdriver and see them from the Dataflow UI. You need to keep track of the timestamp of the element as it flows through the system as part of data that goes alongside the element. You can use th
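The timestamp-alongside-the-element idea can be sketched without any Beam dependency (function names are illustrative; in a real pipeline these would live in DoFns, and the measured latency could be reported through Beam's metrics, e.g. a Metrics.distribution):

```python
import time

# Attach an ingestion timestamp when the element enters the pipeline
# (e.g. immediately after the KafkaIO read).
def attach_ingest_time(element):
    return (element, time.time())

# At the output step, compute the element's end-to-end latency from the
# timestamp carried alongside it.
def latency_seconds(timed_element):
    element, ingest_ts = timed_element
    return time.time() - ingest_ts
```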

HDFS I/O with Beam on Spark and Flink runners - consistent Error messages

2020-05-28 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
Kyle, Max, All, I am desperately trying to get Beam working on at least one of the Flink or Spark runners, and am facing failures in both cases with a similar message. The Flink runner issue (Beam v2.19.0) was reported yesterday with a permalink: https://lists.apache.org/thread.html/r4977083014eb2d25271

Re: HDFS I/O with Beam on Spark and Flink runners - consistent Error messages

2020-05-28 Thread Kyle Weaver
Hi Buvana, I suspect this is a bug. If you can try running your pipeline again with these changes: 1. Remove `--spark-master-url spark://:7077` from your Docker run command. 2. Add `--environment_type=LOOPBACK` to your pipeline options. It will help us confirm the cause of the issue. On

Re: HDFS I/O with Beam on Spark and Flink runners - consistent Error messages

2020-05-28 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
Hello Kyle, That works. Produces the expected output. -Buvana From: Kyle Weaver Sent: Thursday, May 28, 2020 9:19 PM To: user@beam.apache.org Subject: Re: HDFS I/O with Beam on Spark and Flink runners - consistent Error messages Hi Buvana, I suspect this is

Re: HDFS I/O with Beam on Spark and Flink runners - consistent Error messages

2020-05-28 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
Hi Kyle, As reported earlier, LOOPBACK with Portable Runner/Job Server works fine. Further to that, I tried PortableRunner with additional options as follows: "--runner=PortableRunner", "--job_endpoint=embed", "--environment_config=apache/beam_python3.6_sdk" And I get an error messa