Re: [VOTE] Release 2.2.0, release candidate #4

2017-11-22 Thread Thomas Weise
+1 Ran quickstart with Apex runner in embedded mode and on YARN. It needed a couple of tweaks to get there, though. 1) Change quickstart pom.xml apex-runner profile: <dependency><groupId>org.apache.hadoop</groupId><artifactId>hadoop-yarn-client</artifactId><version>${hadoop.version}</version><scope>runtime</scope></dependency>

RE: Azure(ADLS) compatibility on Beam with Spark runner

2017-11-22 Thread Milan Chandna
I tried both ways. I passed ADL-specific configuration in --hdfsConfiguration and have also set up core-site.xml/hdfs-site.xml. As I mentioned, it's an HDI + Spark cluster, so those things are already set up. A Spark job (without Beam) is also able to read from and write to ADLS on the same machine.

Re: [VOTE] Fixing @yyy.com.INVALID mailing addresses

2017-11-22 Thread Robert Bradshaw
+1 On Wed, Nov 22, 2017, 10:10 PM Jean-Baptiste Onofré wrote: > +1 > > Regards > JB > > On 11/23/2017 12:25 AM, Lukasz Cwik wrote: > > I have noticed that some e-mail addresses (notably @google.com) get > > .INVALID suffixed onto it so per...@yyy.com become > per...@yyy.com.INVALID > > in the Fr

Re: [VOTE] Fixing @yyy.com.INVALID mailing addresses

2017-11-22 Thread Jean-Baptiste Onofré
+1 Regards JB On 11/23/2017 12:25 AM, Lukasz Cwik wrote: I have noticed that some e-mail addresses (notably @google.com) get .INVALID suffixed onto it so per...@yyy.com become per...@yyy.com.INVALID in the From: header. I have figured out that this is an issue with the way that our mail server

Re: [VOTE] Fixing @yyy.com.INVALID mailing addresses

2017-11-22 Thread Pei HE
+1 On Thu, Nov 23, 2017 at 8:43 AM, Holden Karau wrote: > +1 (non-binding) > > On Wed, Nov 22, 2017 at 4:06 PM Kenneth Knowles > wrote: > > > +1 > > > > On Wed, Nov 22, 2017 at 3:43 PM, Lukasz Cwik > > wrote: > > > > > +1 > > > > > > On Wed, Nov 22, 2017 at 3:35 PM, Reuven Lax > > > wrote: >

Re: [VOTE] Fixing @yyy.com.INVALID mailing addresses

2017-11-22 Thread Holden Karau
+1 (non-binding) On Wed, Nov 22, 2017 at 4:06 PM Kenneth Knowles wrote: > +1 > > On Wed, Nov 22, 2017 at 3:43 PM, Lukasz Cwik > wrote: > > > +1 > > > > On Wed, Nov 22, 2017 at 3:35 PM, Reuven Lax > > wrote: > > > > > +1 > > > > > > On Nov 22, 2017 3:29 PM, "Ben Sidhom" > wrote: > > > > > > >

Re: [VOTE] Fixing @yyy.com.INVALID mailing addresses

2017-11-22 Thread Kenneth Knowles
+1 On Wed, Nov 22, 2017 at 3:43 PM, Lukasz Cwik wrote: > +1 > > On Wed, Nov 22, 2017 at 3:35 PM, Reuven Lax > wrote: > > > +1 > > > > On Nov 22, 2017 3:29 PM, "Ben Sidhom" wrote: > > > > > I'm not a PMC member, but this would be especially valuable if it > > > propagated DKIM signatures proper

Re: [VOTE] Fixing @yyy.com.INVALID mailing addresses

2017-11-22 Thread Lukasz Cwik
+1 On Wed, Nov 22, 2017 at 3:35 PM, Reuven Lax wrote: > +1 > > On Nov 22, 2017 3:29 PM, "Ben Sidhom" wrote: > > > I'm not a PMC member, but this would be especially valuable if it > > propagated DKIM signatures properly. > > > > On Wed, Nov 22, 2017 at 3:25 PM, Lukasz Cwik > > wrote: > > > > >

Re: Azure(ADLS) compatibility on Beam with Spark runner

2017-11-22 Thread Lukasz Cwik
In your example it seems as though your HDFS configuration doesn't contain any ADL specific configuration: "--hdfsConfiguration='[{\"fs.defaultFS\": \"hdfs://home/sample.txt\"]'" Do you have a core-site.xml or hdfs-site.xml configured as per: https://hadoop.apache.org/docs/current/hadoop-azure-dat
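The point about the missing ADL settings can be illustrated with a minimal sketch that builds the `--hdfsConfiguration` value programmatically, so the JSON stays well-formed (one list containing one map). The helper name and all argument values are hypothetical. The `fs.adl.*` key names are assumptions recalled from the hadoop-azure-datalake documentation; older Hadoop releases used different key prefixes, so check them against the Hadoop version on the cluster.

```python
import json


def build_hdfs_configuration_flag(adl_account, client_id, credential, refresh_url):
    """Return a --hdfsConfiguration flag whose value is a JSON list holding one config map.

    Hypothetical helper for illustration only; the fs.adl.* keys are assumptions
    and should be checked against the Hadoop version in use.
    """
    conf = {
        # Default filesystem points at the ADLS account instead of hdfs://
        "fs.defaultFS": "adl://{}.azuredatalakestore.net".format(adl_account),
        # Filesystem implementations for the adl:// scheme
        "fs.adl.impl": "org.apache.hadoop.fs.adl.AdlFileSystem",
        "fs.AbstractFileSystem.adl.impl": "org.apache.hadoop.fs.adl.Adl",
        # OAuth2 service-principal credentials (placeholders supplied by the caller)
        "fs.adl.oauth2.access.token.provider.type": "ClientCredential",
        "fs.adl.oauth2.client.id": client_id,
        "fs.adl.oauth2.credential": credential,
        "fs.adl.oauth2.refresh.url": refresh_url,
    }
    # json.dumps guarantees balanced brackets and correct quoting
    return "--hdfsConfiguration=" + json.dumps([conf])


if __name__ == "__main__":
    print(build_hdfs_configuration_flag(
        "myaccount", "my-client-id", "my-secret", "https://example.invalid/oauth2/token"))
```

Generating the value with a JSON serializer avoids hand-escaping the quotes and makes a mismatched bracket impossible.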

Re: [VOTE] Fixing @yyy.com.INVALID mailing addresses

2017-11-22 Thread Reuven Lax
+1 On Nov 22, 2017 3:29 PM, "Ben Sidhom" wrote: > I'm not a PMC member, but this would be especially valuable if it > propagated DKIM signatures properly. > > On Wed, Nov 22, 2017 at 3:25 PM, Lukasz Cwik > wrote: > > > I have noticed that some e-mail addresses (notably @google.com) get > > .INV

Re: [VOTE] Fixing @yyy.com.INVALID mailing addresses

2017-11-22 Thread Ben Sidhom
I'm not a PMC member, but this would be especially valuable if it propagated DKIM signatures properly. On Wed, Nov 22, 2017 at 3:25 PM, Lukasz Cwik wrote: > I have noticed that some e-mail addresses (notably @google.com) get > .INVALID suffixed onto it so per...@yyy.com become per...@yyy.com.INV

[VOTE] Fixing @yyy.com.INVALID mailing addresses

2017-11-22 Thread Lukasz Cwik
I have noticed that some e-mail addresses (notably @google.com) get .INVALID suffixed onto them, so per...@yyy.com becomes per...@yyy.com.INVALID in the From: header. I have figured out that this is an issue with the way that our mail server is configured and opened https://issues.apache.org/jira/brow

Re: [VOTE] Release 2.2.0, release candidate #4

2017-11-22 Thread Nishu
Hi Eugene, I ran it on both standalone Flink (non-YARN) and Flink on an HDInsight cluster (YARN). Both ran successfully. :) Regards, Nishu

Re: [VOTE] Release 2.2.0, release candidate #4

2017-11-22 Thread Eugene Kirpichov
Thanks Nishu. So, if I understand correctly, your pipelines were running on non-YARN, but you're planning to run with YARN? I meanwhile was able to get Flink running on Dataproc (YARN), and validated quickstart and game examples. At this point we need validation for Spark and Flink non-YARN [I thi

Re: [VOTE] Release 2.2.0, release candidate #4

2017-11-22 Thread Nishu
Hi Eugene, No, I didn't try with those; instead I have my custom pipeline where a Kafka topic is the source. I have defined a Global Window and a processing-time trigger to read the data. Further, it runs some transformations, i.e. GroupByKey and CoGroupByKey, on the windowed collections. I was running th

Re: [VOTE] Release 2.2.0, release candidate #4

2017-11-22 Thread Kenneth Knowles
+1 (binding) On Wed, Nov 22, 2017 at 11:43 AM, Max Barrios wrote: > +1 (non-binding) > > Sent from my iPhone > > > On Nov 20, 2017, at 12:47 AM, Jean-Baptiste Onofré > wrote: > > > > Yeah, I have a Jira about that. > > > > You just have to update the existing symlink to point on the new release

Dataflow pipeline problem: Streaming data combined with large bounded data

2017-11-22 Thread Taylor Coleman
Hello, I have a Dataflow streaming pipeline where I need to consume queue messages (PubsubIO) and send each one through a very large lookup table from a database to join relevant values. I have tried the following methods: 1) Side Input In this approach I kept the large lookup table as a sid

Re: [VOTE] Release 2.2.0, release candidate #4

2017-11-22 Thread Max Barrios
+1 (non-binding) Sent from my iPhone > On Nov 20, 2017, at 12:47 AM, Jean-Baptiste Onofré wrote: > > Yeah, I have a Jira about that. > > You just have to update the existing symlink to point on the new release. > > I will update the release guide asap. > > Thanks ! > Regards > JB > >> On 11

Re: Version 2.2.0 release date

2017-11-22 Thread Ahmet Altay
Hi Stefania, The release candidate for 2.2.0 is currently being voted on [1]. The release will happen after a successful vote. Ahmet [1] https://lists.apache.org/thread.html/da2acabdb15c9f8d11351f9167633a4b089664fe3cce014ba619c937@%3Cdev.beam.apache.org%3E On Mon, Nov 20, 2017 at 7:04 AM, Stefania Ma

Re: [VOTE] Release 2.2.0, release candidate #4

2017-11-22 Thread Eugene Kirpichov
Thanks Nishu! Can you clarify which pipeline you were running? The validation spreadsheet includes 1) the quickstart and 2) mobile game walkthroughs. Was it one of these, or your custom pipeline? On Wed, Nov 22, 2017 at 10:20 AM Nishu wrote: > Hi, > > Typo in previous mail. I meant Flink runner

Re: [VOTE] Release 2.2.0, release candidate #4

2017-11-22 Thread Reuven Lax
Please update the spreadsheet. On Wed, Nov 22, 2017 at 10:19 AM, Nishu wrote: > Hi, > > Typo in previous mail. I meant Flink runner. > > Thanks, > Nishu > On Wed, 22 Nov 2017 at 19.17, > > > Hi, > > > > I build a pipeline using RC 2.2 today and ran with runner on yarn. > > It worked seamlessly

Re: [VOTE] Release 2.2.0, release candidate #4

2017-11-22 Thread Konstantinos Katsiapis
+1 (non-binding) Since Beam 2.2 is blocking release of tensorflow.transform 0.4. On Wed, Nov 22, 2017 at 10:19 AM, Nishu wrote: > Hi, > > Typo in previous mail. I meant Flink runner. > > Thanks, > Nishu > On Wed, 22 Nov 2017 at 19.17, > > > Hi, > > > >

Re: [VOTE] Release 2.2.0, release candidate #4

2017-11-22 Thread Nishu
Hi, Typo in previous mail. I meant Flink runner. Thanks, Nishu On Wed, 22 Nov 2017 at 19.17, > Hi, > > I build a pipeline using RC 2.2 today and ran with runner on yarn. > It worked seamlessly for unbounded sources. Couldn’t see any issues with > my pipeline so far :) > > > Thanks,Nishu > > On

Re: [VOTE] Release 2.2.0, release candidate #4

2017-11-22 Thread Nishu
Hi, I built a pipeline using RC 2.2 today and ran with runner on yarn. It worked seamlessly for unbounded sources. Couldn’t see any issues with my pipeline so far :) Thanks, Nishu On Wed, 22 Nov 2017 at 18.57, Reuven Lax wrote: > Who is validating Flink and Yarn? > > On Tue, Nov 21, 2017 at 9:

Re: [VOTE] Release 2.2.0, release candidate #4

2017-11-22 Thread Reuven Lax
Who is validating Flink and Yarn? On Tue, Nov 21, 2017 at 9:26 AM, Kenneth Knowles wrote: > On Mon, Nov 20, 2017 at 5:01 PM, Eugene Kirpichov < > kirpic...@google.com.invalid> wrote: > > > In the verification spreadsheet, I'm not sure I understand the difference > > between the "YARN" and "Stand

Re: Issues processing 150K files with DataflowRunner

2017-11-22 Thread Chamikara Jayalath
Thanks. Note that shards generated by the ReadAll transform will not support dynamic work rebalancing, but this should not matter when the number of shards is large. The long-term solution is Splittable DoFn, which is in the works. - Cham On Wed, Nov 22, 2017 at 8:23 AM Asha Rostamianfar wrote: > Thanks a l
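As a rough illustration of the trade-off described above (not Beam's actual implementation), a ReadAll-style read can be thought of as treating file names as data and cutting each file into fixed byte ranges up front. The function name, the shard size, and the example file sizes below are all hypothetical. Once the ranges are fixed they cannot be rebalanced later, which is why a large number of small shards matters.

```python
from typing import Iterable, Iterator, Tuple

# (file name, start offset inclusive, end offset exclusive)
Shard = Tuple[str, int, int]


def split_into_shards(files: Iterable[Tuple[str, int]], shard_bytes: int) -> Iterator[Shard]:
    """Cut each (name, size_in_bytes) file into fixed, non-overlapping byte ranges.

    Hypothetical sketch: the ranges are decided once, so a slow shard cannot be
    split further at run time (no dynamic work rebalancing).
    """
    if shard_bytes <= 0:
        raise ValueError("shard_bytes must be positive")
    for name, size in files:
        start = 0
        while start < size:
            end = min(start + shard_bytes, size)
            yield (name, start, end)
            start = end


if __name__ == "__main__":
    for shard in split_into_shards([("a.txt", 250), ("b.txt", 100)], shard_bytes=100):
        print(shard)
```

With many files the shard count is large, so an even static assignment of shards to workers gets close to what rebalancing would achieve.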

Re: Issues processing 150K files with DataflowRunner

2017-11-22 Thread Asha Rostamianfar
Thanks a lot, Cham! Yes, it looks like we need a ReadAll transform similar to TextIO and AvroIO :) We'll implement this. On Tue, Nov 21, 2017 at 1:05 PM, Chamikara Jayalath wrote: > I suspect that you might be hitting Dataflow API limit for messages during > initial splitting the source. Some de

Re: Azure(ADLS) compatibility on Beam with Spark runner

2017-11-22 Thread Jean-Baptiste Onofré
Hi, FYI, I'm in touch with the Microsoft Azure team about that. We are testing the ADLS support via HDFS. I'll keep you posted. Regards JB On 11/22/2017 09:12 AM, Milan Chandna wrote: Hi, Has anyone tried IO from(to) ADLS account on Beam with Spark runner? I was trying recently to do this but was

Version 2.2.0 release date

2017-11-22 Thread Stefania Mantisi
Hi everyone, I saw all the issues are currently marked as having been fixed. When will version 2.2 come out? Thank you! Stefania Mantisi -- *Stefania Mantisi * Software Engineer - Cloud Development - Noovle S.r.l. mail: stefania.mant...@noovle.it Noovle | The Ne

Azure(ADLS) compatibility on Beam with Spark runner

2017-11-22 Thread Milan Chandna
Hi, Has anyone tried IO from/to an ADLS account on Beam with the Spark runner? I tried this recently but was unable to make it work. Steps that I tried: 1. Took an HDI + Spark 1.6 cluster with an ADLS account as default storage. 2. Built Apache Beam on that. Built to include BEAM-2790