Re: [PROPOSAL] External Join with KV Stores

2017-07-06 Thread Kenneth Knowles
In the streams/tables way of talking, side inputs are tables. External KV stores are basically also [globally windowed] tables. Both are time-varying. I think it makes perfect sense to access an external KV store in userland directly rather than listen to its changelog and reproduce the same table

Re: Docs/guidelines on writing filesystem sources and sinks

2017-07-06 Thread Dmitry Demeshchuk
Hi Stephen, Thanks for the detailed reply! Some comments inline. On Thu, Jul 6, 2017 at 5:21 PM, Stephen Sisk wrote: > Hi Dmitry, > > I'm excited to hear that you'd like to do this work. If you haven't > already, I'd first suggest that you open a JIRA issue to make sure other > folks know you'

Re: Docs/guidelines on writing filesystem sources and sinks

2017-07-06 Thread Stephen Sisk
Hi Dmitry, I'm excited to hear that you'd like to do this work. If you haven't already, I'd first suggest that you open a JIRA issue to make sure other folks know you're working on this. I was involved in working on the recent java HDFS file system implementation, so I'll try and share what I kno

Re: Docs/guidelines on writing filesystem sources and sinks

2017-07-06 Thread Chamikara Jayalath
Currently we don't have official documentation or a testing guide for adding new FileSystems. Best source here would be existing FileSystem implementations, as you mentioned. I don't think parameters for initiating FileSystems should be passed when creating a read transform. Can you try to get any

Re: [Proposal] Submitting pipelines to Runners in another language

2017-07-06 Thread Ahmet Altay
Thank you Sourabh. I added my comments as well and +1 to Kenn. On Thu, Jul 6, 2017 at 2:21 PM, Kenneth Knowles wrote: > I added a few detailed comments. I definitely think we should move forward > on this to get Python pipelines running on all our our runners, and > hopefully that gets us ready

Re: Docs/guidelines on writing filesystem sources and sinks

2017-07-06 Thread Dmitry Demeshchuk
I also stumbled upon a problem that I can't really pass additional configuration to a filesystem, e.g. lines = pipeline | 'read' >> ReadFromText('s3://my-bucket/kinglear.txt', aws_config=AWSConfig()) because the ReadFromText class relies on PTransform's constructor, which has a pre-defined set of

Re: [Proposal] Submitting pipelines to Runners in another language

2017-07-06 Thread Kenneth Knowles
I added a few detailed comments. I definitely think we should move forward on this to get Python pipelines running on all our our runners, and hopefully that gets us ready for any future SDKs too. On Wed, Jul 5, 2017 at 2:21 PM, Sourabh Bajaj < sourabhba...@google.com.invalid> wrote: > Hi, > > I

Re: Failure in Apex runner

2017-07-06 Thread Reuven Lax
Thomas, any suggestions on what we should do? Do you have an idea what's going on, or should we exclude this test for now until you have time to look at it? Reuven On Wed, Jul 5, 2017 at 3:36 PM, Reuven Lax wrote: > I wonder if the watermark is accidentally advancing too early, causing > Apex t

Re: writing to s3 in beam

2017-07-06 Thread Jyotirmoy Sundi
Hi, The fix is in below, I am not sure if its a proper fix, if someone reviews it or require any changes, happy to do a PR. @Override public int read(ByteBuffer dst) throws IOException { if (closed) { throw new IOException("Channel is closed"); } try { return inputStream.read(dst)

Re: writing to s3 in beam

2017-07-06 Thread Lukasz Cwik
Jyotirmoy, would you like to open up a PR with your changes that you did to get S3 reading working? On Thu, Jul 6, 2017 at 1:26 PM, Jyotirmoy Sundi wrote: > Hi Ted, > BEAM-2500 is for reading it seems, I made a couple of changes in > beam HadoopFileSystem > and able to read s3 data but write

Re: writing to s3 in beam

2017-07-06 Thread Jyotirmoy Sundi
Hi Ted, BEAM-2500 is for reading it seems, I made a couple of changes in beam HadoopFileSystem and able to read s3 data but write i am still facing the issues above. any help would highly appreciate. On Wed, Jul 5, 2017 at 8:49 PM, Ted Yu wrote: > Please take a look at BEAM-2500 (and related

Re: Can't run wordcount example from Intellij

2017-07-06 Thread Kenneth Knowles
This does not seem right. We have probably missed it because we always run it in test scope. It was introduced somewhat recently by 69b01a6118702277348d2f625af669225c9ed99e [1] from #3161 [2] but neither the PR nor commit message has much information. That option should probably not be in TestPip

Can't run wordcount example from Intellij

2017-07-06 Thread Manu Zhang
Hi all, Running wordcount example from Intellij fails with "ClassNotFoundException: org.hamcrest.Matcher". This is because DirectRunner#defaultTransformOverrides will validate TestPipelineOptions, which will in turn load "hamcrest". Is this expected ? Thanks, Manu