Re: Increase stream parallelism after reading from UnboundedSource

2016-12-05 Thread Aljoscha Krettek
Hi, I can only speak for Flink, there you usually fan-out/parallelise the stream after a non-parallel source. Cheers, Aljoscha On Mon, 5 Dec 2016 at 15:48 Amit Sela wrote: > Hi all, > > I have a general question about how stream-processing frameworks/engines > usually behave in the following sc

Re: [PROPOSAL] "IOChannelFactory" Redesign and Make it Configurable

2016-12-05 Thread Kenneth Knowles
I really like this document. It is easy to read and informative. Three things not addressed by the document: 1. Major Beam use cases. I'm sure we have a few in the SDK that could be outlined in terms of the new API with pseudocode. 2. Related work. How does this differ from other filesystem APIs a

Re: Jenkins build is unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #1730

2016-12-05 Thread Kenneth Knowles
The error message looks like a transient error, though it is easy to believe this change could cause a problem. I will keep a sharp eye on it. On Mon, Dec 5, 2016 at 4:21 PM, Apache Jenkins Server < jenk...@builds.apache.org> wrote: > See Run

Re: [PROPOSAL] "IOChannelFactory" Redesign and Make it Configurable

2016-12-05 Thread Pei He
I have received a lot of comments in "Part 1: IOChannelFactory Redesign" [1]. And, I have updated the design based on the feedback. Now, I feel it is close to be ready for implementation, and I would like to summarize the changes: 1. Replaced FilePath with URI for resolving files paths. 2. Require

HiveIO

2016-12-05 Thread Vinoth Chandar
Hi guys, Saw a post around HiveIO on the users list with a PR followup. I am interested in this too and can pitch in on developement and testing.. Who & where is this work happening? Thanks VInoth

Re: [DISCUSS] ExecIO

2016-12-05 Thread Kenneth Knowles
Runners will be able to determine whether or not they can execute a pipeline based on these requirements. The details are rather the domain of the Fn API design. On Mon, Dec 5, 2016 at 1:38 PM, Eugene Kirpichov < kirpic...@google.com.invalid> wrote: > @Kenn - Would you suggest that all runners ne

Re: [DISCUSS] ExecIO

2016-12-05 Thread Ben Chambers
The problem with not integrating with Beam at all, is the runner doesn't know about any of these callouts. So it can't report "things aren't hung, there is a shell command running", etc. But, the integration doesn't need to be particularly deep. Imagine that the you can just pass the ProcessBuilde

Re: [DISCUSS] ExecIO

2016-12-05 Thread Eugene Kirpichov
@Kenn - Would you suggest that all runners need to support running code in a user-specified container? @Ben - Hmm, the features you're suggesting don't seem like they require deep integration into Beam itself, but can be accomplished by separate utility functions (or perhaps regular language-specif

Re: [DISCUSS] ExecIO

2016-12-05 Thread Ben Chambers
One option would be to use the reflective DoFn approach to this. Imagine something like: public class MyExternalFn extends DoFn { @ProcessElement // Existence of ShellExecutor indicates the code shells out. public void processElement(ProcessContext c, ShellExecutor shell) { ... Futur

Re: [DISCUSS] ExecIO

2016-12-05 Thread Kenneth Knowles
I would like the runner-independent, language-independent graph to have a way to specify requirements on the environment that a DoFn runs in. This would provide a natural way to talk about installed libraries, containers, external services that are accessed, etc, and I think the requirement of a pa

Re: Build failed in Jenkins: beam_PostCommit_Python_Verify #823

2016-12-05 Thread Ahmet Altay
I am looking at this. Caused by: https://github.com/apache/incubator-beam/pull/1342 tracking issue: https://issues.apache.org/jira/browse/BEAM-1088 Ahmet On Mon, Dec 5, 2016 at 11:42 AM, Apache Jenkins Server < jenk...@builds.apache.org> wrote: > See

Re: [DISCUSS] ExecIO

2016-12-05 Thread Eugene Kirpichov
Hi JB, Thanks for bringing this to the mailing list. I also think that this is useful in general (and that use cases for Beam are more than just classic bigdata), and that there are interesting questions here at different levels about how to do it right. I suggest to start with the highest-level

Re: [DISCUSS] Graduation to a top-level project

2016-12-05 Thread Neelesh Salian
Quite an interesting discussion. Looking forward to the graduation. :) Thanks for putting this together. On Mon, Dec 5, 2016 at 10:30 AM, Davor Bonaci wrote: > A quick update: the vote within the Incubator has been started [1]. > > Davor > > [1] > https://lists.apache.org/thread.html/a8e9cecfe93

Re: [DISCUSS] Graduation to a top-level project

2016-12-05 Thread Davor Bonaci
A quick update: the vote within the Incubator has been started [1]. Davor [1] https://lists.apache.org/thread.html/a8e9cecfe93f0e464cc7c1774d2761ca14326df1101b7670ca8b1dc3@%3Cgeneral.incubator.apache.org%3E On Fri, Dec 2, 2016 at 11:40 AM, Davor Bonaci wrote: > A quick update on the progress:

Re: PAssertTest#runExpectingAssertionFailure() and waitUntilFinish()

2016-12-05 Thread Kenneth Knowles
Hi Stas, This is something special to TestPipeline and the test configuration for a runner. If runExpectingAssertionFailure() does not succeed, then our whole suite of RunnableOnService tests is not going to work, because they all have an assumption that TestPipeline#run() waits until the asserti

[INFO] Spark runner build is failing

2016-12-05 Thread Jean-Baptiste Onofré
Hi guys, The latest commit on the Spark runner broke the Spark runner: commit 158378f0f682b80462b917002b895ddbf782d06d Date: Sat Dec 3 00:47:39 2016 +0200 This commit introduced a failed test: Failed tests: ResumeFromCheckpointStreamingTest.testRun:131->runAgain:142->run:169 Success aggre

[DISCUSS] ExecIO

2016-12-05 Thread Jean-Baptiste Onofré
Hi beamers, Today, Beam is mainly focused on data processing. Since the beginning of the project, we are discussing about extending the use cases coverage via DSLs and extensions (like for machine learning), or via IO. Especially for the IO, we can see Beam use for data integration and data

PAssertTest#runExpectingAssertionFailure() and waitUntilFinish()

2016-12-05 Thread Stas Levin
Hi, PAssertTest#runExpectingAssertionFailure() contains the following block: try { pipeline.run(); } catch (AssertionError exc) { return exc; } I was wondering if in some cases this might not produce the desired effect, particularly if the run() method returns before the pipeline has ended.