Re: [DISCUSS] ExecIO

2016-12-06 Thread Jean-Baptiste Onofré
Hi Robert, The "wrapping" as IO is more for convenience for end users. The Read/Write can be replaced by documentation/javadoc. But you are right, the key part is the ExecFn. Regards JB On 12/07/2016 08:37 AM, Robert Bradshaw wrote: I don't mean to derail the tricky environment questions, b

Re: [DISCUSS] ExecIO

2016-12-06 Thread Robert Bradshaw
I don't mean to derail the tricky environment questions, but I'm not seeing why this is bundled as an IO rather than a plain DoFn (which can be applied to a PCollection of one or more commands, yielding their outputs). Especially for the case of a Read, which in this case is not splittable (initial

Re: Flink runner. Optimization for sideOutput with tags

2016-12-06 Thread Aljoscha Krettek
I'm having a look at your PRs now. I think the change is good, and it's actually quite simple too. Thanks for looking into this! On Mon, 5 Dec 2016 at 05:48 Alexey Demin wrote: > Aljoscha > > I mistaken with flink runtime =) > > What do you think about some modification FlinkStreamingTransformT

Re: Test failure on beam-sdks-java-maven-archetypes-examples-java8

2016-12-06 Thread Manu Zhang
Passed with VPN now. Sorry for the false alarm On Wed, Dec 7, 2016 at 9:37 AM Manu Zhang wrote: > Guys, > > Has anyone seen the following failure on the latest master ? > > [INFO] java.lang.IllegalStateException: Failed to validate > gs://apache-beam-samples/shakespeare/* > [INFO] at > it.pkg.

Test failure on beam-sdks-java-maven-archetypes-examples-java8

2016-12-06 Thread Manu Zhang
Guys, Has anyone seen the following failure on the latest master ? [INFO] java.lang.IllegalStateException: Failed to validate gs://apache-beam-samples/shakespeare/* [INFO] at it.pkg.MinimalWordCountJava8Test.testMinimalWordCountJava8(MinimalWordCountJava8Test.java:63) [INFO] Caused by: java.io.IO

Re: [DISCUSS] ExecIO

2016-12-06 Thread Eugene Kirpichov
Ben - the issues of "things aren't hung, there is a shell command running", aren't they general to all DoFn's? i.e. I don't see why the runner would need to know that a shell command is running, but not that, say, a heavy monolithic computation is running. What's the benefit to the runner in knowin

Re: [PROPOSAL] "IOChannelFactory" Redesign and Make it Configurable

2016-12-06 Thread Kenneth Knowles
Thanks for the thorough answers. It all sounds good to me. On Tue, Dec 6, 2016 at 12:57 PM, Pei He wrote: > Thanks Kenn for the feedback and questions. > > I responded inline. > > On Mon, Dec 5, 2016 at 7:49 PM, Kenneth Knowles > wrote: > > > I really like this document. It is easy to read and

Re: [PROPOSAL] "IOChannelFactory" Redesign and Make it Configurable

2016-12-06 Thread Pei He
Thanks Kenn for the feedback and questions. I responded inline. On Mon, Dec 5, 2016 at 7:49 PM, Kenneth Knowles wrote: > I really like this document. It is easy to read and informative. Three > things not addressed by the document: > > 1. Major Beam use cases. I'm sure we have a few in the SDK

Re: Increase stream parallelism after reading from UnboundedSource

2016-12-06 Thread Amit Sela
I think it is common in batch (and micro-batch for streaming) because at any given time you're computing a "chunk" (pick your naming.. we have lot's of them ;-) ) and slicing-up this chunk to distribute across more cpus if available is clearly better, but I was wondering about "event-at-a-time" pro

Re: Increase stream parallelism after reading from UnboundedSource

2016-12-06 Thread Raghu Angadi
On Sun, Dec 4, 2016 at 11:48 PM, Amit Sela wrote: > For any downstream computation, is it common for stream processors to > "fan-out/parallelise" the stream by shuffling the data into more > streams/partitions/bundles ? > I think so. It is pretty common in batch processing too.

Re: HiveIO

2016-12-06 Thread Ismaël Mejía
Hello, If you really need to read/write via Hive, remember that you can use the Hive Jdbc driver, and achieve this with Beam using the JdbcIO (this is probably less efficient for the streaming case but still a valid solution). Ismaël On Tue, Dec 6, 2016 at 12:04 PM, Vinoth Chandar wrote: > Gr

Re: HiveIO

2016-12-06 Thread Vinoth Chandar
Great. Thanks! Thanks, Vinoth > On Dec 6, 2016, at 2:06 AM, Jean-Baptiste Onofré wrote: > > Hi, > > Ismaël and I started HiveIO. > > I have several IOs ready to propose as PR, but, in order to limit the number > of open PRs, I would like to merge the pending ones. > > I will let you know wh

Re: [DISCUSS] ExecIO

2016-12-06 Thread Jean-Baptiste Onofré
Hi Eugene, thanks for the extended questions. I think we have two levels of expectations here: - end-user responsibility - worker/runner responsibility 1/ From a end-user perspective, the end-user has to know that using a system command (via ExecIO) and more generally speaking anything which

Re: HiveIO

2016-12-06 Thread Jean-Baptiste Onofré
Hi, Ismaël and I started HiveIO. I have several IOs ready to propose as PR, but, in order to limit the number of open PRs, I would like to merge the pending ones. I will let you know when the branches/PRs will be available. Regards JB On 12/05/2016 11:40 PM, Vinoth Chandar wrote: Hi guys,