Re: [New Proposal] Hive connector using native api

2017-05-23 Thread Jean-Baptiste Onofré
Hi, It looks good. I just saw some issues: - javadoc is not correct in HiveIO (it says write() for read ;)). - estimated size is global to the table (doesn't consider the filter). It's not a big deal, but it should be documented. - you don't use the desired bundle size provided by the runner fo

Re: [New Proposal] Hive connector using native api

2017-05-23 Thread Eugene Kirpichov
From the point of view of general source/sink development, this code looks reasonable, except for a few violations of https://beam.apache.org/contribute/ptransform-style-guide/ (mainly around https://beam.apache.org/contribute/ptransform-style-guide/#runtime-errors-and-data-consistency) and other

RE: [New Proposal] Hive connector using native api

2017-05-23 Thread Seshadri Raghunathan
Hi, You can find a draft implementation of the same here : HiveIO Source - https://github.com/seshadri-cr/beam/commit/b74523c13e03dc70038bc1e348ce270fbb3fd99b HiveIO Sink - https://github.com/seshadri-cr/beam/commit/0008f772a989c8cd817a99987a145fbf2f7fc795 Please let us know your com

[New Proposal] Hive connector using native api

2017-05-23 Thread Madhusudan Borkar
Hi, HadoopIO can be used to read from Hive, but it doesn't support writing to Hive. This new proposal for a Hive connector includes both a source and a sink. It uses the Hive native API. Apache HCatalog provides a way to read from and write to Hive without using MapReduce. HCatReader reads data from the cluster, using basic

Go SDK in the works?

2017-05-23 Thread Robin Bartholdson
Curious about https://issues.apache.org/jira/browse/BEAM-2083 > Develop a Go SDK for Beam Is there active development being done on this? -Robin

Re: New doc: Beam Runner Guide

2017-05-23 Thread Kenneth Knowles
On Tue, May 23, 2017 at 9:18 AM, Davor Bonaci wrote: > Great work! > > Once comments are resolved in the document, can we move this to the > Contribute section of the website? I think that would make it much more > discoverable to everyone. > Definitely. I want to make some of the "easy path" im

Re: [DISCUSS] HadoopInputFormat based IOs

2017-05-23 Thread Stephen Sisk
hey, Thanks for bringing this up! It's definitely an interesting question and I can see both sides of the argument. I can see the appeal of HIFIO wrapper IOs as stop-gaps and if they have good test coverage, it does ensure that the HIFIO route is working. If we have good IT coverage, it also mean

[DISCUSS] HadoopInputFormat based IOs

2017-05-23 Thread Ismaël Mejía
Hello, I bring this subject to the mailing list to see everybody’s opinion on the subject. The recent inclusion of HadoopInputFormatIO (HiFiIO) gave Beam users the option to ‘easily’ include data stores that support the Hadoop-based partitioning scheme. There are currently examples of how to use i

Re: New doc: Beam Runner Guide

2017-05-23 Thread Davor Bonaci
Great work! Once comments are resolved in the document, can we move this to the Contribute section of the website? I think that would make it much more discoverable to everyone. On Mon, May 22, 2017 at 1:33 AM, Aljoscha Krettek wrote: > Cool, I’ll give this a read once I’m back from conference

Re: Streaming Data Pipeline Support in Python

2017-05-23 Thread Davor Bonaci
As a project, we don't have an official time-based roadmap. So far, the project went through the "standard" cycle: getting accepted into incubation, code donation, first release, graduation, and the first stable release last week. Going forward, I'll be looking into discussing and formalizing some

Re: [Road map][R SDK]

2017-05-23 Thread Davor Bonaci
> > Is there an "official" roadmap for 2017? > We don't have an official time-based roadmap. So far, the project went through the "standard" cycle: getting accepted into incubation, code donation, first release, graduation, and the first stable release last week. Going forward, I'll be looking in

Re: [Road map][R SDK]

2017-05-23 Thread Sourabh Bajaj
Hi, 1. I don't think there is an R SDK in development within the main repository; I'm not sure whether someone is building one in a fork. 2. There might be some demand for it in the data science community, but currently people have been using rpy2 and doing the processing via the Python SDK based on th

[Road map][R SDK]

2017-05-23 Thread AndrasNagy
Hello, After a bit of googling I decided to ask the following: Is there an "official" roadmap for 2017? If yes, please send me a link. (I found one for 2016, but I'm not sure if that was retrospective.) Is there demand for an R SDK? Is one already under construction? If yes I would like t

Streaming Data Pipeline Support in Python

2017-05-23 Thread Hemanth Kondapalli
Hey, When will the Python SDK have support for streaming data in pipelines? Is it in the next release? What is the roadmap? I saw this comment in the GitHub repo. """ Important: streaming pipeline support in Python Dataflow is in development and is not yet available for use. """ https://github.c

Re: Pipeline termination in the unified Beam model

2017-05-23 Thread Etienne Chauchot
@Thomas, @Aljoscha, @Kenneth, @JB, I was indeed talking about whether or not waitUntilFinish(duration) is blocking on an unbounded-but-finite collection. Thanks for your comments, they make a lot of sense. Etienne On 10/05/2017 at 19:15, Kenneth Knowles wrote: +1 to what Thomas said: At +

Jenkins build became unstable: beam_Release_NightlySnapshot #424

2017-05-23 Thread Apache Jenkins Server
See