Proposal and plan: new TextIO features based on SDF

2017-06-23 Thread Eugene Kirpichov
Hi all, I've written up a proposal for incrementally delivering a bunch of useful new features in TextIO based on Splittable DoFn. It's applicable to other file-based connectors; TextIO is just one good example. Let me know what you think! https://s.apache.org/textio-sdf Copy of abstract: Users

Re: [DISCUSS] support different versions of backends in an IO

2017-06-23 Thread Chamikara Jayalath
This will probably be a common question from IO transform authors as Beam matures. We should probably add a section on this to the IO authoring guide [1][2]. Thanks, Cham [1] https://beam.apache.org/documentation/io/authoring-overview/ [2] https://issues.apache.org/jira/browse/BEAM-1025 On Fri, Jun

Re: [DISCUSS] Bridge beam metrics to underlying runners to support metrics reporters?

2017-06-23 Thread Ben Chambers
I think the distinction between reporting metrics (the API we give users to create Metrics) and getting metrics out (currently either using the PipelineResult to query results or connecting to an external metric store) is important. There is an additional distinction to be made in data pr

Re: Bundling multiple TestPipeline tests into one pipeline

2017-06-23 Thread Robert Bradshaw
+1 http://mail-archives.apache.org/mod_mbox/incubator-beam-dev/201610.mbox/%3CCAFFRZHX4yq%3D%3DxuvkPjwDFezVhWH82oj%2BgpS-OhUMc%3D3QUVaS1g%40mail.gmail.com%3E On Fri, Jun 23, 2017 at 9:23 AM, Davor Bonaci wrote: > This would be a great contribution if anyone wants to give it a try. > > On Thu, J

Re: Bundling multiple TestPipeline tests into one pipeline

2017-06-23 Thread Davor Bonaci
This would be a great contribution if anyone wants to give it a try. On Thu, Jun 22, 2017 at 9:23 PM, Jean-Baptiste Onofré wrote: > Hi Eugene > > I like the idea ! > > Regards > JB > > > On 06/23/2017 12:27 AM, Eugene Kirpichov wrote: > >> Hi folks and especially runner developers, >> >> https:/

Re: Build failed in Jenkins: beam_PostCommit_Java_MavenInstall #4199

2017-06-23 Thread Jean-Baptiste Onofré
OK, I see the issue with commons-text on SpannerIO. I'm preparing a PR to fix that. Regards JB On 06/23/2017 04:48 PM, Kenneth Knowles wrote: Hmm, sorry to have missed that this has been broken for a while. Top level error looks like a problem with a hive dependency. Is anyone able to check i

Re: Build failed in Jenkins: beam_PostCommit_Java_MavenInstall #4199

2017-06-23 Thread Manu Zhang
Looks like a network issue I've seen from time to time. On Fri, Jun 23, 2017 at 11:25 PM Jean-Baptiste Onofré wrote: > It built without problem on my machine. > > Let me check on Jenkins. > > Regards > JB > > On 06/23/2017 04:53 PM, Jean-Baptiste Onofré wrote: > > Hi Kenn, > > > > let me check,

Re: Build failed in Jenkins: beam_PostCommit_Java_MavenInstall #4199

2017-06-23 Thread Jean-Baptiste Onofré
It built without problem on my machine. Let me check on Jenkins. Regards JB On 06/23/2017 04:53 PM, Jean-Baptiste Onofré wrote: Hi Kenn, let me check, but I built earlier today without problem on my machine. Let me take a look. Regards JB On 06/23/2017 04:48 PM, Kenneth Knowles wrote: Hmm,

Re: Build failed in Jenkins: beam_PostCommit_Java_MavenInstall #4199

2017-06-23 Thread Jean-Baptiste Onofré
Hi Kenn, let me check, but I built earlier today without problem on my machine. Let me take a look. Regards JB On 06/23/2017 04:48 PM, Kenneth Knowles wrote: Hmm, sorry to have missed that this has been broken for a while. Top level error looks like a problem with a hive dependency. Is anyone

Re: Build failed in Jenkins: beam_PostCommit_Java_MavenInstall #4199

2017-06-23 Thread Kenneth Knowles
Hmm, sorry to have missed that this has been broken for a while. Top level error looks like a problem with a hive dependency. Is anyone able to check it out? On Fri, Jun 23, 2017 at 6:35 AM, Apache Jenkins Server < jenk...@builds.apache.org> wrote: > See

Re: Rewind back tuple timestamp in DoFn

2017-06-23 Thread Shen Li
Hi Kenn, Thanks a lot for the info. I will follow the discussion. Shen On Fri, Jun 23, 2017 at 10:38 AM, Kenneth Knowles wrote: > Hi Shen, > > In order for this to work well with watermark tracking, we have some > initial ideas on https://issues.apache.org/jira/browse/BEAM-644 > > Kenn > > On

Re: Rewind back tuple timestamp in DoFn

2017-06-23 Thread Kenneth Knowles
Hi Shen, In order for this to work well with watermark tracking, we have some initial ideas on https://issues.apache.org/jira/browse/BEAM-644 Kenn On Wed, Jun 14, 2017 at 1:34 PM, Shen Li wrote: > Hi, > > I saw the DoFn#getAllowedTimestampSkew has been marked as deprecated. What > if a user do

Re: [DISCUSS] Apache Beam 2.1.0 release next week ?

2017-06-23 Thread Kenneth Knowles
For easy reference: https://s.apache.org/beam-2.1.0-burndown At the time I am writing, there are 23 open issues tagged for this release. On Fri, Jun 23, 2017 at 12:22 AM, Jean-Baptiste Onofré wrote: > Hi guys, > > thanks all for your feedback. > > The target is: > - create the release branch Mo

Re: [DISCUSS] support different versions of backends in an IO

2017-06-23 Thread Jean-Baptiste Onofré
Hi, It's something we already discussed in the past (for Kafka, for instance). For Kafka, we were able to use a single IO with spring-el to detect the version. That's certainly the preferred approach, but it would not be possible in all cases. I would suggest, if the first approach doesn't work: *

[DISCUSS] support different versions of backends in an IO

2017-06-23 Thread Etienne Chauchot
Hi guys, I'm working on Elasticsearch 5.x support for Beam IO (it only supports Elasticsearch 2.x right now). I wanted to have your opinion on some points related to maintenance. In this ES case a big part of the code of the IO is common between ES v2.x and ES v5.x. Still, there are some dif

Re: [DISCUSS] Bridge beam metrics to underlying runners to support metrics reporters?

2017-06-23 Thread Ismaël Mejía
That seems like a great idea (improving the current metrics design). I suppose there is a tradeoff between complexity and simplicity, and when I read the design document I think that some design decisions were made for the sake of simplicity; however, as dropwizard is the 'de-facto' standard for met

Re: [DISCUSS] Bridge beam metrics to underlying runners to support metrics reporters?

2017-06-23 Thread Cody Innowhere
Hi Ismaël, Yes, Distribution is similar to codahale's Histogram without the quantiles, and what I meant by "adding support of Histogram" might be extending Distribution so that quantiles can be supported. I think in the metrics area dropwizard metrics is more or less a standard and many frameworks have dir

Re: [DISCUSS] Bridge beam metrics to underlying runners to support metrics reporters?

2017-06-23 Thread Ismaël Mejía
Cody, not sure if I follow, but isn't Distribution on Beam similar to codahale/dropwizard's Histogram (without the quantiles)? Meters are also in the plan but not implemented yet, see the Metrics design doc: https://s.apache.org/beam-metrics-api If I understand what you want is to have some sort

Jenkins build is still unstable: beam_Release_NightlySnapshot #456

2017-06-23 Thread Apache Jenkins Server
See

Re: [DISCUSS] Bridge beam metrics to underlying runners to support metrics reporters?

2017-06-23 Thread Jean-Baptiste Onofré
No problem at all, and you did well. I think it's really valuable to clarify what we all have in mind ;) To be honest, my focus is clearly on the "generic metric sink", but it makes sense to try to move forward in the meantime on the "collected metric" topic. Regards JB On 06/23/2017 09:29

Re: [DISCUSS] Bridge beam metrics to underlying runners to support metrics reporters?

2017-06-23 Thread Cody Innowhere
Yes, I agree with you, and sorry for mixing them together in this discussion. I just wonder if someone plans to support Meters/Histograms in the near future. If so, we might need to modify metrics a bit in the Beam SDK IMHO; that's the reason I started this discussion. On Fri, Jun 23, 2017 at 3:21 PM,

Re: Bundling multiple TestPipeline tests into one pipeline

2017-06-23 Thread Jean-Baptiste Onofré
Hi Eugene I like the idea ! Regards JB On 06/23/2017 12:27 AM, Eugene Kirpichov wrote: Hi folks and especially runner developers, https://issues.apache.org/jira/browse/BEAM-2506 - quoting from there: Currently ValidatesRunner test suites run 1 pipeline per unit test. That's a lot of small pi

Re: [DISCUSS] Apache Beam 2.1.0 release next week ?

2017-06-23 Thread Jean-Baptiste Onofré
Hi guys, thanks all for your feedback. The target is: - create the release branch Monday and cherry-pick the fixes - vote around end of next week I propose to set "Fix Version" to 2.1.0 on the Jira issues we aim to include in the release. I will do a first Jira pass today. Regards JB On 06/22/2017 04:

Re: [DISCUSS] Bridge beam metrics to underlying runners to support metrics reporters?

2017-06-23 Thread Jean-Baptiste Onofré
Hi Cody, I think there are two "big" topics around metrics: - what we collect - where we send the collected data The "generic metric sink" (BEAM-2456) is for the latter: we don't really change/touch the collected data (or maybe just in case of data format) we send to the sink. The Meters/His