Re: [VOTE] Release 0.4.0-incubating, release candidate #3

2016-12-20 Thread Aljoscha Krettek
+1 - verified signatures - ran Quickstart using the staging repository on Flink cluster - verified build form source (We'll probably do a 0.5.0 release shortly after this so we can fix the BigQuery issues.) On Mon, 19 Dec 2016 at 21:22 Dan Halperin wrote: > I vetted the binary artifacts accomp

Re: [DISCUSS] [BEAM-438] Rename one of PTransform.apply or PInput.apply

2016-12-07 Thread Aljoscha Krettek
+1 I've seen this mistake myself in some PRs. On Thu, 8 Dec 2016 at 06:10 Ben Chambers wrote: > +1 -- This seems like the best option. It's a mechanical change, and the > compiler will let users know it needs to be made. It will make the mistake > much less common, and when it occurs it will be

Re: Flink runner. Optimization for sideOutput with tags

2016-12-06 Thread Aljoscha Krettek
gt; > In some cases we need more performance and serialization on every > > transformation very expensive, > > but try merge all business logic in one DoFn it to make processing > > unsupportable. > > > > >> I think your stack trace is not complete, at le

Re: Increase stream parallelism after reading from UnboundedSource

2016-12-05 Thread Aljoscha Krettek
Hi, I can only speak for Flink, there you usually fan-out/parallelise the stream after a non-parallel source. Cheers, Aljoscha On Mon, 5 Dec 2016 at 15:48 Amit Sela wrote: > Hi all, > > I have a general question about how stream-processing frameworks/engines > usually behave in the following sc

Re: How to create a Pipeline with Cycles

2016-11-30 Thread Aljoscha Krettek
er/flink-benchmarks/src/main/java/flink/benchmark/AdvertisingTopologyNative.java > > My excuses again, > Ismael > > On Wed, Nov 30, 2016 at 11:30 AM, Aljoscha Krettek > wrote: > > > Hi, > > there is support for cycles in Flink but the Yahoo benchmark is not > making > > u

Re: How to create a Pipeline with Cycles

2016-11-30 Thread Aljoscha Krettek
Hi, there is support for cycles in Flink but the Yahoo benchmark is not making use of that feature, if I'm not completely mistaken. Cheers, Aljoscha On Wed, 30 Nov 2016 at 09:57 Ismaël Mejía wrote: > Hello, > > Shen you should probably first check the benchmark implementation at > github, I am

Re: Meet up at Strata+Hadoop World in Singapore

2016-11-29 Thread Aljoscha Krettek
Hi, I'll also be there to give a talk (and also at the Beam tutorial). Cheers, Aljoscha On Wed, Nov 30, 2016, 00:51 Dan Halperin wrote: > Hey folks, > > Who will be attending Strata+Hadoop World next week in Singapore? Tyler and > I will be there, giving a Beam tutorial [0] and some talks [2,3]

Re: Flink runner. Optimization for sideOutput with tags

2016-11-29 Thread Aljoscha Krettek
Hi Alexey, I think it should be possible to optimise this particular transformation by using a split/select pattern in Flink. (See split and select here: https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/datastream_api.html#datastream-transformations). The current implementation is no

Fwd: Jenkins build became unstable: beam_PostCommit_RunnableOnService_FlinkLocal #938

2016-11-28 Thread Aljoscha Krettek
This is caused by the new Stateful DoFn tests not being properly ignored in the Flink streaming runner. (I myself actually reviewed the change that added the ignore for the batch runner.) I have a fix for this here: https://github.com/apache/incubator-beam/pull/1446 -- Forwarded message -

Re: [DISCUSS] Graduation to a top-level project

2016-11-22 Thread Aljoscha Krettek
+1 I'm quite enthusiastic about the growth of the community and the open discussions! On Tue, 22 Nov 2016 at 19:51 Jason Kuster wrote: > An enthusiastic +1! > > In particular it's been really great to see the commitment and interest of > the community in different kinds of testing. Between what

Re: Flink runner. Wrapper for DoFn

2016-11-21 Thread Aljoscha Krettek
because now it's bottleneck for me? Thanks, Alexey Diomin 2016-11-19 11:59 GMT+04:00 Aljoscha Krettek : @amir, what do you mean? Naming a ParDo "startBundle" is not the same thing as having a @StartBundle or startBundle() (for OldDoFn) method in your ParDo. On Sat, 19 Nov 2016

Re: Hosting data stores for IO Transform testing

2016-11-21 Thread Aljoscha Krettek
Hi Stephen, I really like your proposal! I don't have any comments because this seems very well "researched" already. I'm hoping others will also have a look at this as well because "real" integration testing provides a new level of confidence in the code, IMHO. Cheers, Aljoscha On Wed, 16 Nov

Re: Flink runner. Wrapper for DoFn

2016-11-19 Thread Aljoscha Krettek
t; .apply("startBundle", ParDo.of( new DoFn, KV String>>() { > Amir- > > From: Aljoscha Krettek > To: amir bahmanyari ; Eugene Kirpichov < > kirpic...@google.com>; "dev@beam.incubator.apache.org" < > dev@beam.incubator.apache.org> >

Re: Flink runner. Wrapper for DoFn

2016-11-18 Thread Aljoscha Krettek
Regarding the Flink runner and how it calls startBundle()/finishBundle(): it's currently done like this because it is correct and because there is no other "natural" point where it could be called. Flink continuously processes elements and at some (user defined) interval performs checkpoints to per

Re: Configuring Jenkins

2016-11-15 Thread Aljoscha Krettek
+1 I like this a lot! On Tue, 15 Nov 2016 at 10:37 Jean-Baptiste Onofré wrote: > Fantastic Davor ! > > I like this approach, I gonna take a deeper look. > > Thanks ! > > Regards > JB > > On 11/15/2016 10:01 AM, Davor Bonaci wrote: > > Hi everybody, > > As I'm sure everybody knows, we use Apache'

Fwd: Jenkins build became unstable: beam_PostCommit_RunnableOnService_FlinkLocal #813

2016-11-11 Thread Aljoscha Krettek
This looks like it was introduced by the new commit mentioned here but it's actually caused by a pre-existing (unknown) bug in the Flink runner: https://issues.apache.org/jira/browse/BEAM-965. I also have a fix ready. Btw, how should we deal with these messages from Jenkins? I'm writing to the de

Re: Introduction + contributing to docs

2016-11-11 Thread Aljoscha Krettek
Welcome to the community! :-) On Fri, Nov 11, 2016, 22:36 Kenneth Knowles wrote: > Welcome! It is great to witness the website really coming together. > > On Fri, Nov 11, 2016 at 12:35 PM, Amit Sela wrote: > > > Welcome Melissa! > > > > On Fri, Nov 11, 2016, 22:31 Jean-Baptiste Onofré > wrote:

Re: [DISCUSS] Change "RunnableOnService" To A More Intuitive Name

2016-11-09 Thread Aljoscha Krettek
+1 What I would really like to see is automatic derivation of the capability matrix from an extended Runner Test Suite. (As outlined in Thomas' doc). On Wed, 9 Nov 2016 at 21:42 Kenneth Knowles wrote: > Huge +1 to this. > > The two categories I care most about are: > > 1. Tests that need a runn

[ANNOUNCE] Beam 0.3.0-incubating Released

2016-10-31 Thread Aljoscha Krettek
Congratulations, team! I just finalised everything for the most recent release. The artefacts are on Maven, the website is updated and the source release should slowly propagate through the Apache servers. I'll also send an email to the user list to highlight some of the new features. Cheers, Alj

Re: [VOTE] Release 0.3.0-incubating, release candidate #1

2016-10-28 Thread Aljoscha Krettek
(binding) > > >> >> > > > >> >> >So far I've successfully checked: > > >> >> >* signatures and digests > > >> >> >* source releases file layouts > > >> >> >* matched git tags and commit ids > > >

[VOTE] Apache Beam release 0.3.0-incubating

2016-10-28 Thread Aljoscha Krettek
Hi everyone, Please review and vote on the release candidate #1 for the Apache Beam version 0.3.0-incubating, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available for your review, which includes: * JIR

[RESULT] [VOTE] Release 0.3.0-incubating, release candidate #1

2016-10-28 Thread Aljoscha Krettek
IPMC for final voting. Thanks everyone! On Fri, 28 Oct 2016 at 09:09 Aljoscha Krettek wrote: > The voting time has elapsed. I'm hereby closing this vote and will tally > the results in a separate thread. > > On Thu, 27 Oct 2016 at 17:38 Neelesh Salian wrote: > > +1 (non

Re: Start of release 0.3.0-incubating

2016-10-26 Thread Aljoscha Krettek
new feature. > >>> > >>> Can you make a strong argument for why MQTT in particular should be > >>> release > >>> blocking? > >>> > >>> Dan > >>> > >>> On Thu, Oct 20, 2016 at 9:26 AM, Jean-Baptiste Onofré &

Re: [VOTE] Release 0.3.0-incubating, release candidate #1

2016-10-25 Thread Aljoscha Krettek
gt; [4] https://github.com/apache/spark/pull/15167/files > > Dan > > On Tue, Oct 25, 2016 at 11:01 AM, Seetharam Venkatesh < > venkat...@innerzeal.com> wrote: > > > +1 > > > > Thanks! > > > > On Mon, Oct 24, 2016 at 2:30 PM Aljoscha Kret

Re: [DISCUSS] Current ongoing work on runners

2016-10-25 Thread Aljoscha Krettek
I think we might need to update the capability matrix with some of the new features that have popped up. Immediate things that come to mind are: * Timer/State API for user DoFns (coupled with new-style DoFn) (not yet completely in master) * SplittableDoFn This would allow tracking the process in

Re: [VOTE] Release 0.3.0-incubating, release candidate #1

2016-10-25 Thread Aljoscha Krettek
How did you generated the checksums? Because both SHA1/MD5 can't be > > >automatically checked because "no properly formatted SHA1/MD5 checksum > > >lines found". > > > > > >Great to see the project moving forward at this speed :-) > > > &g

Re: The Availability of PipelineOptions

2016-10-25 Thread Aljoscha Krettek
+1 This sounds quite straightforward. On Tue, 25 Oct 2016 at 01:36 Thomas Groh wrote: > Hey everyone, > > I've been working on a declaration of intent for how we want to use > PipelineOptions and an API change to be consistent with that intent. This > is generally part of the move to the Runner

[VOTE] Release 0.3.0-incubating, release candidate #1

2016-10-24 Thread Aljoscha Krettek
Hi Team! Please review and vote at your leisure on release candidate #1 for version 0.3.0-incubating, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available for your review, which includes: * JIRA releas

Re: Maven Release Plugin Does Not Update Version of Archetypes

2016-10-24 Thread Aljoscha Krettek
t; 0.3.0-release branch? > > > > On Mon, Oct 24, 2016 at 11:09 AM, Dan Halperin > > wrote: > > > >> Correct issue link: https://issues.apache.org/jira/browse/BEAM-806 > >> > >> No answers, but looking around. > >> > >>

Maven Release Plugin Does Not Update Version of Archetypes

2016-10-24 Thread Aljoscha Krettek
Hi, are there any Maven mavens who happen to know how https://issues.apache.org/jira/browse/BEAM-108 can be fixed? By the way, the release plugin does also not update the version of the archetypes when setting the next SNAPSHOT version. IMHO, it's a bit of a release blocker so I'm hoping we can ge

Re: [DISCUSS] Deferring (pre) combine for merging windows.

2016-10-24 Thread Aljoscha Krettek
@Amit: Yes, Flink is more "what you write is what you get". For example, in Flink we have a Fold function for windows which cannot be efficiently computed with merging windows (it would require using a "group by" window and then folding the iterable). We just don't allow this. For Beam, I think it

Re: Tracking backward-incompatible changes for Beam

2016-10-22 Thread Aljoscha Krettek
Very good idea! Should we already start thinking about automatic tests for backwards compatibility of the API? On Fri, 21 Oct 2016 at 10:56 Jean-Baptiste Onofré wrote: > Hi Dan, > > +1, good idea. > > Regards > JB > > On 10/21/2016 02:21 AM, Dan Halperin wrote: > > Hey everyone, > > > > In the

Re: [ANNOUNCEMENT] New committers!

2016-10-22 Thread Aljoscha Krettek
Welcome everyone! +3 :-) On Sat, 22 Oct 2016 at 06:43 Jean-Baptiste Onofré wrote: > Just a small thing. > > If it's not already done, don't forget to sign a ICLA and let us know > your apache ID. > > Thanks, > Regards > JB > > On 10/22/2016 12:18 AM, Davor Bonaci wrote: > > Hi everyone, > > Plea

Re: Start of release 0.3.0-incubating

2016-10-20 Thread Aljoscha Krettek
josha !! > >> > >> Do you mind to wait the week end or Monday to start the release ? I > would > >> like to include MqttIO if possible. > >> > >> Thanks ! > >> Regards > >> JB > >> > >> ⁣​ > >> > >&

Re: Release Guide

2016-10-20 Thread Aljoscha Krettek
Hi, thanks for taking the time and writing this extensive doc! If no-one is against this I would like to be the release manager for the next (0.3.0-incubating) release. I would work with the guide and update it with anything that I learn along the way. Should I open a new thread for this or is it

Re: [DISCUSS] Sources and Runners

2016-10-19 Thread Aljoscha Krettek
+Jason, looping him in directly because he might have an opinion on what I'm going to say. Should we maybe add integration tests that verify that all runners can correctly read from and write to an external system in a complete Pipeline. At least for Kafka, which seems to be the most used option i

Re: [KUDOS] Contributed runner: Apache Apex!

2016-10-17 Thread Aljoscha Krettek
Congrats! :-) On Mon, 17 Oct 2016 at 18:55 Kenneth Knowles wrote: > *I would like to :-) > > On Mon, Oct 17, 2016 at 9:51 AM Kenneth Knowles wrote: > > > Hi all, > > > > I would to, once again, call attention to a great addition to Beam: a > > runner for Apache Apex. > > > > After lots of revie

Re: Simplifying User-Defined Metrics in Beam

2016-10-13 Thread Aljoscha Krettek
I finally found the time to have a look. :-) The API looks very good! (It's very similar to an API we recently added to Flink, which is inspired by the same Codahale/Dropwizard metrics). About the semantics, the "A", "B" and "C" you mention in the doc: doesn't this mean that we have to keep the m

Re: [PROPOSAL] Splittable DoFn - Replacing the Source API with non-monolithic element processing in DoFn

2016-10-13 Thread Aljoscha Krettek
tly support a particular subset of features, we can start > > transitioning some connectors or writing new ones using that subset (I > > expect that streaming connectors will come first). > > > > Additionally, the Python SDK is considering using Splittable DoFn as the > > *on

Re: Simplifying User-Defined Metrics in Beam

2016-10-06 Thread Aljoscha Krettek
Hi, I'm currently in holidays but I'll put some thought into this and give my comments once I get back. Aljoscha On Wed, Oct 5, 2016, 21:51 Ben Chambers wrote: > To provide some more background I threw together a quick doc outlining my > current thinking for this Metrics API. You can find it at

Re: We've hit 1000 PRs!

2016-09-27 Thread Aljoscha Krettek
Sweet! :-) On Mon, 26 Sep 2016 at 23:47 Dan Halperin wrote: > Hey folks! > > Just wanted to send out a note -- we've hit 1000 PRs in GitHub as of > Saturday! That's a tremendous amount of work for the 7 months since PR#1. > > I bet we hit 2000 in much fewer than 7 months ;) > > Dan >

About Finishing Triggers

2016-09-14 Thread Aljoscha Krettek
Hi, I had a chat with Kenn at Flink Forward and he did an off-hand remark about how it might be better if triggers where not allowed to mark a window as finished and instead always be "Repeatedly" (if I understood correctly). Maybe you (Kenn) could go a bit more in depth about what you meant by th

Re: Anyone @scale tomorrow?

2016-09-09 Thread Aljoscha Krettek
Cool, thanks for letting us know! On Fri, Sep 9, 2016, 18:45 Dan Halperin wrote: > Hey folks, > > Wanted to let you know that the Beam talk went pretty well. People were > *very* excited about Beam -- loved the idea of not having to rewrite their > pipelines every time they want to try out a new

Re: KafkaIO Windowing Fn

2016-09-02 Thread Aljoscha Krettek
ering the window firing here. ( does not > look like to be 30 sec trigger) > > > > Regards > Sumit Chawla > > > On Wed, Aug 31, 2016 at 11:14 PM, Aljoscha Krettek > wrote: > > > Ah I see, the Flink Runner had quite some updates in 0.2.0-incubating and >

Re: KafkaIO Windowing Fn

2016-08-31 Thread Aljoscha Krettek
nners.TransformTreeNode.visit( > > TransformTreeNode.java:220)* > > * at > > org.apache.beam.sdk.runners.TransformTreeNode.visit( > > TransformTreeNode.java:220)* > > * a* > > > > Regards > > Sumit Chawla > > > > > > On Wed, Aug 31, 2016

Re: KafkaIO Windowing Fn

2016-08-31 Thread Aljoscha Krettek
e Pipeline. > > On Tue, Aug 30, 2016 at 11:24 PM, Chawla,Sumit > wrote: > > > Yes. I added it only for DirectRunner as it cannot translate > > Read(UnboundedSourceOfKafka) > > > > Regards > > Sumit Chawla > > > > > > On Tue, Aug 30, 2016

Re: KafkaIO Windowing Fn

2016-08-30 Thread Aljoscha Krettek
business > specific transformations only. > > Regards > Sumit Chawla > > > On Tue, Aug 30, 2016 at 4:49 AM, Aljoscha Krettek > wrote: > > > Hi, > > could you maybe also post the complete that you're using with the > > FlinkRunner? I could have a look i

Re: KafkaIO Windowing Fn

2016-08-30 Thread Aljoscha Krettek
Hi, could you maybe also post the complete that you're using with the FlinkRunner? I could have a look into it. Cheers, Aljoscha On Tue, 30 Aug 2016 at 09:01 Chawla,Sumit wrote: > Hi Thomas > > Sorry i tried with DirectRunner but ran into some kafka issues. Following > is the snippet i am work

Re: [PROPOSAL] Splittable DoFn - Replacing the Source API with non-monolithic element processing in DoFn

2016-08-30 Thread Aljoscha Krettek
t is implemented > in my current prototype https://github.com/apache/incubator-beam/pull/896 > (see > SplittableParDo.ProcessFn) > > On Mon, Aug 29, 2016 at 2:55 AM Aljoscha Krettek > wrote: > > > Hi, > > I have another question about this: currently, unbounded sources

Re: [PROPOSAL] Splittable DoFn - Replacing the Source API with non-monolithic element processing in DoFn

2016-08-29 Thread Aljoscha Krettek
#x27;' into the state of K2. > > > Etc. > > > If partition 1 goes away, the processElement call will return "do not > > > resume", so a timer will not be set and instead the state associated > with > > > K1 will be GC'd. > > > >

Re: Remove legacy import-order?

2016-08-23 Thread Aljoscha Krettek
+1 on the import order +1 on also starting a discussion about enforced formatting On Wed, 24 Aug 2016 at 06:43 Jean-Baptiste Onofré wrote: > Agreed. > > It makes sense for the import order. > > Regards > JB > > On 08/24/2016 02:32 AM, Ben Chambers wrote: > > I think introducing formatting shoul

Re: [PROPOSAL] Splittable DoFn - Replacing the Source API with non-monolithic element processing in DoFn

2016-08-21 Thread Aljoscha Krettek
will not be set and instead the state associated with > > K1 will be GC'd. > > > > So basically it's almost like cooperative thread scheduling: things run > for > > a while, until the runner tells them to checkpoint, then they set a timer > > to resume the

Re: [PROPOSAL] Splittable DoFn - Replacing the Source API with non-monolithic element processing in DoFn

2016-08-20 Thread Aljoscha Krettek
Hi, I have another question that I think wasn't addressed in the meeting. At least it wasn't mentioned in the notes. In the context of replacing sources by a combination of to SDFs, how do you determine how many "SDF executor" instances you need downstream? For the sake of argument assume that bot

Re: Beam Testing Guide on Website

2016-08-13 Thread Aljoscha Krettek
Thanks for the write up, this guide should be really useful for new contributors! I really like that we are so strong (and getting stronger) on automated tests that also test how a user would use the system. This makes releases significantly easier because we don't have to verify all the functional

Re: [Proposal] Pipelines and their executions naming changes.

2016-08-10 Thread Aljoscha Krettek
Hi, Flink itself allows the user to specify a String when creating a Job, this will be visible in the web dashboard and maybe some other places. This would roughly correspond to the proposed PipelineOptions.pipelineName. An executing job does not have a human-readable name, just an ID that has to b

Re: [PROPOSAL] Splittable DoFn - Replacing the Source API with non-monolithic element processing in DoFn

2016-08-08 Thread Aljoscha Krettek
ransform. Does this address your question? > > Thanks! > > On Fri, Aug 5, 2016 at 7:14 AM Aljoscha Krettek > wrote: > > > I really like the proposal, especially how it unifies at lot of things. > > > > One question: How would this work with sources that (right

Re: [PROPOSAL] Website page or Jira to host all current proposal discussion and docs

2016-08-08 Thread Aljoscha Krettek
Please have a look at this: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals We recently started using this process in Flink and so far are quite happy with it. On Mon, 8 Aug 2016 at 06:52 Jean-Baptiste Onofré wrote: > Good point Ben. > > I would say a "discussion"

Re: Proposal: Dynamic PIpelineOptions

2016-08-05 Thread Aljoscha Krettek
+1 It's true that Flink provides a way to pass dynamic parameters to operator instances. That's not used in any of the built-in sources and operators, however. They are instantiated with their parameters when the graph is constructed. So what you are suggesting for Beam would actually provide more

Re: [PROPOSAL] Splittable DoFn - Replacing the Source API with non-monolithic element processing in DoFn

2016-08-05 Thread Aljoscha Krettek
I really like the proposal, especially how it unifies at lot of things. One question: How would this work with sources that (right now) return true from UnboundedSource.requiresDeduping(). As I understand it the code that executes such sources has to do bookkeeping to ensure that we don't get dupl

Re: [PROPOSAL] Pipeline Runner API design doc

2016-08-02 Thread Aljoscha Krettek
Hi, thanks for putting this together. Now that I'm seeing them side by side I think the Avro schema looks a lot nicer than the JSON schema but it's probably alright since we don't want to change this often (as you already said). The advantage of JSON is that the (intermediate) plans can easily be i

Re: [REFLECT] Beam’s Half Birthday!

2016-08-01 Thread Aljoscha Krettek
+1 This sounds very good, I can't come up with anything that you missed. On Mon, 1 Aug 2016 at 08:00 Jean-Baptiste Onofré wrote: > Happy half birthday ;) > > Very good idea Frances !! > > And the numbers are impressive indeed. > > Maybe, we can add kind of teasing about new incoming PRs like: C

Re: [RESULT] Release version 0.2.0-incubating

2016-07-31 Thread Aljoscha Krettek
in > wrote: > > > I'm happy to announce that we have unanimously approved this release. > > > > There are 3 binding approving votes: > > * Dan Halperin > > * Jean-Baptiste Onofré > > * Amit Sela > > > > There is a fourth approving vote: &

Re: [VOTE] Release version 0.2.0-incubating

2016-07-31 Thread Aljoscha Krettek
ot; - source release contains no binaries - sources files have license headers (I'm relying on the rat plugin here, though) (As I said above I fixed the "fixed version" tag on two issues.) On Sun, 31 Jul 2016 at 08:22 Aljoscha Krettek wrote: > Just a quick note, these two we

Re: [VOTE] Release version 0.2.0-incubating

2016-07-31 Thread Aljoscha Krettek
Just a quick note, these two were not fixed for 0.2.0: - [BEAM-478 ] - Create Vector, Matrix types and operations to enable linear algebra API - [BEAM-322 ] - Compare encoded keys in streaming mode I

Re: [PROPOSAL] State and Timers for DoFn (aka per-key workflows)

2016-07-29 Thread Aljoscha Krettek
+1 Very nice proposal and the API already looks very good. I guess the only thing people still like to discuss on this is naming of things. :-) I just have one general remark about giving users access to state and timers. The Beam model takes great care to mostly shield users from the reality of o

Re: [DISCUSS] cluster infrastructure - resource manager - for on going tests

2016-07-28 Thread Aljoscha Krettek
Jul 28, 2016 06:35, "Amit Sela" wrote: > > > So what would be the preferred resource manager to test Flink on ? > > > > On Thu, Jul 28, 2016, 16:34 Aljoscha Krettek > wrote: > > > > > Flink also has a standalone mode. > > > > > > On

Re: [DISCUSS] cluster infrastructure - resource manager - for on going tests

2016-07-28 Thread Aljoscha Krettek
Flink also has a standalone mode. On Thu, 28 Jul 2016 at 13:42 Ismaël Mejía wrote: > Good subject, YARN is the de-facto standard at least from the point of > view of the Big Data Distributions (Cloudera, Hortonworks, etc) and Cloud > offers, e.g. AWS EMR, Azure HDInsight and Google Dataproc), a

Re: [PROPOSAL] A brand new DoFn

2016-07-27 Thread Aljoscha Krettek
+1 At first I liked the API but was skeptical because I though that this would require reflective invocation. Then I read on and saw that code generation is used and was convinced. :-) I especially like how it both cleans up the API and allows more optimizations in the future, especially with sid

Re: Podling Report Reminder - August 2016

2016-07-27 Thread Aljoscha Krettek
+1 On Wed, Jul 27, 2016, 21:14 Amit Sela wrote: > +1 > > On Wed, Jul 27, 2016, 22:12 Dan Halperin > wrote: > > > +1 on all the above. > > > > On Wed, Jul 27, 2016 at 12:07 PM, Jean-Baptiste Onofré > > wrote: > > > > > Hi James, > > > > > > Sure, please go ahead. > > > > > > I propose to send t

Re: Beam/Flink : State access

2016-07-26 Thread Aljoscha Krettek
Hi, the purpose of Beam is to abstract the user from the underlying execution engine. IMHO, allowing access to state of the underlying execution engine will never be a goal for the Beam project. If you want/need to access Flink state, I think this is a good indicator that you should use Flink dire

Re: Help understand how Flink Runner translate triggering information

2016-07-25 Thread Aljoscha Krettek
Hi, for that you would have to look at how Combine.PerKey and GroupByKey are translated. We use a GroupAlsoByWindowViaWindowSetDoFn that internally uses a ReduceFnRunner to manage all the windowing. The windowing strategy as well as the SystemReduceFn is passed to GroupAlsoByWindowViaWindowSetDoFn.

Re: Flink runner and KafkaIO

2016-07-24 Thread Aljoscha Krettek
Hi, the root cause seems to be that the Flink streaming runner does not support side-inputs (Views). Inside the CountWords it uses a Combine somewhere which uses side-inputs to sneak in a default value in case there is nothing to combine. Good news is that I have almost finished work on making sid

Re: [KUDOS] Contributed runner: Gearpump!

2016-07-21 Thread Aljoscha Krettek
Nice! Could you also go a bit into details on the supported features? On Thu, 21 Jul 2016 at 10:40 Maximilian Michels wrote: > Big news. Congrats! > > On Thu, Jul 21, 2016 at 9:11 AM, Ismaël Mejía wrote: > > Congratulations Manu! > > Great work! Looking forward to test it with our current pipe

Re: [PROPOSAL] CoGBK as primitive transform instead of GBK

2016-07-20 Thread Aljoscha Krettek
+1 Out of curiosity, does Cloud Dataflow have a CoGBK primitive or will it also be executed as a GBK there? On Thu, 21 Jul 2016 at 02:29 Kam Kasravi wrote: > +1 - awesome Manu. > > On Wednesday, July 20, 2016 1:53 PM, Kenneth Knowles > wrote: > > > +1 > > I assume that the intent is for t

Re: Adding DoFn Setup and Teardown methods

2016-07-18 Thread Aljoscha Krettek
Did you mean "usual" or "useful"? ;-) On Mon, 18 Jul 2016 at 12:42 Maximilian Michels wrote: > +1 for setup() and teardown() methods. Very usual for proper initialization > and cleanup of DoFn related data structures. > > On Wed, Jun 29, 2016 at 9:34 PM, Aljos

Re: Display Data Runner Support

2016-07-03 Thread Aljoscha Krettek
Thanks Scott for this compilation of information! I'll look into how this can be incorporated into the Flink runner once I have some time on my hands. On Thu, 30 Jun 2016 at 17:05 Scott Wegner wrote: > Hi Beam Dev community, > > I wanted to circle-back on a recent Beam feature, Display Data, whi

Re: Adding DoFn Setup and Teardown methods

2016-06-29 Thread Aljoscha Krettek
+1 I think some people might already mistake the startBundle()/finishBundle() methods for what the new methods are supposed to be On Tue, 28 Jun 2016 at 19:38 Raghu Angadi wrote: > This is terrific! > Thanks for the proposal. > > On Tue, Jun 28, 2016 at 9:06 AM, Thomas Groh > wrote: > > > Hey E

Re: [DISCUSS] Beam data plane serialization tech

2016-06-29 Thread Aljoscha Krettek
My bad, I didn't know that. Thanks for the clarification! On Wed, 29 Jun 2016 at 16:38 Daniel Kulp wrote: > > > On Jun 27, 2016, at 10:24 AM, Aljoscha Krettek > wrote: > > > > Out of the systems you suggested Thrift and ProtoBuf3 + gRPC are probably > > best

Re: Improvements to issue/version tracking

2016-06-27 Thread Aljoscha Krettek
+1 The release view and especially the automatic generation of release notes should come in quite handy. On Tue, 28 Jun 2016 at 01:01 Davor Bonaci wrote: > Hi everyone, > I'd like to propose a simple change in Beam JIRA that will hopefully > improve our issue and version tracking -- to actually

Re: [DISCUSS] Beam data plane serialization tech

2016-06-27 Thread Aljoscha Krettek
> > > > > > > > > > > > > > > > On Fri, Jun 17, 2016 at 3:20 AM, Amit Sela > wrote: > > > > > > > +1 on Aljoscha comment, not sure where's the benefit in having a > > > > "schematic" serialization. &g

Re: Sliding-Windowed PCollectionView as SideInput

2016-06-27 Thread Aljoscha Krettek
Hi, the WindowFn is responsible for mapping from main-input window to side-input window. Have a look at WindowFn.getSideInputWindow(). For SlidingWindows this takes the last possible sliding window as the side-input window. Cheers, Aljoscha On Sun, 26 Jun 2016 at 22:30 Shen Li wrote: > Hi, > >

Re: Scala DSL

2016-06-26 Thread Aljoscha Krettek
I'm also in favor of branding it a DSL rather than an SDK. Mostly because it uses the Java SDK and because it does not (necessarily) follow/implement the Beam model. As the Java SDK does and what the Python SDK is apparently going for. On Sat, 25 Jun 2016 at 10:04 Amit Sela wrote: > Just looked

Re: [DISCUSS] PTransform.named vs. named apply

2016-06-22 Thread Aljoscha Krettek
±1 for the named apply On Thu, Jun 23, 2016, 07:07 Robert Bradshaw wrote: > +1, I think it makes more sense to name the application of a transform > rather than the transform itself. (Still mulling on how best to do > this with Python...) > > On Wed, Jun 22, 2016 at 9:27 PM, Jean-Baptiste Onofré

Re: [DISCUSS] Beam data plane serialization tech

2016-06-17 Thread Aljoscha Krettek
Hi, am I correct in assuming that the transmitted envelopes would mostly contain coder-serialized values? If so, wouldn't the header of an envelope just be the number of contained bytes and number of values? I'm probably missing something but with these assumptions I don't see the benefit of using

Re: [NOTICE] Change on Filter

2016-06-17 Thread Aljoscha Krettek
There has been an issue about this for a while now: https://issues.apache.org/jira/browse/BEAM-234 On Fri, 17 Jun 2016 at 09:55 Jean-Baptiste Onofré wrote: > Hi Ismaël, > > I didn't talk a change between Dataflow SDK and Beam, I'm talking about > a change between two Beam SNAPSHOTs ;) > > For th

Re: Testing and the Capability Matrix

2016-06-14 Thread Aljoscha Krettek
@Thomas Completely agree, this is also how it is currently handled in the Flink runner. I was talking about the presentation of the compatibility matrix on the web site, whether we should have separate columns for Flink Stream/Batch and Spark Stream/Batch. (And possibly other runners in the future)

Re: Testing and the Capability Matrix

2016-06-11 Thread Aljoscha Krettek
2016 at 07:48 Aljoscha Krettek wrote: > Hi, > this looks very good! One thing I noticed is that there is no WriteToSink. > Right now, there are no tests in the RunnableOnService category that test > the Sink API but we might be able to add some. (Sinks are implemented as a > coup

Re: Testing and the Capability Matrix

2016-06-10 Thread Aljoscha Krettek
Hi, this looks very good! One thing I noticed is that there is no WriteToSink. Right now, there are no tests in the RunnableOnService category that test the Sink API but we might be able to add some. (Sinks are implemented as a couple of DoFns, so it's not strictly necessary but it might still be g

Re: [VOTE] Release version 0.1.0-incubating

2016-06-09 Thread Aljoscha Krettek
+1 (binding) I ran "mvn clean verify" on the source package, executed WordCount using the FlinkPipelineRunner. NOTICE, LICENSE and DISCLAIMER also look good On Thu, 9 Jun 2016 at 18:50 Dan Halperin wrote: > +1 (binding) > > per checklist 2.1, I decompressed the source-release zip and ensured th

Re: DoFn Reuse

2016-06-08 Thread Aljoscha Krettek
Ahh, what we could do is artificially induce bundles using either count or processing time or both. Just so that finishBundle() is called once in a while. On Wed, 8 Jun 2016 at 17:12 Aljoscha Krettek wrote: > Pretty sure, yes. The Iterable in a MapPartitionFunction should give you > a

Re: DoFn Reuse

2016-06-08 Thread Aljoscha Krettek
in the pitfalls of doing this. > - Bobby > > On Wednesday, June 8, 2016 4:24 AM, Aljoscha Krettek < > aljos...@apache.org> wrote: > > > Hi, > a quick related question: In the Flink runner we basically see everything > as one big bundle, i.e. we call startBundle

Re: DoFn Reuse

2016-06-08 Thread Aljoscha Krettek
Hi, a quick related question: In the Flink runner we basically see everything as one big bundle, i.e. we call startBundle() once at the beginning and then keep processing indefinitely, never calling finishBundle(). Is this also correct behavior? Best, Aljoscha On Tue, 7 Jun 2016 at 20:44 Thomas G

Re: 0.1.0-incubating release

2016-06-07 Thread Aljoscha Krettek
By the way, is there any document where we keep track of what checks to run for a release? Maybe I missed something, there. On Tue, 7 Jun 2016 at 21:29 Jean-Baptiste Onofré wrote: > Just submitted: https://github.com/apache/incubator-beam/pull/428 > > to fix the src distribution content. > > Reg

Re: [PROPOSAL] Writing More Expressive Beam Tests

2016-05-24 Thread Aljoscha Krettek
ooks to > allow users to observe this completed state, or the runner to notice that > all PTransforms have completed and shut down the pipeline. Notably, this > notion of completion is simpler than quiescence (as it only requires access > to the watermarks of the system), so runners can im

Re: [DISCUSS] Developing new components -- branches, maturity, and committers

2016-05-19 Thread Aljoscha Krettek
> the main repository for new major components. > > On Thu, May 19, 2016 at 3:09 AM, Ismaël Mejía wrote: > > > I agree with Aljoscha, about not putting the feature branches in the main > > repo, however how can we make people aware of the new developments ? > > > &

Re: [DISCUSS] Developing new components -- branches, maturity, and committers

2016-05-19 Thread Aljoscha Krettek
+1 When we say feature branch, are we talking about a branch in the main repo? I would propose that feature branches live in the repos of the committers who are working on a feature. On Thu, 19 May 2016 at 11:54 Jean-Baptiste Onofré wrote: > +1 > > it looks good to me. > > Regards > JB > > On 0

Re: Dynamic work rebalancing for Beam

2016-05-19 Thread Aljoscha Krettek
Interesting read, thanks for the link! On Thu, 19 May 2016 at 07:09 Dan Halperin wrote: > Hey folks, > > This morning, my colleagues Eugene & Malo posted *No shard left behind: > dynamic work rebalancing in Google Cloud Dataflow > < > https://cloud.google.com/blog/big-data/2016/05/no-shard-left-

Failing Jenkins Runs

2016-05-19 Thread Aljoscha Krettek
Hi, on all of the recent PRs Jenkins fails with this message: https://builds.apache.org/job/beam_PreCommit_MavenVerify/1213/console Does anyone have an idea what might be going on? Also, where is Jenkins configured? With this I could take a look myself. -Aljoscha

Re: [PROPOSAL] Writing More Expressive Beam Tests

2016-05-16 Thread Aljoscha Krettek
Hi, sorry for resurrecting such an old thread but are there already thoughts on how the quiescence handling will work for runner-independent tests? I was thinking about how to make the RunnableOnService tests work when executed in "true-streaming" mode, i.e. when the job would normally never finis

Re: Using Side Inputs to Join with Static Data Sets

2016-05-14 Thread Aljoscha Krettek
watermark trigger, it will fire once when all the data has been processed > > along the side input path since the watermark will go from negative > > infinity to positive infinity. This is the canonical way of how to load a > > static dataset to use as a side input for streaming. G

  1   2   >