Re: Flink: Lost pane timing at some steps of pipeline

2020-05-04 Thread David Morávek
Hi Jozef, I think this is expected behavior, as Flink does not use the default expansion for Reshuffle (it uses a round-robin rebalance ship strategy instead). There is no aggregation that needs buffering (and triggering). All of the elements are immediately emitted to downstream operations after the

Re: GSoC 2020: Congratulations, your proposal with The Apache Software Foundation has been accepted!

2020-05-04 Thread Kai Jiang
Congratulations! On Mon, May 4, 2020 at 8:07 PM John Mora wrote: > Hi all. > > My proposal for GSoC was accepted, so this summer I will be working with > you guys on the aggregation analytics functionality of Beam. Thanks so much > for your support during the application period, especially to my

Fwd: GSoC 2020: Congratulations, your proposal with The Apache Software Foundation has been accepted!

2020-05-04 Thread John Mora
Hi all. My proposal for GSoC was accepted, so this summer I will be working with you guys on the aggregation analytics functionality of Beam. Thanks so much for your support during the application period, especially to my mentor Rui Wang. Please let me know if you have suggestions or ideas for my

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-04 Thread Ahmet Altay
On Mon, May 4, 2020 at 6:58 PM Aizhamal Nurmamat kyzy wrote: > Thanks everyone for your feedback and support with the review. Please add > any other comments so we can address them soon, if not please share your > LGTMs. > > @Robert, thanks for separating the PR! > > @Thomas, regarding your

Re: [DISCUSS] How many Python 3.x minor versions should Beam Python SDK aim to support concurrently?

2020-05-04 Thread Yoshiki Obata
Thank you for the comment, Valentyn. > 1) We can seed the smoke test suite with typehints tests, and add more tests > later if there is a need. We can identify them by the file path or by special > attributes in test files. Identifying them using filepath seems simpler and > independent of test

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-04 Thread Aizhamal Nurmamat kyzy
Thanks everyone for your feedback and support with the review. Please add any other comments so we can address them soon, if not please share your LGTMs. @Robert, thanks for separating the PR! @Thomas, regarding your question "There are some changes missing though (for example [2]), are you

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-04 Thread Robert Bradshaw
I took the massive commit and split it up into: (1) Infrastructure changes (basically everything outside of website/www/site/content), (2) Sed script changes, and (3) Manual changes (everything not in (1) and (2)). It does seem that (3) has a number of unintentional changes, some stylistic (e.g.

Builtin IOs - Link to Java/Pydoc instead of code?

2020-05-04 Thread Pablo Estrada
Hi all, I just noted that in our Built-in IOs page[1], we tend to link to the code for the IOs that we mention. I think it would be better to link to the Javadoc or the Pydoc for those IOs instead. Thoughts? Best -P. [1] https://beam.apache.org/documentation/io/built-in/

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-04 Thread Robert Bradshaw
On Mon, May 4, 2020 at 6:02 PM Thomas Weise wrote: > > I took a brief look at [1] and looks good overall. > > There are some changes missing though (for example [2]), are you planning to > add more recent commits later? > > Also, there was an earlier question from Brian regarding the possibility

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-04 Thread Thomas Weise
I took a brief look at [1] and looks good overall. There are some changes missing though (for example [2]), are you planning to add more recent commits later? Also, there was an earlier question from Brian regarding the possibility to retain the post dates in blog file names. I would second

Re: [Proposal] Apache Beam Fn API - Histogram Style Metrics (Correct link this time)

2020-05-04 Thread Alex Amato
Thanks Ismaël :). Done On Mon, May 4, 2020 at 3:59 PM Ismaël Mejía wrote: > Moving the short link to this thread > https://s.apache.org/beam-histogram-metrics > > Alex can you add this link (and any other of your documents that may > not be there) to >

Re: [Proposal] Apache Beam Fn API - Histogram Style Metrics (Correct link this time)

2020-05-04 Thread Ismaël Mejía
Moving the short link to this thread https://s.apache.org/beam-histogram-metrics Alex can you add this link (and any other of your documents that may not be there) to https://cwiki.apache.org/confluence/display/BEAM/Design+Documents On Tue, May 5, 2020 at 12:51 AM Pablo Estrada wrote: > > FYI

Re: [Proposal] Apache Beam Fn API - Histogram Style Metrics (Correct link this time)

2020-05-04 Thread Pablo Estrada
FYI +Boyuan Zhang worked on implementing a histogram metric that was performance-optimized into outer space for Python : ) - I don't recall if she ended up getting it merged, but it's worth looking at the work. I also remember Scott Wegner wrote the metrics for Java. Best -P. On Mon, May 4,

[Proposal] Apache Beam Fn API - Histogram Style Metrics (Correct link this time)

2020-05-04 Thread Alex Amato
Hello, I have created a proposal for the Apache Beam Fn API to support Histogram Style Metrics. It defines a method to collect histogram-style metrics and pass them over the Fn API. I would love to hear your

Re: [Proposal] Apache Beam Fn API - Histogram Style Metrics

2020-05-04 Thread Alex Amato
Sorry, wrong link. Let's close this thread and I'll send another... On Mon, May 4, 2020 at 3:28 PM Pablo Estrada wrote: > Hi Alex! > Thanks for the proposal. I've created > https://s.apache.org/beam-histogram-metrics > > On Mon, May 4, 2020 at 2:44 PM Alex Amato wrote: > >> Hello, >> >> I have

Re: [Proposal] Apache Beam Fn API - Histogram Style Metrics

2020-05-04 Thread Pablo Estrada
Hi Alex! Thanks for the proposal. I've created https://s.apache.org/beam-histogram-metrics On Mon, May 4, 2020 at 2:44 PM Alex Amato wrote: > Hello, > > I have created a proposal for Apache Beam FN API to support Histogram > Style Metrics >

[Proposal] Apache Beam Fn API - Histogram Style Metrics

2020-05-04 Thread Alex Amato
Hello, I have created a proposal for the Apache Beam Fn API to support Histogram Style Metrics. It defines a method to collect histogram-style metrics and pass them over the Fn API. Also,

Apache Beam application to Season of Docs 2020

2020-05-04 Thread Aizhamal Nurmamat kyzy
Hi all, I have submitted the application to the Season of Docs program with the project ideas we developed last year [1]. I learnt about its deadline a few hours ago and didn't want to miss it. Feel free to add more project ideas (or edit the current ones) until May 7th. If Beam gets

Re: [DISCUSS] finishBundle once per window

2020-05-04 Thread Reuven Lax
I assume you are referring to elements output from finishBundle. The problem is that the current window is an input to WindowFn.assignWindows. The new window can depend on the timestamp, the element itself, and the original window. I'm not sure how many users rely on this, however it has been

Re: [PROPOSAL] Preparing for Beam 2.22.0 release

2020-05-04 Thread Luke Cwik
Thanks Brian. On Mon, May 4, 2020 at 10:57 AM Kyle Weaver wrote: > Thanks Brian! > > I'd also like to remind everyone to update CHANGES.md with any important > recent changes that might be missing. > https://github.com/apache/beam/blob/master/CHANGES.md > > On Mon, May 4, 2020 at 1:25 PM Brian

Re: Exploding windows and FnApiDoFnRunner

2020-05-04 Thread Luke Cwik
Kenn, the optimization is not complex, just never done. The FnApiDoFnRunner was rewritten to be designed with portability first and to move away from the assumptions that were baked into the existing DoFn "runner" implementations and the constructs used in the non-portable implementation. There

Re: [DISCUSS] finishBundle once per window

2020-05-04 Thread Jan Lukavský
There was a mention in some other thread that, in order to make the user experience as predictable as possible, we should try to make windows idempotent, and once a window is assigned, it should never be changed (and timestamp move outside of the scope of window, unless a different windowfn is

Re: [DISCUSS] finishBundle once per window

2020-05-04 Thread Reuven Lax
This should not affect the ability of the user to specify the output timestamp. Today FinishBundleContext.output forces you to specify the window as well as the timestamp, which is a bit awkward. (I believe that it also lets you create brand new windows in finishBundle, which is interesting, but

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-04 Thread Hannah Jiang
Hi Aizhamal, yes, Wednesday sounds good to me. Thank you. On Mon, May 4, 2020 at 10:40 AM Aizhamal Nurmamat kyzy wrote: > Hannah, > > We don't have an exact date, but we are hoping to address all the comments > and merge the PR by Wednesday. Will it be possible for you to wait until > then? >

Re: [PROPOSAL] Preparing for Beam 2.22.0 release

2020-05-04 Thread Kyle Weaver
Thanks Brian! I'd also like to remind everyone to update CHANGES.md with any important recent changes that might be missing. https://github.com/apache/beam/blob/master/CHANGES.md On Mon, May 4, 2020 at 1:25 PM Brian Hulette wrote: > Hi all, > > The next (2.22.0) release branch cut is scheduled

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-04 Thread Aizhamal Nurmamat kyzy
Hannah, We don't have an exact date, but we are hoping to address all the comments and merge the PR by Wednesday. Will it be possible for you to wait until then? On Thu, Apr 30, 2020 at 4:29 PM Hannah Jiang wrote: > Since we want to move forward with the PR, I would like to ask the >>

Re: Exploding windows and FnApiDoFnRunner

2020-05-04 Thread Robert Burke
Ack ok. Thank you for clarifying! Confirming that Kenn is right, the optimization is pretty much that simple. [1] is where it's done in the Go SDK [1] https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/exec/pardo.go#L136 On Mon, May 4, 2020, 10:18 AM Reuven Lax wrote: >

Re: Flink Runner with RequiresStableInput fails after a certain number of checkpoints

2020-05-04 Thread Eleanore Jin
Hi Max, Thanks for the information. I saw this PR is already merged; I just wonder whether it has already been backported to the affected versions (i.e. 2.14.0, 2.15.0, 2.16.0, 2.17.0, 2.18.0, 2.19.0, 2.20.0)? Or do I have to wait for the 2.20.1 release? Thanks a lot! Eleanore On Wed, Apr 22, 2020 at 2:31 AM

Re: [DISCUSS] finishBundle once per window

2020-05-04 Thread Robert Bradshaw
This is a really nice idea. Would the user still need to specify the timestamp of the output? I'm a bit ambivalent about calling it multiple times if OutputReceiver alone is in the parameter list; this might not be obvious and could be surprising behavior. On Mon, May 4, 2020 at 10:13 AM Reuven

[PROPOSAL] Preparing for Beam 2.22.0 release

2020-05-04 Thread Brian Hulette
Hi all, The next (2.22.0) release branch cut is scheduled for May 20, according to the calendar. I would like to volunteer myself to do this release. The plan is to cut the branch on that

Re: Exploding windows and FnApiDoFnRunner

2020-05-04 Thread Reuven Lax
I wonder how often we even implement this optimization today. If the processElement has an OutputReceiver parameter then we mark it as observesWindow, and that's a pretty common parameter. Arguably this is a bug in our implementation of OutputReceiver though - it should be able to copy all the

[DISCUSS] finishBundle once per window

2020-05-04 Thread Reuven Lax
I would like to discuss a minor extension to the Beam model. Beam bundles have very few restrictions on what can be in a bundle; in particular, a bundle might contain records for many different windows. This was an explicit decision as bundling primarily exists for performance reasons and we found

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-04 Thread Nam Bui
Hey Kenn. Thanks so much for your useful information and research. It's great to know. On Mon, May 4, 2020 at 6:33 PM Kenneth Knowles wrote: > Regarding the detection of renames, I now recall that I have encountered > this before: it is controlled by the config diff.renameLimit. The default >

Re: Exploding windows and FnApiDoFnRunner

2020-05-04 Thread Kenneth Knowles
Is the optimization complex in the Fn API context? In non-Fn API it is basically "if (observesWindow) { explode } else { don't }" [1]. The DoFn signature tells you everything you need. This might be a good first commit for someone looking to contribute to the Java SDK harness? Kenn [1]
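The "explode only if the window is observed" rule described in this message can be illustrated with a small, self-contained Python sketch. This is not Beam code: `WindowedValue`, `invoke`, and the string window labels are hypothetical names chosen for illustration, and the real SDK harness derives `observes_window` from the DoFn signature rather than taking it as a flag.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass(frozen=True)
class WindowedValue:
    """Hypothetical stand-in for a value that belongs to one or more windows."""
    value: Any
    windows: Tuple[str, ...]


def invoke(element: WindowedValue,
           fn: Callable[[Any, Any], Any],
           observes_window: bool) -> List[WindowedValue]:
    """Call fn once per window only when fn actually looks at the window."""
    if observes_window:
        # Explode: one call (and one output) per window the element is in.
        return [WindowedValue(fn(element.value, w), (w,))
                for w in element.windows]
    # No explosion: a single call whose output keeps all of the windows.
    return [WindowedValue(fn(element.value, None), element.windows)]


element = WindowedValue(3, ("w1", "w2"))
unexploded = invoke(element, lambda v, w: v * 2, observes_window=False)
exploded = invoke(element, lambda v, w: f"{v}@{w}", observes_window=True)
```

In this sketch the unexploded path calls the function once for an element in two windows, while the exploded path calls it twice; that saved work is the point of the optimization under discussion.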

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-04 Thread Kenneth Knowles
Regarding the detection of renames, I now recall that I have encountered this before: it is controlled by the config diff.renameLimit. The default value these days is high enough to work for this PR. I've confirmed this: git diff --shortstat $(git merge-base github/pr/11554 github/master)

Re: Exploding windows and FnApiDoFnRunner

2020-05-04 Thread Robert Bradshaw
In Python we only explode windows if the Window is being inspected. (There is no separate "DoFnRunner" for FnApi vs. Legacy execution.) On Mon, May 4, 2020 at 9:21 AM Luke Cwik wrote: > > Reuven you are correct that the optimization has yet to be implemented. > Robert the FnApiDoFnRunner is the

Re: Exploding windows and FnApiDoFnRunner

2020-05-04 Thread Luke Cwik
Reuven you are correct that the optimization has yet to be implemented. Robert the FnApiDoFnRunner is the name of a Java class that executes Java DoFns in the Java SDK harness. The poor name choice is my fault. On Fri, May 1, 2020 at 9:14 PM Reuven Lax wrote: > FnApiDoFnRunner does run Java

Re: Non-trivial joins examples

2020-05-04 Thread Marcin Kuthan
@Kenneth - thanks for your response, for sure I was inspired a lot by earlier discussions on the group and the latest documentation updates about Timers: https://beam.apache.org/documentation/programming-guide/#timers In the limitations I forgot to mention SideInputs, it works quite well for

Re: Jenkins jobs not running for my PR 10438

2020-05-04 Thread Robert Bradshaw
Done. On Mon, May 4, 2020 at 7:35 AM Rehman Murad Ali wrote: > > Hi Beam committers, > > Would you please trigger the basic checks as well as validatesRunner check > for this PR? > https://github.com/apache/beam/pull/11350 > > > Thanks & Regards > > Rehman Murad Ali > Software Engineer >

Re: Jenkins jobs not running for my PR 10438

2020-05-04 Thread Rehman Murad Ali
Hi Beam committers, Would you please trigger the basic checks as well as validatesRunner check for this PR? https://github.com/apache/beam/pull/11350 *Thanks & Regards* *Rehman Murad Ali* Software Engineer Mobile: +92 3452076766 Skype: rehman.muradali On Fri, May 1, 2020 at 5:11 PM Ismaël

Flink: Lost pane timing at some steps of pipeline

2020-05-04 Thread Jozef Vilcek
I have a pipeline which 1. Reads from KafkaIO 2. Does stuff with events and writes windowed files via FileIO 3. Applies a stateful DoFn on the written files info. The stateful DoFn does some logic which depends on PaneInfo.Timing, if it is EARLY or something else. When testing in DirectRunner, all is

Beam Dependency Check Report (2020-05-04)

2020-05-04 Thread Apache Jenkins Server
ERROR: File 'src/build/dependencyUpdates/beam-dependency-check-report.html' does not exist

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-04 Thread Nam Bui
Hey guys, How was your weekend? Thanks for the compliments and also the recommendations. About the commits, as Brian said, we worked together on the-asf slack. It was a tough one, we even did a few experiments. And finally came up with a solution that preserved all commits and used `git