Re: [DISCUSS] Autoformat python code with Black

2019-05-30 Thread Łukasz Gajowy
+1 for any autoformatter for the Python SDK that does the job. My experience is that, since using Spotless in the Java SDK, I would never start a new Java project without it. There are so many great benefits, not only for one person coding but for the whole community. It is a GitHub UI issue that you cannot easily browse past the

Re: Shuffling on apache beam

2019-05-30 Thread pasquale . bonito
This was my first option, but I'm using Google Dataflow as the runner and it's not clear if it supports stateful DoFn. However, my problem is latency: I've been trying different solutions, but it seems difficult to bring latency under 1s when consuming messages (150/s) from PubSub with Beam/Dataflow. Is

Re: Timer support in Flink

2019-05-30 Thread Reza Rokni
PS, until it was just pointed out to me by Max, I had missed the (expand details) clickable link in the capability matrix. Probably just me, but do others think it's also easy to miss? If yes, I will raise a Jira for it. On Wed, 29 May 2019 at 19:52, Reza Rokni wrote: > Thanx Max! > > Reza >

Re: Shuffling on apache beam

2019-05-30 Thread Reza Rokni
Hi, Would you mind sharing your latency requirements? For example, is it < 1 sec at XX percentile? With regard to stateful DoFn, it is supported with a few exceptions: https://beam.apache.org/documentation/runners/capability-matrix/#cap-full-what Cheers Reza On Thu, 30 May 2019 at 18:08, pa

Re: Timer support in Flink

2019-05-30 Thread Alex Van Boxel
Oh... you can expand the matrix. Never saw that, this could indeed be better. So it isn't you. _/ _/ Alex Van Boxel On Thu, May 30, 2019 at 12:24 PM Reza Rokni wrote: > PS, until it was just pointed out to me by Max, I had missed the (expand > details) clickable link in the capability matrix.

Re: Timer support in Flink

2019-05-30 Thread Reza Rokni
:-) https://issues.apache.org/jira/browse/BEAM-7456 On Thu, 30 May 2019 at 18:41, Alex Van Boxel wrote: > Oh... you can expand the matrix. Never saw that, this could indeed be > better. So it isn't you. > > _/ > _/ Alex Van Boxel > > > On Thu, May 30, 2019 at 12:24 PM Reza Rokni wrote: > >> P

Re: Definition of Unified model

2019-05-30 Thread Reuven Lax
Files can grow (depending on the filesystem), and tailing growing files is a valid use case. On Wed, May 29, 2019 at 3:23 PM Jan Lukavský wrote: > > Offsets within a file, unordered between files seems exactly > analogous with offsets within a partition, unordered between partitions, > right? >

Re: Shuffling on apache beam

2019-05-30 Thread Reuven Lax
How are you measuring latency? On Thu, May 30, 2019 at 3:08 AM pasquale.bon...@gmail.com < pasquale.bon...@gmail.com> wrote: > This was my first option but I'm using google dataflow as runner and it's > not clear if it supports stateful DoFn. > However my problem is latency, I've been trying diff

Re: Definition of Unified model

2019-05-30 Thread Jan Lukavský
That's right, but is there a filesystem that allows files of unbounded size? If there will always be an upper size limit, does that mean that you cannot use the order of elements in the file as is? You might need to transfer the offset from one file to another (that's how Kafka does it), but t

Re: Shuffling on apache beam

2019-05-30 Thread pasquale . bonito
I'm measuring latency as the difference between the timestamp of the column on BigTable and the one I attach to the message when I publish it to the topic. I also take intermediate measurements after the message is read from the PubSub topic and before inserting into BigTable. All timestamps are wri

Re: **Request to add me as a contributor.**

2019-05-30 Thread Lukasz Cwik
Welcome, I have added you as a contributor and assigned BEAM-7442 to you. On Wed, May 29, 2019 at 9:09 PM Akshay Iyangar wrote: > Hello everyone, > > > > My name is Akshay Iyangar, using beam repo extensively. There is a small > patch that I would like to push through upstream. > https://issues.

Re: Shuffling on apache beam

2019-05-30 Thread Reuven Lax
Do you have any way of knowing how much of this time is being spent in Pub/Sub and how much in the Beam pipeline? If you are using the Dataflow runner and doing any shuffling, 100-150ms is currently not attainable. Writes to shuffle are batched for up to 100ms at a time to keep operational costs d

Re: Shuffling on apache beam

2019-05-30 Thread pasquale . bonito
Ideally my pipeline requires no shuffling; I just saw that introducing a windowing operation improves the performance of BigTable inserts. I don't know how to measure time spent in PubSub. I take the time when a message is published, fill in the timestamp metadata, and then compare that value with the t

Beam Summit volunteering team

2019-05-30 Thread Matthias Baetens
Hi everyone, As you might know, the Beam Summit is currently organised by a small team of people committing their free time to make this happen. We are looking to make this group larger in the future and particularly could use some help on website maintenance, develo

Support for PaneInfo in Python SDK

2019-05-30 Thread Tanay Tummalapalli
Hi everyone, The PR linked in [BEAM-3759] - "Add support for PaneInfo descriptor in Python SDK"[1] was merged, but the issue is still open. There might be some work left on this to fully support PaneInfo. E.g.: although the PaneInfo class exists, it is not accessible in a DoFn via a kwarg(PaneI

Re: Support for PaneInfo in Python SDK

2019-05-30 Thread Pablo Estrada
Hi Tanay, thanks for bringing this to the mailing list. I believe this is certainly useful, and necessary. As an example, the fileio.WriteToFiles transform does not work well without PaneInfo data (since we can't know how many firings there are for each window, and we can't give names to files base

Re: DISCUSS: Sorted MapState API

2019-05-30 Thread Kenneth Knowles
On Tue, May 28, 2019 at 2:59 AM Robert Bradshaw wrote: > On Fri, May 24, 2019 at 6:57 PM Kenneth Knowles wrote: > > > > On Fri, May 24, 2019 at 9:51 AM Kenneth Knowles wrote: > >> > >> On Fri, May 24, 2019 at 8:14 AM Reuven Lax wrote: > >>> > >>> Some great comments! > >>> > >>> Aljoscha: abso

[VOTE] Release 2.13.0, release candidate #2

2019-05-30 Thread Ankur Goenka
Hi everyone, Please review and vote on release candidate #2 for version 2.13.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available for your review, which includes: * JIRA release notes [1],