Ideally my pipeline requires no shuffling; I just saw that introducing a
windowing operation improves the performance of BigTable inserts.
I don't know how to measure time spent in PubSub. I take the time when a
message is published, fill the timestamp metadata, and then compare that value
with the t
Do you have any way of knowing how much of this time is being spent in
Pub/Sub and how much in the Beam pipeline?
If you are using the Dataflow runner and doing any shuffling, 100-150ms is
currently not attainable. Writes to shuffle are batched for up to 100ms at
a time to keep operational costs d
I'm measuring latency as the difference between the timestamp of the column on
BigTable and the one I associate with the message when I publish it to the topic.
I also take an intermediate measurement after the message is read from the PubSub
topic and before it is inserted into BigTable.
All timestamps are wri
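As a concrete illustration of this kind of measurement (my own sketch, not the thread's code; the attribute name is an assumption), the publish timestamp can be carried as a message attribute and subtracted at each checkpoint:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;

public class LatencyMeasure {

  // Attribute key is an assumption; use whatever key you set at publish time.
  static final String PUBLISH_TS_ATTR = "publishTimeMillis";

  /** Elapsed time between the publish-time attribute and {@code now}. */
  static Duration latencySince(Map<String, String> attributes, Instant now) {
    long publishedMillis = Long.parseLong(attributes.get(PUBLISH_TS_ATTR));
    return Duration.between(Instant.ofEpochMilli(publishedMillis), now);
  }

  public static void main(String[] args) {
    Map<String, String> attrs = Map.of(PUBLISH_TS_ATTR, "1559000000000");
    System.out.println(
        latencySince(attrs, Instant.ofEpochMilli(1559000001500L)).toMillis());
    // prints 1500
  }
}
```

The same helper can be called once after the Pub/Sub read and once before the BigTable write to split total latency into per-stage components.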
How are you measuring latency?
On Thu, May 30, 2019 at 3:08 AM pasquale.bon...@gmail.com <
pasquale.bon...@gmail.com> wrote:
> This was my first option but I'm using Google Dataflow as the runner and
> it's not clear if it supports stateful DoFn.
> However, my problem is latency; I've been trying diff
Hi,
Would you mind sharing your latency requirements? For example, is it < 1 sec
at the XX percentile?
With regard to stateful DoFn, it is supported with a few exceptions:
https://beam.apache.org/documentation/runners/capability-matrix/#cap-full-what
Cheers
Reza
On Thu, 30 May 2019 at 18:08, pa
This was my first option but I'm using Google Dataflow as the runner and it's
not clear if it supports stateful DoFn.
However, my problem is latency; I've been trying different solutions but it
seems difficult to bring latency under 1s when consuming messages (150/s) from
PubSub with Beam/Dataflow.
Is
Is
If you add a stateful DoFn to your pipeline, you'll force Beam to shuffle
data to the worker that owns each key. I am not sure what the latency cost of
doing this is (as the messages still need to be shuffled), but it may help you
accomplish this without adding windowing+triggering.
-P.
On W
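For reference, a minimal stateful DoFn along the lines suggested above might look like this sketch (assumes the Beam Java SDK 2.x on the classpath; the class and state names are illustrative, not from this thread):

```java
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.ValueState;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

// A stateful DoFn must consume keyed input; the runner then routes every
// element with the same key to the same worker, without extra windowing.
class PerKeyCounterFn extends DoFn<KV<String, String>, KV<String, Long>> {

  @StateId("count")
  private final StateSpec<ValueState<Long>> countSpec = StateSpecs.value();

  @ProcessElement
  public void process(
      ProcessContext c, @StateId("count") ValueState<Long> count) {
    Long current = count.read();
    long next = (current == null) ? 1L : current + 1L;
    count.write(next);
    c.output(KV.of(c.element().getKey(), next));
  }
}
```

The per-key state is what forces the key-to-worker routing; whether the extra shuffle fits a sub-second budget would need to be measured on the actual runner.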
Hi Reza,
with GlobalWindow with triggering I was able to reduce hotspot issues, achieving
satisfying performance for BigTable updates. Unfortunately, latency when getting
messages from PubSub remains around 1.5s, which is too much considering our NFR.
This is the code I use to get the messages:
PColl
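The snippet is cut off at `PColl`; a generic Pub/Sub read (assumed, not the author's actual code; the subscription name is a placeholder) would look roughly like:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class ReadFromPubsub {
  public static void main(String[] args) {
    Pipeline pipeline =
        Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Read messages together with their attributes so a publish-time
    // attribute can be used for latency measurement downstream.
    PCollection<PubsubMessage> messages =
        pipeline.apply(
            PubsubIO.readMessagesWithAttributes()
                .fromSubscription("projects/my-project/subscriptions/my-sub"));

    pipeline.run();
  }
}
```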
PS You can also make use of the GlobalWindow with a stateful DoFn.
On Fri, 24 May 2019 at 15:13, Reza Rokni wrote:
> Hi,
>
> Have you explored the use of triggers with your use case?
>
>
> https://beam.apache.org/releases/javadoc/2.12.0/org/apache/beam/sdk/transforms/windowing/Trigger.html
>
> C
Hi,
Have you explored the use of triggers with your use case?
https://beam.apache.org/releases/javadoc/2.12.0/org/apache/beam/sdk/transforms/windowing/Trigger.html
Cheers
Reza
On Fri, 24 May 2019 at 14:14, pasquale.bon...@gmail.com <
pasquale.bon...@gmail.com> wrote:
> Hi Reuven,
> I would li
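The trigger suggestion above might be wired up roughly as follows (a sketch assuming the Beam Java SDK 2.x; the 100ms delay is an illustrative value, not a recommendation from this thread):

```java
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

public class LowLatencyWindowing {
  // Stay in the global window but fire a pane shortly after input arrives,
  // so downstream GroupByKey/Combine steps emit results continuously
  // instead of waiting for a window to close.
  static Window<KV<String, String>> lowLatencyTrigger() {
    return Window.<KV<String, String>>into(new GlobalWindows())
        .triggering(
            Repeatedly.forever(
                AfterProcessingTime.pastFirstElementInPane()
                    .plusDelayOf(Duration.millis(100))))
        .withAllowedLateness(Duration.ZERO)
        .discardingFiredPanes();
  }
}
```

Note that once a non-default trigger is set, Beam requires `withAllowedLateness` to be specified as well.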
Hi Reuven,
I would like to know if it is possible to guarantee that records are processed
by the same thread/task based on a key, as probably happens in a combine/stateful
operation, without adding the delay of a window.
This could increase the efficiency of caching and reduce some race conditions when
Can you explain what you mean by worker? While every runner has workers of
course, workers are not part of the programming model.
On Thu, May 23, 2019 at 8:13 PM pasquale.bon...@gmail.com <
pasquale.bon...@gmail.com> wrote:
> Hi all,
> I would like to know if Apache Beam has a functionality simil
Hi all,
I would like to know if Apache Beam has a functionality similar to
fieldsGrouping in Storm that allows sending records to a specific task/worker
based on a key.
I know that we can achieve that with a combine/groupByKey operation, but that
implies adding windowing to our pipeline, which we
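For comparison with Storm's fieldsGrouping, a keyed stage in Beam might look like this sketch (assumes the Beam Java SDK 2.x; the key extraction is hypothetical):

```java
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.WithKeys;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

public class KeyedRouting {
  // Attach a key, then apply a keyed step; Beam routes all values sharing
  // a key to the same worker for that step, much like fieldsGrouping.
  static PCollection<KV<String, Iterable<String>>> routeByKey(
      PCollection<String> records) {
    return records
        // Hypothetical key extraction: first comma-separated field.
        .apply(WithKeys.of((String r) -> r.split(",")[0])
            .withKeyType(TypeDescriptors.strings()))
        // In streaming, a window or trigger is needed for GroupByKey to
        // emit; a stateful DoFn achieves the same co-location without one.
        .apply(GroupByKey.create());
  }
}
```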