I asked about this a few weeks ago and rebased from master as suggested,
but I am still seeing these failures. I am guessing this is wasting our
resources somehow? :(
On Tue, Aug 18, 2020 at 7:28 PM Alex Amato wrote:
> Run failed for master (010adc5)
>
> Repository: ajamato/beam
> Workflow: Build
Hi Val,
Thank you for your response. I like the idea of a reactive, event-based
processing engine for fault tolerance. As you mentioned, it will be up to
the underlying system to manage job execution and offer fault tolerance,
and we will need to build that into Ignite's compute execution model.
I looked into
Hi Saikat,
Thanks for clarifying. Is there a Beam component that monitors the state,
or is this up to the application? If something fails, will the application
have to retry the whole pipeline?
My concern is that Ignite compute actually provides very limited
guarantees, especially for the async
Thank you Robert!
On Mon, Aug 17, 2020 at 5:52 PM Robert Bradshaw wrote:
> Correct, everything is per-key. To allow triggering after n events you
> would have to give them all the same key. (Note that this would
> potentially introduce a bottleneck, as they would all be shuffled to
> the same
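The per-key trigger semantics described above can be illustrated with a minimal plain-Python sketch (this is not the Beam API; `fire_after_count` is a hypothetical helper standing in for an after-count trigger):

```python
# Sketch of why "trigger after n events" requires a shared key:
# triggers fire per key, so elements are only counted together if
# they share a key.
from collections import defaultdict

def fire_after_count(events, key_fn, n):
    """Emit a (key, batch) pair each time a key accumulates n events."""
    buffers = defaultdict(list)
    fired = []
    for e in events:
        k = key_fn(e)
        buffers[k].append(e)
        if len(buffers[k]) == n:
            fired.append((k, list(buffers[k])))
            buffers[k].clear()
    return fired

# Per-element keys: no key ever accumulates 3 events, nothing fires.
no_batches = fire_after_count(range(6), key_fn=lambda e: e, n=3)

# One shared key: the trigger fires every 3 events, but every element
# is funneled through a single key -- the bottleneck noted above.
batches = fire_after_count(range(6), key_fn=lambda e: 0, n=3)
```

With the shared key, `batches` contains two batches of three events each, all under the same key.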
Hi Luke
Will take a look at this as soon as possible and get back to you.
Best Regards,
Pulasthi
On Tue, Aug 18, 2020 at 2:30 PM Luke Cwik wrote:
> I have made some good progress here and have gotten to the following state
> for non-portable runners:
>
> DirectRunner[1]: Merged. Supports
getPMForCDF[1] seems to return a CDF and you can choose the split points
(b0, b1, b2, ...).
1:
https://github.com/stanford-futuredata/msketch/blob/cf4e49e860761f48ebdeb00f650ce997c46073e2/javamsketch/quantilebench/src/main/java/yahoo/DoublesPmfCdfImpl.java#L16
On Tue, Aug 18, 2020 at 11:20 AM
I have made some good progress here and have gotten to the following state
for non-portable runners:
DirectRunner[1]: Merged. Supports Read.Bounded and Read.Unbounded.
Twister2[2]: Ready for review. Supports Read.Bounded; the current runner
doesn't support unbounded pipelines.
Spark[3]: WIP.
I'm a bit confused; are you sure that it is possible to derive the CDF
using the moment variables?
The linked implementation on GitHub seems not to use a derived CDF
equation, but instead uses some sampling technique (which I can't fully
grasp yet) to estimate how many elements are in each
Hi Alex,
It is great to know you are working on the metrics. Do you have any concerns if
we add a Histogram-type metric in the Samza runner itself for now, so we can
start using it before a generic histogram metric can be introduced in the
Metrics class?
Best,
Ke
> On Aug 18, 2020, at 12:57 AM,
Hi Alex,
I'm not sure about restoring histogram, because the use case I had in the
past used percentiles. As I understand it, you can approximate a histogram
if you know the percentiles and the total count. E.g. 5% of values fall into
the [P95, +INF) bucket, another 5% into [P90, P95), etc. I don't understand the
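The percentile-based approximation described above can be sketched in a few lines of plain Python (`histogram_from_percentiles` is a hypothetical helper, not part of Beam):

```python
# Sketch: approximate a histogram from known percentiles plus a total
# count. Each gap between adjacent percentile ranks holds a known
# fraction of the data, e.g. 5% of values lie in [P90, P95).
def histogram_from_percentiles(percentiles, total_count):
    """percentiles: mapping of rank -> value, e.g. {90: P90, 95: P95}."""
    buckets = []
    prev_rank, prev_value = 0, float("-inf")
    for rank in sorted(percentiles):
        frac = (rank - prev_rank) / 100.0
        buckets.append(((prev_value, percentiles[rank]), frac * total_count))
        prev_rank, prev_value = rank, percentiles[rank]
    # Tail bucket [P_last, +INF) holds the remaining fraction.
    tail_frac = (100 - prev_rank) / 100.0
    buckets.append(((prev_value, float("inf")), tail_frac * total_count))
    return buckets
```

For example, with `{90: 10.0, 95: 20.0}` and 1000 values, this yields 900 values below 10.0, 50 in [10.0, 20.0), and 50 in [20.0, +INF).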
You can use a cumulative distribution function over the sketch at b0, b1,
b2, b3, ..., which will tell you the probability that any given value is <=
X. You multiply that probability by the total count (which is also
recorded as part of the sketch) to get an estimate of the number of values
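This estimation step can be sketched in plain Python (assuming an exact empirical CDF computed from the raw data, standing in for the sketch's approximate CDF; `bucket_counts` is a hypothetical helper):

```python
# Sketch: evaluate a CDF at split points b0 < b1 < ... and multiply
# by the total count to estimate how many values land in each bucket.
import bisect

def bucket_counts(sorted_values, splits):
    total = len(sorted_values)
    # Empirical CDF(x) = P(value <= x), estimated from the data.
    cdf = lambda x: bisect.bisect_right(sorted_values, x) / total
    counts, prev_p = [], 0.0
    for b in splits:
        p = cdf(b)
        counts.append(round((p - prev_p) * total))  # count in (prev split, b]
        prev_p = p
    counts.append(round((1.0 - prev_p) * total))    # tail bucket (last split, +INF)
    return counts

values = sorted([1, 2, 2, 3, 5, 8, 13, 21])
# Splits at 2 and 5 give buckets (-INF, 2], (2, 5], (5, +INF).
counts = bucket_counts(values, [2, 5])
```

Here `counts` comes out to `[3, 2, 3]`: three values at or below 2, two in (2, 5], and three above 5.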
Thank you. Looking forward to working with the Beam community. :)
On Mon, Aug 17, 2020 at 11:57 PM Pablo Estrada wrote:
> Welcome Sruthi! : )
>
> On Mon, Aug 17, 2020 at 2:41 PM Gris Cuevas wrote:
>
>> Welcome Sruthi!
>>
>> On 2020/08/17 20:56:40, Aizhamal Nurmamat kyzy
>> wrote:
>> > Hi all,
I looked into the possibility of fixing the underlying filesystem, and it
turns out that only the local filesystem couldn't handle decoding correctly;
HDFS and some other filesystems, e.g. S3, already have a check for that.
So I added a similar check to the local filesystem too. The implementation
is in the