Hi Do,

Paris and Martha worked on sampling techniques for data streams on Flink
last year. If you want to implement your own samplers, you might find
Martha's master thesis helpful [1].

-Vasia.

[1]: http://kth.diva-portal.org/smash/get/diva2:910695/FULLTEXT01.pdf

On 11 July 2016 at 11:31, Kostas Kloudas <k.klou...@data-artisans.com>
wrote:

> Hi Do,
>
> In DataStream you can always implement your own
> sampling function, hopefully without too much effort.
>
> Adding such functionality it to the API could be a good idea.
> But given that in sampling there is no “one-size-fits-all”
> solution (as not every use case needs random sampling and not
> all random samplers fit to all workloads), I am not sure if we
> should start adding different sampling operators.
>
> Thanks,
> Kostas
>
> > On Jul 9, 2016, at 5:43 PM, Greg Hogan <c...@greghogan.com> wrote:
> >
> > Hi Do,
> >
> > DataSet provides a stable @Public interface. DataSetUtils is marked
> > @PublicEvolving which is intended for public use, has stable behavior,
> but
> > method signatures may change. It's also good to limit DataSet to common
> > methods whereas the utility methods tend to be used for specific
> > applications.
> >
> > I don't have the pulse of streaming but this sounds like a useful feature
> > that could be added.
> >
> > Greg
> >
> > On Sat, Jul 9, 2016 at 10:47 AM, Le Quoc Do <lequo...@gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> I'm working on approximate computing using sampling techniques. I
> >> recognized that Flink supports the sample function for Dataset
> >> (org/apache/flink/api/java/utils/DataSetUtils.java). I'm just wondering
> why
> >> you didn't merge the function to org/apache/flink/api/java/DataSet.java
> >> since the sample function works as a transformation operator?
> >>
> >> The second question is that are you planning to support the sample
> >> function for DataStream (within windows) since I did not see it in
> >> DataStream code ?
> >>
> >> Thank you,
> >> Do
> >>
>
>

Reply via email to