Hi Till,
I have created the JIRA: https://issues.apache.org/jira/browse/FLINK-4205
Thank you,
Do
On Tue, Jul 12, 2016 at 6:05 PM, Till Rohrmann wrote:
Stratified sampling would also be beneficial for the DataSet API. I think
it would be best if this method is also added to DataSetUtils or made
available via the flink-contrib module. Furthermore, I think that it would
be easiest if you created the JIRA for this feature, because you know what
you w
Hey Do,
I think more sophisticated samplers might be a better fit for the ML
library than for the core API, but I am not very familiar with the
milestones there.
Perhaps the maintainers of the batch ML library could check whether these
sampling techniques would be useful there.
Paris
> On 1
Hi all,
Thank you all for your answers.
By the way, I also noticed that Flink doesn't support a "stratified
sampling" function for DataSet (only simple random sampling).
It would be nice if someone could create a JIRA issue for it and assign the
task to me so that I can work on it.
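To make the distinction concrete: simple random sampling draws from the whole data set, while stratified sampling first partitions the data by a key (the stratum) and then samples each stratum separately, so small groups are still represented. Below is a minimal plain-Java sketch of proportional stratified sampling over an in-memory list. It is not Flink code and not a proposed DataSetUtils signature; the class `StratifiedSampling`, its `sample` method, and the choice of one fraction for all strata are hypothetical, for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;

// Hypothetical helper (not a Flink API): proportional stratified sampling
// over an in-memory list, for illustration only.
public class StratifiedSampling {

    public static <T, K> List<T> sample(List<T> input, Function<T, K> stratumOf,
                                        double fraction, long seed) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        // 1. Partition the input into strata, keeping first-seen key order.
        Map<K, List<T>> strata = new LinkedHashMap<>();
        for (T t : input) {
            strata.computeIfAbsent(stratumOf.apply(t), k -> new ArrayList<>()).add(t);
        }
        Random rnd = new Random(seed);
        List<T> result = new ArrayList<>();
        // 2. Draw a simple random sample without replacement from every stratum,
        //    sized proportionally to that stratum (proportional allocation).
        for (List<T> stratum : strata.values()) {
            int n = (int) Math.round(fraction * stratum.size());
            List<T> copy = new ArrayList<>(stratum);
            Collections.shuffle(copy, rnd);
            result.addAll(copy.subList(0, n));
        }
        return result;
    }

    public static void main(String[] args) {
        // 100 elements in stratum "a", 10 elements in stratum "b".
        List<String> data = new ArrayList<>();
        for (int i = 0; i < 100; i++) data.add("a" + i);
        for (int i = 0; i < 10; i++) data.add("b" + i);
        List<String> s = sample(data, x -> x.substring(0, 1), 0.1, 42L);
        long a = s.stream().filter(x -> x.startsWith("a")).count();
        long b = s.stream().filter(x -> x.startsWith("b")).count();
        System.out.println(a + " " + b); // prints "10 1"
    }
}
```

With a 10% fraction, the small stratum "b" is guaranteed one element, whereas a simple random sample of 11 elements from the whole list could easily contain none of them. A distributed version would need per-partition sampling and a different way of choosing sample sizes.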
Thank you,
Do
On
Hi Do,
Paris and Martha worked on sampling techniques for data streams on Flink
last year. If you want to implement your own samplers, you might find
Martha's master's thesis helpful [1].
-Vasia.
[1]: http://kth.diva-portal.org/smash/get/diva2:910695/FULLTEXT01.pdf
On 11 July 2016 at 11:31, Kosta
Hi Do,
In DataStream you can always implement your own
sampling function, hopefully without too much effort.
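As an illustration of how small such a sampler can be, here is a plain-Java sketch of reservoir sampling (Algorithm R), which keeps a uniform random sample of fixed size k from a stream whose length is not known in advance. It is not Flink code: the class name `ReservoirSampler` is hypothetical, and wiring it into a DataStream operator (for example, keeping the reservoir in operator state) is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical class (not a Flink API): reservoir sampling, Algorithm R.
// Keeps a uniform random sample of at most k elements from a stream
// whose length is not known in advance.
public class ReservoirSampler<T> {
    private final int k;
    private final List<T> reservoir = new ArrayList<>();
    private final Random rnd;
    private int seen = 0; // elements offered so far (int for simplicity)

    public ReservoirSampler(int k, long seed) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        this.k = k;
        this.rnd = new Random(seed);
    }

    public void offer(T element) {
        seen++;
        if (reservoir.size() < k) {
            // Fill the reservoir with the first k elements.
            reservoir.add(element);
        } else {
            // Keep the new element with probability k / seen,
            // overwriting a uniformly chosen slot.
            int j = rnd.nextInt(seen); // uniform in [0, seen)
            if (j < k) {
                reservoir.set(j, element);
            }
        }
    }

    public List<T> sample() {
        return new ArrayList<>(reservoir);
    }

    public static void main(String[] args) {
        ReservoirSampler<Integer> sampler = new ReservoirSampler<>(5, 7L);
        for (int i = 0; i < 1000; i++) {
            sampler.offer(i);
        }
        // Five distinct values drawn uniformly from 0..999.
        System.out.println(sampler.sample());
    }
}
```

Memory stays at O(k) regardless of stream length, which is what makes this kind of sampler a natural fit for an unbounded DataStream.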
Adding such functionality to the API could be a good idea.
But given that in sampling there is no “one-size-fits-all”
solution (as not every use case needs random sampling and not
al
Hi Do,
DataSet provides a stable @Public interface. DataSetUtils is marked
@PublicEvolving, which means it is intended for public use and has stable
behavior, but its method signatures may change. It's also good to limit
DataSet to common methods, whereas the utility methods tend to be used for specific
application