Re: Why there is no top method in dataset api

Sean Owen Thu, 01 Sep 2016 05:32:58 -0700

You can always call .rdd.top(n) of course. Although it's slightly
clunky, you can also .orderBy($"value".desc).take(n). Maybe there's an
easier way.
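
For illustration, here is a minimal sketch of both options. It assumes a local SparkSession and a Dataset[Int], whose single column Spark names "value"; the object name, helper names, and sample data are made up for the example, and col("value") is used in place of $"value" so the helper does not need the session implicits in scope.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical example object; names and data are illustrative only.
object TopNSketch {
  // Option 1: drop to the underlying RDD and use its top(n),
  // which relies on the implicit Ordering[Int].
  def topViaRdd(ds: Dataset[Int], n: Int): Array[Int] =
    ds.rdd.top(n)

  // Option 2: stay in the Dataset API: order descending, then take n.
  // col("value").desc is the same column expression as $"value".desc.
  def topViaOrderBy(ds: Dataset[Int], n: Int): Array[Int] =
    ds.orderBy(col("value").desc).take(n)

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is fine for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("top-n-sketch")
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(5, 1, 9, 3, 7).toDS()

    println(topViaRdd(ds, 3).mkString(", "))     // 9, 7, 5
    println(topViaOrderBy(ds, 3).mkString(", ")) // 9, 7, 5

    spark.stop()
  }
}
```

For a Dataset of case classes, you would order by the relevant field's column name instead of "value", or pass an explicit Ordering to top.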

I don't think there's a strong reason other than it wasn't worth it
to write this and many other utility wrappers that a) already exist on
the underlying RDD API if you want them, and b) have a DataFrame-like
counterpart already that doesn't really need wrapping in a different
API.

On Thu, Sep 1, 2016 at 12:53 PM, Jakub Dubovsky
<spark.dubovsky.ja...@gmail.com> wrote:
> Hey all,
>
> in the RDD API there is a very useful method called top. It finds the top n
> records according to a given ordering without sorting all records. Very useful!
>
> There is no top method or similar functionality in the Dataset API. Does
> anybody know why? Is there any specific reason for this?
>
> Any thoughts?
>
> thanks
>
> Jakub D.

