Do you mean that you are trying to compute the percent rank of some data? You can use Spark SQL's percent_rank window function for that, but I don't think it will be any faster than calling percent_rank through the DataFrame API, since both go through the same engine. Are you currently using a user-defined function for this task? If so, I bet that's what's slowing you down.
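In case it helps to pin down which of the two you need, here is a rough plain-Python sketch of the difference between a percent rank and a percentile. It is not Spark code, and the helper names and sample data are made up for illustration. As far as I know, the first mirrors what Spark SQL's percent_rank window function returns, (rank - 1) / (rows - 1) with ties sharing the lowest rank. The second assumes linear interpolation between neighboring values, which is my understanding of how the exact percentile aggregate behaves. percentile_approx trades accuracy for speed and may give slightly different answers.

```python
from math import ceil, floor


def percent_rank(values):
    """Per-row percent rank: (rank - 1) / (n - 1), ties share the lowest rank."""
    n = len(values)
    if n < 2:
        # With fewer than two rows there is nothing to rank against.
        return [0.0 for _ in values]
    ordered = sorted(values)
    # ordered.index(v) counts the values strictly smaller than v, i.e. rank - 1.
    return [ordered.index(v) / (n - 1) for v in values]


def percentile(values, p):
    """Single value at fraction p (0..1), assuming linear interpolation."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)
    lo, hi = floor(pos), ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


data = [10, 20, 20, 40, 50]   # made-up sample
print(percent_rank(data))     # one result per input row
print(percentile(data, 0.5))  # one result for the whole group
```

The point is that percent rank yields one number per row and needs a window, while a percentile is a single aggregate per group, so they map to different Spark functions.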
On Mon, Nov 11, 2019 at 9:46 AM Tzahi File <tzahi.f...@ironsrc.com> wrote:
> Hi,
>
> Currently, I'm using a huge Hive cluster (m5.24xl * 40 workers) to run a
> percentile function. I'm trying to improve this job by moving it to run
> with Spark SQL.
>
> Any suggestions on how to use a percentile function in Spark?
>
> Thanks,
> --
> Tzahi File
> Data Engineer

--
http://www.google.com/profiles/grapesmoker