Hi Spark community members! I want to compute several (from 1 to 10) aggregate functions using window functions over roughly 100 columns.
Instead of making several passes over the data, one per aggregate function, is there a way to do this efficiently?

Currently I define a window:

    val tw = Window
      .orderBy("date")
      .partitionBy("id")
      .rangeBetween(-8035200000L, 0)

and then apply each aggregate over it:

    x
      .withColumn("agg1", max("col").over(tw))
      .withColumn("agg2", min("col").over(tw))
      .withColumn("aggX", avg("col").over(tw))

This does not seem very efficient. It looks as if it iterates over the whole column for each aggregation. Am I right?

Is there a way to compute all the required operations on one column in a single pass? Even better, is there a way to compute all the required operations on ALL columns in a single pass?

Thanks for your Future[Answers],

Julien

--
Julien CHAMP, Data Scientist, TellMePlus
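For reference, here is a minimal sketch of one variant that may be worth comparing against the chained withColumn calls: build every window expression up front and apply them all in a single select. Everything below that is not in the message above is an assumption made for illustration: the object name SinglePassWindowSketch, the helpers windowColumns and withAllAggregates, the column names col1 and col2, and the sample rows. It also assumes "date" is a numeric (epoch milliseconds) column, since rangeBetween takes numeric offsets. Whether this actually reduces the number of passes depends on the physical plan Spark produces, so the sketch prints the plan with explain() rather than claiming a result.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{avg, col, max, min}

// Hypothetical helper object, for illustration only.
object SinglePassWindowSketch {

  // Same window as in the question: per id, ordered by date,
  // covering the preceding 8035200000 units of "date" up to the current row.
  val tw = Window
    .partitionBy("id")
    .orderBy("date")
    .rangeBetween(-8035200000L, 0L)

  // The aggregates to apply to every value column (name suffix -> function).
  val aggs: Seq[(String, Column => Column)] = Seq(
    "max" -> ((c: Column) => max(c)),
    "min" -> ((c: Column) => min(c)),
    "avg" -> ((c: Column) => avg(c))
  )

  // One window expression per (value column, aggregate) pair.
  def windowColumns(valueCols: Seq[String]): Seq[Column] =
    for {
      name <- valueCols
      (aggName, agg) <- aggs
    } yield agg(col(name)).over(tw).as(s"${name}_$aggName")

  // Add all window expressions in one select instead of many withColumn calls.
  def withAllAggregates(df: DataFrame, valueCols: Seq[String]): DataFrame =
    df.select(col("*") +: windowColumns(valueCols): _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("single-pass-window-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data; "date" is epoch milliseconds stored as a Long.
    val x = Seq(
      (1, 0L, 1.0, 10.0),
      (1, 86400000L, 2.0, 20.0),
      (2, 0L, 3.0, 30.0)
    ).toDF("id", "date", "col1", "col2")

    val result = withAllAggregates(x, Seq("col1", "col2"))

    // Inspect the physical plan to see how many Window operators are used.
    result.explain()
    result.show()

    spark.stop()
  }
}
```

Comparing the explain() output of this version with that of the chained withColumn version should show whether the two differ in the number of Window operators and sorts for a given Spark version.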