You can create a DataFrame from your SQL result set and work with that in Python the way you want.
You don't need all these imports:

    import findspark
    findspark.init()
    from pyspark.sql import SparkSession
    from pyspark import SparkContext
    from pyspark.sql import SQLContext
    from pyspark.sql.functions import udf, col, current_timestamp, lit
    from pyspark.sql.types import *

This works:

    sqltext = """
    SELECT aggregate(array(1, 2, 3, 4),
                     named_struct('sum', 0, 'cnt', 0),
                     (acc, x) -> named_struct('sum', acc.sum + x, 'cnt', acc.cnt + 1),
                     acc -> acc.sum / acc.cnt) AS avg
    """
    df = spark.sql(sqltext)
    df.printSchema()

which prints:

    root
     |-- avg: double (nullable = true)

Mich Talebzadeh,
Lead Solutions Architect/Engineering Lead
Palantir Technologies Limited
London, United Kingdom

On Fri, 5 May 2023 at 20:33, Yong Zhang <java8...@hotmail.com> wrote:

> Hi, this is on a Spark 3.1 environment.
>
> For some reason, I can ONLY do this in Spark SQL, instead of either a Scala or PySpark environment.
>
> I want to aggregate an array into a Map of element counts, within that array, but in Spark SQL.
> I know that there is an aggregate function available, like
>
>     aggregate(expr, start, merge [, finish])
>
> But I want to know if this can be done in Spark SQL only, and:
>
> - How to represent an empty Map as the "start" element above
> - How to merge each element (as String type) into the Map (adding to the count if it already exists in the Map, or adding (element -> 1) as a new entry if it does not)
>
> Like the following example:
> https://docs.databricks.com/sql/language-manual/functions/aggregate.html
>
>     SELECT aggregate(array(1, 2, 3, 4),
>                      named_struct('sum', 0, 'cnt', 0),
>                      (acc, x) -> named_struct('sum', acc.sum + x, 'cnt', acc.cnt + 1),
>                      acc -> acc.sum / acc.cnt) AS avg
>
> I wonder:
>
>     select
>       aggregate(
>         array('a', 'b', 'a'),
>         map('', 0),
>         (acc, x) -> ???,
>         acc -> acc) as output
>
> How do I write the logic after "(acc, x) ->", so that I can output a map of the count of each element in the array?
>
> I know I can "explode", then groupBy + count, but since I have multiple array columns to transform, I want to do this with a higher-order function, in pure Spark SQL.
>
> Thanks
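To see what the `aggregate` higher-order function in the Databricks example is doing, the same fold can be mirrored in plain Python with `functools.reduce` (a sketch of the semantics, not the Spark API):

```python
from functools import reduce

# Mirror of:
#   aggregate(array(1, 2, 3, 4),
#             named_struct('sum', 0, 'cnt', 0),
#             (acc, x) -> named_struct('sum', acc.sum + x, 'cnt', acc.cnt + 1),
#             acc -> acc.sum / acc.cnt)

start = {"sum": 0, "cnt": 0}

def merge(acc, x):
    # the (acc, x) -> ... lambda: fold each element into the accumulator
    return {"sum": acc["sum"] + x, "cnt": acc["cnt"] + 1}

def finish(acc):
    # the optional acc -> ... lambda: turn the accumulator into the result
    return acc["sum"] / acc["cnt"]

avg = finish(reduce(merge, [1, 2, 3, 4], start))
print(avg)  # 2.5
```

The "start" value fixes the accumulator's type, the merge lambda is applied once per element, and the finish lambda runs once at the end.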