I think you can increase the executor memory.
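As a sketch, executor (and driver) memory and the shuffle parallelism can be raised with the standard spark-submit flags. The memory sizes, partition count, class name, and jar below are placeholders to adjust for your cluster, not recommendations:

```shell
# Minimal sketch: raise memory limits and shuffle parallelism for the job.
# All values here are placeholders; tune them to your data volume and cluster.
spark-submit \
  --class com.example.MyJob \
  --executor-memory 8g \
  --driver-memory 4g \
  --conf spark.sql.shuffle.partitions=400 \
  my-job.jar
```

Raising spark.sql.shuffle.partitions spreads the GROUP BY shuffle across more, smaller tasks, which can help as much as raw memory when a wide aggregation starts spilling.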
On 12/11/2018 08:28, lsn24 wrote:

Hello,

I have a requirement where I need to get the total count of rows and the total count of failed rows based on a grouping. The code looks like this:

    myDataset.createOrReplaceTempView("temp_view");
    Dataset<Row> countDataset = sparkSession.sql(
        "SELECT column1, column2, column3, column4, column5, column6, column7, column8, "
        + "count(*) AS totalRows, "
        + "sum(CASE WHEN column8 IS NULL THEN 1 ELSE 0 END) AS failedRows "
        + "FROM temp_view "
        + "GROUP BY column1, column2, column3, column4, column5, column6, column7, column8");

Up to around 50 million records the query performance was OK. Beyond that it gave up, mostly resulting in an out-of-memory exception.

I read the documentation and blogs; most of them give examples using RDD.reduceByKey. But here I have a Dataset and Spark SQL. What am I missing? Any help will be appreciated.

Thanks!