Hi Michael,

Thanks for the response. I tested with the query that you sent me, and it runs much faster.

Old query stats by phases: 3.2 min, 17 s
Your query stats by phases: 0.3 s, 16 s, 20 s
But will this improvement also apply when you want to count distinct values on two or more fields?

SELECT COUNT(f1), COUNT(DISTINCT f2), COUNT(DISTINCT f3), COUNT(DISTINCT f4) FROM parquetFile

Should I still create a Jira issue/improvement for this?

@Nick That also makes sense. But should I just get the count of my data to the driver node?

I have just started learning Spark (and it is great), so sorry if I ask stupid questions.

Best regards,
Bojan
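
To make the multi-field question concrete, here is a minimal sketch of one way the four-aggregate query could be split into one subquery per field. It assumes the faster query has the general shape "count the rows of a DISTINCT subquery"; that shape is an assumption, since the query itself is not quoted in this message. SQLite stands in for Spark SQL only so the sketch runs anywhere, the sample rows are made up, and nothing here says anything about Spark's performance.

```python
import sqlite3

# Hypothetical stand-in for parquetFile; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parquetFile (f1 INTEGER, f2 TEXT, f3 TEXT, f4 TEXT)")
rows = [
    (1, "a", "x", "p"),
    (2, "a", "y", "p"),
    (3, "b", "y", None),
    (None, "b", "z", "q"),
]
conn.executemany("INSERT INTO parquetFile VALUES (?, ?, ?, ?)", rows)

# Form from the question: several DISTINCT aggregates in one SELECT.
direct_sql = """
SELECT COUNT(f1), COUNT(DISTINCT f2), COUNT(DISTINCT f3), COUNT(DISTINCT f4)
FROM parquetFile
"""


def distinct_count_subquery(column):
    # COUNT(DISTINCT col) ignores NULLs, so the rewrite filters them out too.
    return (
        "(SELECT COUNT(*) FROM "
        f"(SELECT DISTINCT {column} FROM parquetFile WHERE {column} IS NOT NULL))"
    )


# Assumed rewrite: one scalar subquery per aggregate, combined into a single row.
rewritten_sql = (
    "SELECT (SELECT COUNT(f1) FROM parquetFile), "
    + ", ".join(distinct_count_subquery(c) for c in ("f2", "f3", "f4"))
)

direct = conn.execute(direct_sql).fetchone()
rewritten = conn.execute(rewritten_sql).fetchone()
print(direct, rewritten)
```

Both forms return the same row on this sample data. Whether the per-field rewrite is actually faster in Spark SQL would have to be measured on the real Parquet file.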