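Nulls are excluded by spark.sql("SELECT count(distinct col) FROM Table").show(). I think this is ANSI SQL behaviour: COUNT(col) and COUNT(DISTINCT col) skip NULL values, while SELECT DISTINCT keeps NULL as a row of its own. Counting the rows of a DISTINCT result can therefore come out one higher than COUNT(DISTINCT col) when the column contains NULLs.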
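A minimal sketch of the two semantics, written in plain Scala so it runs without a Spark cluster. It is only a model and not Spark's API. It assumes a nullable column can be represented as a Seq[Option[String]] with None standing in for SQL NULL. The object name NullCountModel and its functions are hypothetical, chosen for illustration.

```scala
// Hypothetical model, not Spark's API: a nullable column is a Seq[Option[String]],
// and None plays the role of SQL NULL.
object NullCountModel {
  // COUNT(col): NULLs are dropped before counting.
  def countCol(col: Seq[Option[String]]): Long =
    col.flatten.size.toLong

  // COUNT(DISTINCT col): NULLs are dropped, then distinct values are counted.
  def countDistinctCol(col: Seq[Option[String]]): Long =
    col.flatten.distinct.size.toLong

  // SELECT DISTINCT col, then count the rows: NULL survives as one distinct row.
  def distinctRowCount(col: Seq[Option[String]]): Long =
    col.distinct.size.toLong

  def main(args: Array[String]): Unit = {
    val col: Seq[Option[String]] = Seq(Some("a"), Some("b"), Some("a"), None, None)
    println(s"count(col)          = ${countCol(col)}")
    println(s"count(distinct col) = ${countDistinctCol(col)}")
    println(s"distinct row count  = ${distinctRowCount(col)}")
  }
}
```

With the sample column above, count(distinct col) gives 2 while the distinct row count gives 3, which is the "one more" difference. A column holding only NULL mirrors the shell session: countCol gives 0 and distinctRowCount gives 1.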
scala> spark.sql("select distinct count(null)").show(false)
+-----------+
|count(NULL)|
+-----------+
|0          |
+-----------+

scala> spark.sql("select distinct null").count
res1: Long = 1

Regards,
Hemanth

From: Mohamed Nadjib Mami <mohamed.nadjib.m...@gmail.com>
Date: Thursday, 6 April 2017 at 20.29
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: df.count() returns one more count than SELECT COUNT()

spark.sql("SELECT count(distinct col) FROM Table").show()