Hi all,

I am having some trouble doing a count distinct over multiple columns. This is an example of my data:

+----+----+----+---+
|a   |b   |c   |d  |
+----+----+----+---+
|null|null|null|1  |
|null|null|null|2  |
|null|null|null|3  |
|null|null|null|4  |
|null|null|null|5  |
|null|null|null|6  |
|null|null|null|7  |
+----+----+----+---+

And my code:

    import org.apache.spark.sql.{Column, Dataset, Row}
    import org.apache.spark.sql.functions.{col, countDistinct}

    val df: Dataset[Row] = …
    val cols: List[Column] = df.columns.map(col).toList
    df.agg(countDistinct(cols.head, cols.tail: _*))
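In case it helps to reproduce this, here is a minimal, self-contained sketch of the same setup. It is not the real job: the local SparkSession settings, the object name `CountDistinctRepro`, the helper `buildDf`, the generated column names, and the choice of string type for the all-null columns are all assumptions made for illustration. The number of all-null columns is a parameter, so the same check can be run at different widths.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, countDistinct, lit}
import org.apache.spark.sql.types.StringType

object CountDistinctRepro {

  // Hypothetical helper: builds `nullCols` all-null string columns (c1, c2, ...)
  // followed by an incremental column "d" holding the values 1..7.
  def buildDf(spark: SparkSession, nullCols: Int): DataFrame = {
    val base = spark.range(1, 8).toDF("d") // end of range is exclusive
    val nulls: Seq[Column] =
      (1 to nullCols).map(i => lit(null).cast(StringType).as(s"c$i"))
    val selected: Seq[Column] = nulls :+ col("d")
    base.select(selected: _*)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough to show the behaviour.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("count-distinct-repro")
      .getOrCreate()

    // Same aggregation as in the snippet above, applied to the synthetic data.
    val df = buildDf(spark, nullCols = 3)
    val cols: List[Column] = df.columns.map(col).toList
    df.agg(countDistinct(cols.head, cols.tail: _*)).show()

    spark.stop()
  }
}
```

Changing `nullCols` in `main` is the only edit needed to try a wider table.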
So, in the example above, counting the distinct "rows" gives 7, as expected (since the "d" column changes on every row). However, with more columns (16) in EXACTLY the same situation (one incremental column and 15 columns filled with nulls), the result is 0.

I don't understand why this happens. Is there a solution?

Thanks,
Daniele