Cheolsoo Park created SPARK-8908:
------------------------------------

             Summary: Calling distinct() with parentheses throws error in Scala DataFrame
                 Key: SPARK-8908
                 URL: https://issues.apache.org/jira/browse/SPARK-8908
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.4.0, 1.5.0
            Reporter: Cheolsoo Park
            Priority: Minor

To reproduce, call {{distinct()}} on a DataFrame in spark-shell. For example:
{code}
scala> sqlContext.table("my_table").distinct()
<console>:19: error: not enough arguments for method apply: (colName: String)org.apache.spark.sql.Column in class DataFrame.
Unspecified value parameter colName.
{code}

This is confusing because {{distinct}} in DataFrame is an alias of {{dropDuplicates}}, and both {{dropDuplicates}} and {{dropDuplicates()}} work. Here is the summary:

||Scala code||Works||
|DF.distinct|Y|
|DF.distinct()|N|
|DF.dropDuplicates|Y|
|DF.dropDuplicates()|Y|

Looking at the definition of {{distinct}}, its declaration is missing the {{()}}:
{code}
override def distinct: DataFrame = dropDuplicates()
{code}

Because {{distinct}} is declared without a parameter list, the trailing {{()}} in {{distinct()}} is applied to the returned DataFrame rather than to the method. As a result, what seems to be happening is as follows:
{code}
distinct()
=> dropDuplicates()()
=> DataFrame() // because dropDuplicates() returns DF
=> DataFrame.apply() // fails because apply() takes a column parameter
{code}

I can verify that adding {{()}} to the definition makes both {{distinct}} and {{distinct()}} work.
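
The parsing behaviour described above can be reproduced without Spark. The following is a minimal sketch, not Spark's actual code: {{MiniFrame}}, {{Column}}, {{distinctNoParens}} and {{distinctWithParens}} are hypothetical names introduced only for illustration, and {{MiniFrame.apply}} merely mimics the shape of {{DataFrame.apply(colName: String)}}.

```scala
// Hypothetical stand-ins for illustration only; these are not Spark classes.
final case class Column(name: String)

class MiniFrame(val rows: Seq[Int]) {
  // Mimics the shape of DataFrame.apply(colName: String): Column
  def apply(colName: String): Column = Column(colName)

  // Declared with an empty parameter list
  def dropDuplicates(): MiniFrame = new MiniFrame(rows.distinct)

  // Declared WITHOUT a parameter list, like the reported definition of distinct
  def distinctNoParens: MiniFrame = dropDuplicates()

  // Declared WITH an empty parameter list, like the proposed fix
  def distinctWithParens(): MiniFrame = dropDuplicates()
}

object Demo {
  def main(args: Array[String]): Unit = {
    val df = new MiniFrame(Seq(1, 1, 2))

    // Parameterless method called without parentheses: fine
    println(df.distinctNoParens.rows)

    // Empty-paren method called with parentheses: fine
    println(df.distinctWithParens().rows)

    // Any argument list after a parameterless method goes to apply() on the result,
    // so this compiles and returns a Column rather than a MiniFrame
    println(df.distinctNoParens("id"))

    // The line below would not compile: it is read as df.distinctNoParens.apply(),
    // and apply requires a colName argument.
    // df.distinctNoParens()
  }
}
```

Uncommenting the last call should produce a "not enough arguments for method apply" error similar to the one in the report, although the exact wording depends on the Scala version.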