[ https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084306#comment-16084306 ]
Kazuaki Ishizaki edited comment on SPARK-21390 at 7/12/17 5:09 PM:
-------------------------------------------------------------------

Another interesting result with Spark 2.2. Does this happen only with case classes in the REPL?

On IDE
{code:java}
{
  ...
  filterMe1.filter(x=> filterCondition.contains(x)).show
  filterMe1.filter(x=> filterCondition.contains(SomeClass(x.field1, x.field2))).show
}
+------+------+
|field1|field2|
+------+------+
|    00|    01|
+------+------+

+------+------+
|field1|field2|
+------+------+
|    00|    01|
+------+------+
{code}

On REPL
{code:java}
...
scala> filterMe1.filter(x => filterCondition.contains(x)).show
+------+------+
|field1|field2|
+------+------+
|    00|    01|
+------+------+

scala> filterMe1.filter(x => filterCondition.contains(SomeClass(x.field1, x.field2))).show
+------+------+
|field1|field2|
+------+------+
+------+------+

scala> print(filterCondition.contains(SomeClass("00", "01")))
true

scala> filterMe1.filter(x => { val c = filterCondition.contains(SomeClass(x.field1, x.field2)); print(s"$c\n"); c} ).show
false
+------+------+
|field1|field2|
+------+------+
+------+------+

scala> Seq((0, 0), (1, 1), (2, 2)).toDS.filter(x => { val c = Seq((1, 1)).contains((x._1, x._2)); print(s"$c\n"); c} ).show
false
true
false
+---+---+
| _1| _2|
+---+---+
|  1|  1|
+---+---+
{code}


was (Author: kiszk):
Another interesting result with Spark 2.2:

On IDE
{code:java}
{
  ...
  filterMe1.filter(x=> filterCondition.contains(x)).show
  filterMe1.filter(x=> filterCondition.contains(SomeClass(x.field1, x.field2))).show
}
+------+------+
|field1|field2|
+------+------+
|    00|    01|
+------+------+

+------+------+
|field1|field2|
+------+------+
|    00|    01|
+------+------+
{code}

On REPL
{code:java}
...
scala> filterMe1.filter(x => filterCondition.contains(x)).show
+------+------+
|field1|field2|
+------+------+
|    00|    01|
+------+------+

scala> filterMe1.filter(x => filterCondition.contains(SomeClass(x.field1, x.field2))).show
+------+------+
|field1|field2|
+------+------+
+------+------+

scala> print(filterCondition.contains(SomeClass("00", "01")))
true

scala> filterMe1.filter(x => { val c = filterCondition.contains(SomeClass(x.field1, x.field2)); print(s"$c\n"); c} ).show
false
+------+------+
|field1|field2|
+------+------+
+------+------+
{code}


> Dataset filter api inconsistency
> --------------------------------
>
>                 Key: SPARK-21390
>                 URL: https://issues.apache.org/jira/browse/SPARK-21390
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.1
>            Reporter: Gheorghe Gheorghe
>            Priority: Minor
>
> Hello everybody,
> I've encountered a strange situation with the spark-shell.
> When I run the code below in my IDE the second test case prints as expected
> count "1". However, when I run the same code using the spark-shell in the
> second test case I get 0 back as a count.
> I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE
> and spark-shell.
> {code:java}
> import org.apache.spark.sql.Dataset
> case class SomeClass(field1:String, field2:String)
> val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") )
> // Test 1
> val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS
>
> println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count)
>
> // Test 2
> case class OtherClass(field1:String, field2:String)
>
> val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS
> println("Fail, count should return 1: " + filterMe2.filter(x=>
> filterCondition.contains(SomeClass(x.field1, x.field2))).count)
> {code}
> Note if I transform the dataset first I get 1 back as expected.
> {code:java}
> println(filterMe2.map(x=> SomeClass(x.field1,
> x.field2)).filter(filterCondition.contains(_)).count)
> {code}
> Is this a bug? I can see that this filter function has been marked as
> experimental
> https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)
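
The last REPL example in the comment suggests that a predicate comparing plain tuples behaves as expected, while one that builds a new SomeClass inside the closure does not. A minimal sketch of that idea follows, using ordinary Scala collections in place of the Datasets so that it runs without Spark. The object name TupleKeyFilter and the helper keysOf are hypothetical names introduced only for this illustration. Whether the same predicate avoids the problem in spark-shell on the affected versions is an assumption that has not been verified here.

```scala
// Sketch only: plain Scala collections stand in for the Datasets in the report.
case class SomeClass(field1: String, field2: String)
case class OtherClass(field1: String, field2: String)

object TupleKeyFilter {
  // Hypothetical helper (not part of Spark): reduce the filter condition
  // to a set of plain (field1, field2) tuples.
  def keysOf(condition: Seq[SomeClass]): Set[(String, String)] =
    condition.map(c => (c.field1, c.field2)).toSet

  def main(args: Array[String]): Unit = {
    val filterCondition = Seq(SomeClass("00", "01"))
    val keys = keysOf(filterCondition)

    val rows = Seq(OtherClass("00", "01"), OtherClass("00", "02"))

    // Compare tuples of field values; no SomeClass instance is
    // constructed inside the predicate.
    val matched = rows.filter(x => keys.contains((x.field1, x.field2)))

    println(matched.size) // prints 1 for these plain collections
  }
}
```

With a Dataset in place of rows, the same predicate shape would read filterMe2.filter(x => keys.contains((x.field1, x.field2))), which mirrors the tuple-based example from the REPL session in the comment.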