Hi,

I am using the *where method* of a DataFrame to filter data.
I am comparing an Integer field with String data, and this comparison returns
the full table.
I have tested the same scenario with Hive and MySQL, and there the comparison
does not return any rows.

*Scenario:*

 val sqlDf = df.where("f1 = 'abc'") 
 Here f1 is of type Integer.
 
*Input:*
 14
 15
 16
 
*Output:*
 14
 15
 16
 
*Logical and Physical Plan:*
 
 == Parsed Logical Plan ==
'Filter ('f1 = abc)
+- Relation[f1#0] csv

== Analyzed Logical Plan ==
f1: int
Filter (cast(f1#0 as double) = cast(abc as double))
+- Relation[f1#0] csv

== Optimized Logical Plan ==
Filter (isnotnull(f1#0) && null)
+- Relation[f1#0] csv

== Physical Plan ==
*Project [f1#0]
+- *Filter isnotnull(f1#0)
   +- *Scan csv [f1#0] Format: CSV, InputPaths:
file:/C:/Users/santlalg/IdeaProjects/SparkTestPoc/Int, PartitionFilters:
[null], PushedFilters: [IsNotNull(f1)], ReadSchema: struct<f1:int>

  
In the *Optimized Logical Plan*, why is *cast(f1#0 as double) = cast(abc as
double)* from the *Analyzed Logical Plan* replaced with /null/?
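
A likely explanation, offered as a sketch rather than a confirmed diagnosis: in
Spark SQL's default (non-ANSI) mode, casting a non-numeric string to double
yields null, and a comparison against null is itself null, so the optimizer can
fold the whole predicate into a null literal. The snippet below evaluates both
expressions with the literal 14 standing in for f1. It assumes a local
SparkSession; the object name CastNullCheck and the method name evaluate are
made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, not part of Spark.
object CastNullCheck {
  // Returns (is the cast null?, is the comparison null?) for the
  // expressions seen in the analyzed plan, with 14 in place of f1.
  def evaluate(spark: SparkSession): (Boolean, Boolean) = {
    val row = spark.sql(
      "SELECT CAST('abc' AS DOUBLE) AS casted, " +
        "CAST(14 AS DOUBLE) = CAST('abc' AS DOUBLE) AS compared"
    ).head()
    (row.isNullAt(0), row.isNullAt(1))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("CastNullCheck")
      .getOrCreate()
    val (castIsNull, compareIsNull) = evaluate(spark)
    println(s"cast is null: $castIsNull, comparison is null: $compareIsNull")
    spark.stop()
  }
}
```

If both values come back null, the folding to /null/ is consistent with SQL
null semantics. A null predicate should then filter out every row. In the
physical plan above, however, the Filter keeps only isnotnull(f1#0) and the
null shows up under PartitionFilters, which may be why all rows are returned;
that reading is a guess from the plan text only.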
   
I am using the following dependency versions:
spark-core: 2.0.2
spark-sql: 2.0.2

In my scenario the predicate should evaluate to false, so the DataFrame should
not return any rows.
Can someone help me achieve this?
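
One possible workaround, sketched under the assumption that a string-to-string
comparison is acceptable here: cast the Integer column to string explicitly, so
that neither side of the comparison is cast to double and no null literal
is produced. The object name StringCompareWorkaround and the method name
countMatches are hypothetical, and the in-memory input stands in for the CSV
file, so this does not exercise the CSV scan path from the plan above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical helper object, not part of Spark.
object StringCompareWorkaround {
  // Counts rows whose f1 value, rendered as a string, equals the given value.
  def countMatches(spark: SparkSession, value: String): Long = {
    import spark.implicits._
    // In-memory stand-in for the CSV input (14, 15, 16).
    val df = Seq(14, 15, 16).toDF("f1")
    // Compare as strings: "14" = "abc" is false, not null.
    val sqlDf = df.where(col("f1").cast("string") === value)
    sqlDf.count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("StringCompareWorkaround")
      .getOrCreate()
    println(s"rows matching 'abc': ${countMatches(spark, "abc")}")
    println(s"rows matching '14': ${countMatches(spark, "14")}")
    spark.stop()
  }
}
```

The same idea can be written in the string form of where, as
df.where("cast(f1 as string) = 'abc'"). The trade-off is that values are
compared textually, so '014' would not match 14.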

Thanks 
Santlal 
   
 
   


