[ https://issues.apache.org/jira/browse/SPARK-26149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuming Wang resolved SPARK-26149.
---------------------------------
    Resolution: Not A Problem

> Read UTF8String from Parquet/ORC may be incorrect
> -------------------------------------------------
>
>                 Key: SPARK-26149
>                 URL: https://issues.apache.org/jira/browse/SPARK-26149
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0
>            Reporter: Yuming Wang
>            Priority: Major
>         Attachments: SPARK-26149.snappy.parquet, image-2018-12-04-10-55-49-369.png
>
>
> How to reproduce:
> {code:bash}
> scala> spark.read.parquet("/Users/yumwang/SPARK-26149/SPARK-26149.snappy.parquet").selectExpr("s1 = s2").show
> +---------+
> |(s1 = s2)|
> +---------+
> |    false|
> +---------+
> scala> val first = spark.read.parquet("/Users/yumwang/SPARK-26149/SPARK-26149.snappy.parquet").collect().head
> first: org.apache.spark.sql.Row = [a0750c1f13f0k5��F8j���b�Ro'4da96,a0750c1f13f0k5��F8j���b�Ro'4da96]
> scala> println(first.getString(0).equals(first.getString(1)))
> true
> {code}
> {code:sql}
> hive> CREATE TABLE `tb1` (`s1` STRING, `s2` STRING)
>     > stored as parquet
>     > location "/Users/yumwang/SPARK-26149";
> OK
> Time taken: 0.224 seconds
> hive> select s1 = s2 from tb1;
> OK
> true
> Time taken: 0.167 seconds, Fetched: 1 row(s)
> {code}
> As you can see, only UTF8String returns {{false}}.
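
This message does not say why the issue was resolved as Not A Problem. One explanation consistent with the reproduction above, offered here only as an assumption, is that the two columns hold different raw bytes that are not valid UTF-8. UTF8String equality compares the underlying bytes, while converting to a java.lang.String replaces each malformed sequence with U+FFFD, so two different byte sequences can decode to equal strings. The sketch below illustrates that effect with plain JVM classes and no Spark dependency. The object name, the helper names and the byte values are hypothetical and are not taken from the attached Parquet file.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Arrays

object InvalidUtf8Demo {
  // Hypothetical inputs: 0xFF and 0xFE can never appear in well-formed UTF-8.
  val raw1: Array[Byte] = Array('a'.toByte, 0xFF.toByte, 'b'.toByte)
  val raw2: Array[Byte] = Array('a'.toByte, 0xFE.toByte, 'b'.toByte)

  // Byte-wise comparison, analogous to comparing the raw string bytes.
  def bytesEqual(x: Array[Byte], y: Array[Byte]): Boolean =
    Arrays.equals(x, y)

  // Comparison after decoding; malformed bytes are replaced with U+FFFD.
  def decodedEqual(x: Array[Byte], y: Array[Byte]): Boolean =
    new String(x, UTF_8) == new String(y, UTF_8)

  def main(args: Array[String]): Unit = {
    println(bytesEqual(raw1, raw2))   // false: the raw bytes differ
    println(decodedEqual(raw1, raw2)) // true: both decode to "a\uFFFDb"
  }
}
```

If this assumption holds for the attached file, comparing the columns as binary rather than as decoded strings would be the way to confirm it.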