[ https://issues.apache.org/jira/browse/SPARK-17335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Herman van Hovell resolved SPARK-17335.
---------------------------------------
    Resolution: Fixed
      Assignee: Herman van Hovell
 Fix Version/s: 2.1.0
                2.0.1

> Creating Hive table from Spark data
> -----------------------------------
>
>                 Key: SPARK-17335
>                 URL: https://issues.apache.org/jira/browse/SPARK-17335
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Michal Kielbowicz
>            Assignee: Herman van Hovell
>             Fix For: 2.0.1, 2.1.0
>
>
> Recently my team started using Spark to analyze huge JSON objects. Spark
> itself handles them well. The problem starts when we try to create a Hive
> table from the data, following this part of the docs:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
> After running the command `spark.sql("CREATE TABLE x AS (SELECT * FROM y)")`
> we get the following exception (sorry for obfuscating, confidential data):
> {code}
> Exception in thread "main" org.apache.spark.sql.AnalysisException:
> org.apache.hadoop.hive.ql.metadata.HiveException:
> java.lang.IllegalArgumentException: Error: : expected at the position 993 of
> 'string:struct<a:boolean,b:array<string>,c:boolean,d:struct<e:boolean,f:boolean,[...(few
> others)],z:boolean,... 4 more fields>,[...(rest of valid struct string)]>'
> but ' ' is found.;
> {code}
> It turned out that the exception was raised because of the `... 4 more fields`
> part, as it is not a valid representation of the data structure.
> An easy workaround is to set `spark.debug.maxToStringFields` to some large
> value. Nevertheless, that shouldn't be required: the stringifying process
> should use methods that produce a type representation valid for Hive.
> In my opinion the root problem is here:
> https://github.com/apache/spark/blob/9d7a47406ed538f0005cdc7a62bc6e6f20634815/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala#L318
> where the `simpleString` method is called instead of `catalogString`.
> Nevertheless, this class is used in many places, and I don't feel
> experienced enough with Spark to submit a PR myself.
> We believe this issue is indirectly caused by this PR:
> https://github.com/apache/spark/pull/13537
> There has been an almost identical issue in the past. You can find it here:
> https://issues.apache.org/jira/browse/SPARK-16415

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
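The failure mode above can be modeled without a Spark installation. The sketch below is a hedged illustration, not Spark's actual code: `Field`, `catalogString`, and `simpleString` are hypothetical stand-ins that mimic how a truncated schema string (the kind governed by `spark.debug.maxToStringFields`) stops being a parseable type, while the untruncated form stays valid. With 29 fields and a cap of 25, the truncated form contains exactly the `... 4 more fields` marker seen in the exception.

```scala
// Hypothetical model of the two schema-to-string styles (NOT Spark's implementation).
case class Field(name: String, tpe: String)

// Analogue of a catalog-style type string: every field emitted, so the
// result is a well-formed struct type that a metastore could parse.
def catalogString(fields: Seq[Field]): String =
  fields.map(f => s"${f.name}:${f.tpe}").mkString("struct<", ",", ">")

// Analogue of a debug-style type string: fields past maxFields are elided
// with a "... N more fields" marker, which is human-readable but is not a
// valid type token, so a type parser chokes on it.
def simpleString(fields: Seq[Field], maxFields: Int): String = {
  val shown = fields.take(maxFields).map(f => s"${f.name}:${f.tpe}")
  val rest  = fields.length - maxFields
  val parts = if (rest > 0) shown :+ s"... $rest more fields" else shown
  parts.mkString("struct<", ",", ">")
}

val wide = (1 to 29).map(i => Field(s"c$i", "boolean"))
println(catalogString(wide))             // full, parseable struct string
println(simpleString(wide, maxFields = 25)) // contains "... 4 more fields"
```

Raising the truncation threshold (the reporter's workaround) only widens the debug representation; using the full catalog-style string is what actually removes the invalid marker.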