Hi, I just want to figure out why the two contexts (HiveContext and SQLContext) behave differently, even on a simple query. In a nutshell, I have a query that contains a string literal with a single quote and a cast to Array/Map. I have tried every combination of SQL context type and query API (sql, df.select, df.selectExpr), but I can't find one that works for all cases.
Here is the code to reproduce the problem.

-----------------------------------------------------------------------------
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkConf, SparkContext}

object Test extends App {

  val sc = new SparkContext("local[2]", "test", new SparkConf)
  val hiveContext = new HiveContext(sc)
  val sqlContext = new SQLContext(sc)

  val context = hiveContext
  // val context = sqlContext

  import context.implicits._

  val df = Seq((Seq(1, 2), 2)).toDF("a", "b")
  df.registerTempTable("tbl")
  df.printSchema()

  // case 1
  context.sql("select cast(a as array<string>) from tbl").show()
  // HiveContext => org.apache.spark.sql.AnalysisException: cannot recognize input near 'array' '<' 'string' in primitive type specification; line 1 pos 17
  // SQLContext => OK

  // case 2
  context.sql("select 'a\\'b'").show()
  // HiveContext => OK
  // SQLContext => failure: ``union'' expected but ErrorToken(unclosed string literal) found

  // case 3
  df.selectExpr("cast(a as array<string>)").show()
  // OK with HiveContext and SQLContext

  // case 4
  df.selectExpr("'a\\'b'").show()
  // HiveContext, SQLContext => failure: end of input expected
}
-----------------------------------------------------------------------------

Any clarification or workaround is highly appreciated.

--
Hao Ren
Data Engineer @ leboncoin
Paris, France
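One possible workaround, sketched under assumptions and not verified against every Spark 1.x release, is to build the cast and the literal with the DataFrame Column API instead of SQL strings. That way neither the SQLContext parser nor the HiveQL parser has to handle `array<string>` or an escaped quote. The sketch assumes Spark 1.3 or later, for `Column.cast(DataType)`, `functions.lit` and `toDF`. The object name `Workaround` and its helper methods are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.{col, lit}
import org.apache.spark.sql.types.{ArrayType, StringType}

// Hypothetical helper object: builds the expressions from cases 1-4 with the
// Column API, so no SQL string has to be parsed by either context.
object Workaround {

  // Same test data as in the reproduction code.
  def testData(context: SQLContext): DataFrame = {
    import context.implicits._
    Seq((Seq(1, 2), 2)).toDF("a", "b")
  }

  // Cases 1 and 3: cast array<int> to array<string> using a DataType value
  // instead of the type string "array<string>".
  def casted(context: SQLContext): DataFrame =
    testData(context).select(col("a").cast(ArrayType(StringType)).as("a"))

  // Cases 2 and 4: a string literal containing a single quote. lit() takes a
  // plain Scala string, so no SQL-level escaping is involved.
  def quoted(context: SQLContext): DataFrame =
    testData(context).select(lit("a'b").as("s"))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "workaround", new SparkConf)
    // A HiveContext (a subclass of SQLContext) could be passed instead.
    val context = new SQLContext(sc)
    casted(context).printSchema()
    quoted(context).show()
    sc.stop()
  }
}
```

The cast here is the same kind of cast expression as case 3, which already works in both contexts. Only the way the expression is constructed changes. The trade-off is that the query is no longer a single SQL string, so this may not fit code that must accept user-supplied SQL.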