Note:
In the code (org.apache.spark.sql.parquet.DefaultSource) I've found this:

val relation = if (doInsertion) {
      // This is a hack. We always set nullable/containsNull/valueContainsNull to true
      // for the schema of a parquet data.
      val df =
        sqlContext.createDataFrame(
          data.queryExecution.toRdd,
          data.schema.asNullable)
      val createdRelation =
        createRelation(sqlContext, parameters, df.schema).asInstanceOf[ParquetRelation2]
      createdRelation.insert(df, overwrite = mode == SaveMode.Overwrite)
      createdRelation
    }
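
As far as I can tell, asNullable recursively forces every nullability flag in the schema to true. A rough sketch of what it amounts to (my own approximation, not the actual Spark implementation):

    import org.apache.spark.sql.types._

    // Approximation of asNullable: recursively set nullable,
    // containsNull and valueContainsNull to true throughout the schema.
    def forceNullable(dt: DataType): DataType = dt match {
      case StructType(fields) =>
        StructType(fields.map(f =>
          f.copy(dataType = forceNullable(f.dataType), nullable = true)))
      case ArrayType(elementType, _) =>
        ArrayType(forceNullable(elementType), containsNull = true)
      case MapType(keyType, valueType, _) =>
        MapType(forceNullable(keyType), forceNullable(valueType),
          valueContainsNull = true)
      case other => other
    }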

The culprit is "data.schema.asNullable". What is the real reason for this?
Why not simply use the schema's existing nullable flags?
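
The practical effect, as in the subject line, is that the AVRO-derived nullability is lost on the round trip. A minimal repro sketch (Spark 1.x API; rowRdd and avroSchema stand in for my actual AVRO-derived input, where some fields have nullable = false):

    // Write a DataFrame whose schema declares non-nullable fields.
    val df = sqlContext.createDataFrame(rowRdd, avroSchema)
    df.saveAsParquetFile("/tmp/out.parquet")

    // Reading the file back: every field now reports nullable = true,
    // so the original AVRO nullability is gone.
    val readBack = sqlContext.parquetFile("/tmp/out.parquet")
    readBack.schema.fields.foreach(f =>
      println(s"${f.name}: nullable = ${f.nullable}"))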



