>
> First, this is not documented in the official documentation. Maybe we
> should add it? http://spark.apache.org/docs/latest/sql-programming-guide.html
>

Pull requests welcome.


> Second, nullability is a significant concept to database people. It is
> part of the schema. Extra code is needed to evaluate whether a value is
> null for every nullable data type. Thus, it might cause a problem if you
> need to use Spark to transfer data between Parquet and an RDBMS. My
> suggestion is to introduce another external parameter?
>

Sure, but a traditional RDBMS has the opportunity to validate data before
loading it in.  That's not really an option when you are reading arbitrary
files from S3.  This is why Hive and many other systems in this space treat
all columns as nullable.

What would the semantics of this proposed external parameter be?
