I believe that you can get what you want by using HiveQL instead of the
pure programmatic API. This is a little verbose, so perhaps a specialized
function would also be useful here. I'm not sure I would call it
saveAsExternalTable as there are also "external" spark sql data source
tables that have
Any other users interested in a feature
DataFrame.saveAsExternalTable() for making _useful_ external tables in
Hive, or am I the only one? Bueller? If I start a PR for this, will it
be taken seriously?
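
For anyone following along, the HiveQL route suggested at the top of this thread might look roughly like the sketch below. It only builds the two statements as strings; the table name, columns, location, and staging name are all made up for illustration, and nothing is escaped or validated. It also assumes Hive 0.13 or later, where STORED AS PARQUET is available.

```python
def external_table_statements(table, columns, location, staging_table):
    """Build HiveQL that creates an external Parquet table and fills it.

    columns is a list of (name, hive_type) pairs. Every name passed in
    here is illustrative; the helper itself is hypothetical, not a Spark API.
    """
    column_defs = ", ".join("%s %s" % (name, hive_type) for name, hive_type in columns)
    create = (
        "CREATE EXTERNAL TABLE IF NOT EXISTS %s (%s) "
        "STORED AS PARQUET LOCATION '%s'" % (table, column_defs, location)
    )
    # Copy the rows from a temp table registered on the DataFrame.
    insert = "INSERT INTO TABLE %s SELECT * FROM %s" % (table, staging_table)
    return [create, insert]


statements = external_table_statements(
    "events", [("id", "BIGINT"), ("name", "STRING")], "/tmp/events", "events_staging"
)

# Assuming a HiveContext called sqlContext and a DataFrame called df
# (neither is defined in this sketch), the statements would be run as:
#   df.registerTempTable("events_staging")
#   for statement in statements:
#       sqlContext.sql(statement)
```

Because the table is declared through Hive's own DDL, the metastore entry carries a regular SerDe and InputFormat, which is what should let Hive read it. This is the verbose part a saveAsExternalTable-style helper would hide.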
On Thu, Mar 19, 2015 at 9:34 AM, Christian Perez wrote:
Hi Yin,
Thanks for the clarification. My first reaction is that if this is the
intended behavior, it is a wasted opportunity. Why create a managed
table in Hive that cannot be read from inside Hive? I think I
understand now that you are essentially piggybacking on Hive's
metastore to persist table
I meant table properties and serde properties are used to store metadata of
a Spark SQL data source table. We do not set other fields like SerDe lib.
For a user, the output of DESCRIBE EXTENDED/FORMATTED on a data source
table should not show unrelated stuff like Serde lib and InputFormat. I
have c
Hi Christian,
Your table is stored correctly in Parquet format.
For saveAsTable, the table created is *not* a Hive table, but a Spark SQL
data source table (
http://spark.apache.org/docs/1.3.0/sql-programming-guide.html#data-sources).
We are only using Hive's metastore to store the metadata (to b
Hi all,
DataFrame.saveAsTable creates a managed table in Hive (v0.13 on
CDH5.3.2) in both spark-shell and pyspark, but creates the *wrong*
schema _and_ storage format in the Hive metastore, so that the table
cannot be read from inside Hive. Spark itself can read the table, but
Hive throws a Serial