Re: saveAsTable broken in v1.3 DataFrames?

2015-03-21 Thread Michael Armbrust
I believe that you can get what you want by using HiveQL instead of the pure programmatic API. This is a little verbose so perhaps a specialized function would also be useful here. I'm not sure I would call it saveAsExternalTable as there are also external Spark SQL data source tables that have
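The snippet above is cut off, so the exact HiveQL Michael had in mind is not shown. As a rough sketch of what the HiveQL route might look like from pyspark, assuming a Parquet-backed external table, the statement could be built as below. The table name, columns, location, and the `external_table_ddl` helper are all hypothetical, and `STORED AS PARQUET` assumes Hive 0.13 or later.

```python
def external_table_ddl(table, columns, location):
    """Build a HiveQL CREATE EXTERNAL TABLE statement for Parquet data.

    `columns` is a list of (name, hive_type) pairs. This helper is
    hypothetical and not part of Spark or Hive.
    """
    cols = ", ".join(
        "`{0}` {1}".format(name, hive_type) for name, hive_type in columns
    )
    return (
        "CREATE EXTERNAL TABLE IF NOT EXISTS {0} ({1}) "
        "STORED AS PARQUET LOCATION '{2}'"
    ).format(table, cols, location)


# Hypothetical table name, schema, and path, for illustration only.
ddl = external_table_ddl(
    "events_ext",
    [("id", "BIGINT"), ("name", "STRING")],
    "/tmp/events_ext",
)
print(ddl)

# Assuming a HiveContext bound to `sqlContext` and a DataFrame `df`,
# the statement could then be run and the table filled roughly like this:
#   sqlContext.sql(ddl)
#   df.registerTempTable("events_staging")
#   sqlContext.sql("INSERT INTO TABLE events_ext SELECT * FROM events_staging")
```

Because the table is declared through Hive DDL rather than through saveAsTable, the metastore entry should carry the regular Hive schema and storage format, which is the part that Hive itself needs in order to read the data.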

Re: saveAsTable broken in v1.3 DataFrames?

2015-03-20 Thread Christian Perez
Any other users interested in a feature DataFrame.saveAsExternalTable() for making _useful_ external tables in Hive, or am I the only one? Bueller? If I start a PR for this, will it be taken seriously? On Thu, Mar 19, 2015 at 9:34 AM, Christian Perez christ...@svds.com wrote: Hi Yin, Thanks

Re: saveAsTable broken in v1.3 DataFrames?

2015-03-19 Thread Yin Huai
I meant table properties and SerDe properties are used to store metadata of a Spark SQL data source table. We do not set other fields like SerDe lib. For a user, the output of DESCRIBE EXTENDED/FORMATTED on a data source table should not show unrelated stuff like SerDe lib and InputFormat. I have

saveAsTable broken in v1.3 DataFrames?

2015-03-19 Thread Christian Perez
Hi all, DataFrame.saveAsTable creates a managed table in Hive (v0.13 on CDH5.3.2) in both spark-shell and pyspark, but creates the *wrong* schema _and_ storage format in the Hive metastore, so that the table cannot be read from inside Hive. Spark itself can read the table, but Hive throws a

Re: saveAsTable broken in v1.3 DataFrames?

2015-03-19 Thread Yin Huai
Hi Christian, Your table is stored correctly in Parquet format. For saveAsTable, the table created is *not* a Hive table, but a Spark SQL data source table (http://spark.apache.org/docs/1.3.0/sql-programming-guide.html#data-sources). We are only using Hive's metastore to store the metadata (to

Re: saveAsTable broken in v1.3 DataFrames?

2015-03-19 Thread Christian Perez
Hi Yin, Thanks for the clarification. My first reaction is that if this is the intended behavior, it is a wasted opportunity. Why create a managed table in Hive that cannot be read from inside Hive? I think I understand now that you are essentially piggybacking on Hive's metastore to persist