Hey Todd,

> In migrating to 1.3.x I see that the spark.sql.hive.convertMetastoreParquet
> is no longer public, so the above no longer works.


This was probably just a typo, but to be clear:
spark.sql.hive.convertMetastoreParquet is still a supported option and
should work.  You are correct that the HiveMetastoreTypes class is now
private (we made a lot of stuff private starting with 1.3, and the removal
of "alpha", since we are now promising binary compatibility for public
APIs).  Your hack seems reasonable, but I'll caution that this is not a
stable public API, so it could break with future upgrades.
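
For example (a minimal sketch, assuming sqlContext here is a HiveContext):

// Still supported in 1.3.x: turn metastore parquet conversion on at runtime.
sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "true")

// Or equivalently from SQL:
sqlContext.sql("SET spark.sql.hive.convertMetastoreParquet=true")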

> While this will work, is there a better way to achieve this under 1.3.x?


If you are only looking for the ability to read this data with Spark SQL
(and not Hive), I suggest you look at the Data Sources API syntax for
creating tables.  You don't need to specify the schema at all for
self-describing formats like parquet:

CREATE TABLE tableName
USING parquet
OPTIONS (
  path '/path/to/file'
)
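
The same thing can be done programmatically (a sketch, assuming a
HiveContext named sqlContext and the createExternalTable method added to
SQLContext in 1.3):

// Registers the parquet files as an external table in the metastore.
// Parquet is self-describing, so no schema is supplied.
sqlContext.createExternalTable("tableName", "/path/to/file", "parquet")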

Michael


On Mon, Apr 6, 2015 at 11:37 AM, Todd Nist <tsind...@gmail.com> wrote:

> In 1.2.1 I was persisting a set of parquet files as a table for use by
> the spark-sql cli later on. There was a post here
> <http://apache-spark-user-list.1001560.n3.nabble.com/persist-table-schema-in-spark-sql-tt16297.html#a16311>
> by Michael Armbrust that provided a nice little helper method for dealing
> with this:
>
> /**
>  * Sugar for creating a Hive external table from a parquet path.
>  */
> def createParquetTable(name: String, file: String): Unit = {
>   import org.apache.spark.sql.hive.HiveMetastoreTypes
>
>   val rdd = parquetFile(file)
>   val schema = rdd.schema.fields.map(f =>
>     s"${f.name} ${HiveMetastoreTypes.toMetastoreType(f.dataType)}").mkString(",\n")
>   val ddl = s"""
>     |CREATE EXTERNAL TABLE $name (
>     |  $schema
>     |)
>     |ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
>     |STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
>     |OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
>     |LOCATION '$file'""".stripMargin
>   sql(ddl)
>   setConf("spark.sql.hive.convertMetastoreParquet", "true")
> }
>
> In migrating to 1.3.x I see that the
> spark.sql.hive.convertMetastoreParquet is no longer public, so the above no
> longer works.
>
> I can define a helper object that wraps HiveMetastoreTypes, something
> like:
>
> package org.apache.spark.sql.hive
>
> import org.apache.spark.sql.types.DataType
>
> /**
>  * Helper to expose HiveMetastoreTypes, which Spark now hides. It is
>  * placed in this namespace so the private class remains accessible.
>  */
> object HiveTypeHelper {
>   def toDataType(metastoreType: String): DataType =
>     HiveMetastoreTypes.toDataType(metastoreType)
>
>   def toMetastoreType(dataType: DataType): String =
>     HiveMetastoreTypes.toMetastoreType(dataType)
> }
>
> While this will work, is there a better way to achieve this under 1.3.x?
>
> TIA for the assistance.
>
> -Todd
>
