Re: How to apply schema to queried data from Hive before saving it as parquet file?

Michael Armbrust Wed, 19 Nov 2014 10:56:24 -0800

I am not very familiar with the JSONSerDe for Hive, but in general you
should not need to manually create a schema for data that is loaded from
Hive: the SchemaRDD returned by hctx.sql(...) already carries the table's
schema, so you can call saveAsParquetFile on it directly.
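
For example, a minimal sketch against the Spark 1.1 Java API. It assumes the
sparkHive1 table already exists in the metastore; the class name and the
output path are placeholders of mine, not something from your setup:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class HiveToParquet {
    public static void main(String[] args) {
        // Master URL is expected to come from spark-submit.
        SparkConf conf = new SparkConf().setAppName("HiveToParquet");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        JavaHiveContext hctx = new JavaHiveContext(jsc);

        // The result already has the column names and types of the Hive table.
        JavaSchemaRDD rows = hctx.sql("SELECT * FROM sparkHive1");
        rows.printSchema(); // inspect the schema that will be written

        // Hypothetical output directory; replace with your own path.
        rows.saveAsParquetFile("/tmp/sparkHive1_parquet");

        jsc.stop();
    }
}
```

No call to applySchema is involved; that step in the programming guide is only
needed when you start from a plain RDD of rows that has no schema yet.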


I'd also suggest you check out the jsonFile/jsonRDD methods that are
available on HiveContext.
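
A rough sketch of that route, again against the Spark 1.1 Java API. It assumes
ip3.json holds one JSON object per line, which is what jsonFile expects; the
class name, the temp table name "songs", and the output path are placeholders:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class JsonToParquet {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("JsonToParquet");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        JavaHiveContext hctx = new JavaHiveContext(jsc);

        // The schema is inferred from the JSON records; no SerDe jar is needed.
        JavaSchemaRDD songs = hctx.jsonFile(
            "/home/hduser/Documents/Credentials/Newest_Credentials_AX/Songs/spark-1.1.0/ip3.json");
        songs.printSchema();

        // Optional: register the data so it can be queried with SQL.
        songs.registerTempTable("songs"); // hypothetical table name
        JavaSchemaRDD selected = hctx.sql("SELECT id, name, score FROM songs");

        // Hypothetical output directory; replace with your own path.
        selected.saveAsParquetFile("/tmp/songs_parquet");

        jsc.stop();
    }
}
```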

On Wed, Nov 19, 2014 at 1:34 AM, akshayhazari <akshayhaz...@gmail.com>
wrote:

> The code below has one part that creates a table in Hive from data,
> and another part further down that creates a schema.
> *Now I try to save the queried data as a Parquet file, where
> hctx.sql("Select * from sparkHive1") returns a SchemaRDD
> containing the records from the table:*
>        hctx.sql("Select * from sparkHive1").saveAsParquetFile("/home/hduser/Documents/Credentials/Newest_Credentials_AX/Songs/spark-1.1.0/HiveOP");
>
> *As per the code at the following link, a schema is applied through the
> sqlContext before the file is saved as a Parquet file. How can I do that
> (save as a Parquet file) when I am using a HiveContext to fetch the data?*
>
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
>
> Any Help Please.
>
> --------------------------------------------------------------------------------------
>
>         HiveContext hctx = new HiveContext(sctx); // sctx is the SparkContext
>         hctx.sql("Select * from sparkHive1");
>         hctx.sql("ADD JAR /home/hduser/BIGDATA_STUFF/Java_Hive2/hive-json-serde-0.2.jar");
>         hctx.sql("Create table if not exists sparkHive1(id INT,name STRING,score INT) "
>             + "ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'");
>         hctx.sql("Load data local inpath "
>             + "'/home/hduser/Documents/Credentials/Newest_Credentials_AX/Songs/spark-1.1.0/ip3.json' "
>             + "into table sparkHive1");
>
>         String schemaString = "id name score";
>
>         List<StructField> fields = new ArrayList<StructField>();
>         for (String fieldName: schemaString.split(" ")) {
>             if(fieldName.contains("name"))
>                 fields.add(DataType.createStructField(fieldName, DataType.StringType, true));
>             else
>                 fields.add(DataType.createStructField(fieldName, DataType.IntegerType, true));
>         }
>         StructType schema = DataType.createStructType(fields);
>         // How can I apply the schema before saving as parquet file?
>         hctx.sql("Select * from sparkHive1").saveAsParquetFile("/home/hduser/Documents/Credentials/Newest_Credentials_AX/Songs/spark-1.1.0/HiveOP");
>
> ------------------------------------------------------------------------------------------------
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-apply-schema-to-queried-data-from-Hive-before-saving-it-as-parquet-file-tp19259.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
