Using a hive-site.xml file on the classpath.

On Fri, Jul 17, 2015 at 8:37 AM, spark user <spark_u...@yahoo.com.invalid> wrote:
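For reference, a minimal sketch of such a hive-site.xml (the host and port are placeholders; HiveContext talks to the Hive metastore directly rather than through a HiveServer2 JDBC URL, so no hive2 URL, username, or password is involved):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Point HiveContext at the remote Hive metastore.
       Replace host/port with your metastore's actual address. -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://quickstart.cloudera:9083</value>
  </property>
</configuration>
```

Placing this file on Spark's classpath (e.g. in conf/) is how HiveContext picks up the connection settings.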
> Hi Roberto,
>
> I have a question regarding HiveContext.
>
> When you create a HiveContext, where do you define the Hive connection
> properties? Suppose Hive is not on the local machine and I need to
> connect; how will HiveContext know the database info such as the URL,
> username, and password?
>
> String username = "";
> String password = "";
> String url = "jdbc:hive2://quickstart.cloudera:10000/default";
>
>
> On Friday, July 17, 2015 2:29 AM, Roberto Coluccio <
> roberto.coluc...@gmail.com> wrote:
>
> Hello community,
>
> I'm currently using Spark 1.3.1 with Hive support to output processed
> data to an external Hive table backed by S3. I'm specifying the
> delimiter manually, but I'd like to know whether there is any "clean"
> way to write in CSV format:
>
> val sparkConf = new SparkConf()
> val sc = new SparkContext(sparkConf)
> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
> import hiveContext.implicits._
> hiveContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS table_name(field1
> STRING, field2 STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> LOCATION '" + path_on_s3 + "'")
> hiveContext.sql(<an INSERT OVERWRITE query to write into the above table>)
>
> I also need the header of the table to be printed in each written file.
> I tried:
>
> hiveContext.sql("set hive.cli.print.header=true")
>
> But it didn't work.
>
> Any hint?
>
> Thank you.
>
> Best regards,
> Roberto
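On the header question in the quoted message: hive.cli.print.header only controls how the Hive CLI prints query results to the console, not the data files that INSERT OVERWRITE writes, so setting it has no effect here. One workaround (a sketch with hypothetical names; the Databricks spark-csv package, which has a header write option, is another route worth checking on Spark 1.3) is to build the CSV text with an explicit header row before saving. Only the plain-Scala formatting logic is shown; in Spark it would run inside mapPartitions or after unioning a one-line header RDD with the data:

```scala
// Sketch: render rows as CSV with a leading header line.
// "header" and "rows" are hypothetical stand-ins for the table's
// column names and the records produced by the query.
def toCsv(header: Seq[String],
          rows: Seq[Seq[String]],
          delim: String = ","): String =
  (header +: rows).map(_.mkString(delim)).mkString("\n")

val csv = toCsv(Seq("field1", "field2"),
                Seq(Seq("a", "b"), Seq("c", "d")))
// csv == "field1,field2\na,b\nc,d"
```

Each output file then carries the header as its first line, which is what the original question asked for.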