I think I've encountered the same problem. I'm trying to turn on Hive output compression with the following lines:

    def initHiveContext(sc: SparkContext): HiveContext = {
      val hc: HiveContext = new HiveContext(sc)
      hc.setConf("hive.exec.compress.output", "true")
      hc.setConf("mapreduce.output.fileoutputformat.compress.codec",
        "org.apache.hadoop.io.compress.SnappyCodec")
      hc.setConf("mapreduce.output.fileoutputformat.compress.type", "BLOCK")
      logger.info(hc.getConf("hive.exec.compress.output"))
      logger.info(hc.getConf("mapreduce.output.fileoutputformat.compress.codec"))
      logger.info(hc.getConf("mapreduce.output.fileoutputformat.compress.type"))
      hc
    }

And the log for calling it twice (note that "hive.exec.compress.output" reads back as false after the first call and true only after the second):

15/04/21 08:37:39 INFO util.SchemaRDDUtils$: false
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: org.apache.hadoop.io.compress.SnappyCodec
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: BLOCK
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: true
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: org.apache.hadoop.io.compress.SnappyCodec
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: BLOCK

BTW, it worked on 1.2.1...

On Thu, Apr 2, 2015 at 11:47 AM, Hao Ren <inv...@gmail.com> wrote:

> Hi,
>
> Jira created: https://issues.apache.org/jira/browse/SPARK-6675
>
> Thank you.
>
> On Wed, Apr 1, 2015 at 7:50 PM, Michael Armbrust <mich...@databricks.com> wrote:
>
>> Can you open a JIRA please?
>>
>> On Wed, Apr 1, 2015 at 9:38 AM, Hao Ren <inv...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I find HiveContext.setConf does not work correctly.
>>> Here are some code snippets showing the problem:
>>>
>>> snippet 1:
>>> ----------------------------------------------------------------------------------------------------------------
>>> import org.apache.spark.sql.hive.HiveContext
>>> import org.apache.spark.{SparkConf, SparkContext}
>>>
>>> object Main extends App {
>>>
>>>   val conf = new SparkConf()
>>>     .setAppName("context-test")
>>>     .setMaster("local[8]")
>>>   val sc = new SparkContext(conf)
>>>   val hc = new HiveContext(sc)
>>>
>>>   hc.setConf("spark.sql.shuffle.partitions", "10")
>>>   hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
>>>   hc.getAllConfs filter (_._1.contains("warehouse.dir")) foreach println
>>>   hc.getAllConfs filter (_._1.contains("shuffle.partitions")) foreach println
>>> }
>>> ----------------------------------------------------------------------------------------------------------------
>>>
>>> Results:
>>> (hive.metastore.warehouse.dir,/home/spark/hive/warehouse_test)
>>> (spark.sql.shuffle.partitions,10)
>>>
>>> snippet 2:
>>> ----------------------------------------------------------------------------------------------------------------
>>> ...
>>>   hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
>>>   hc.setConf("spark.sql.shuffle.partitions", "10")
>>>   hc.getAllConfs filter (_._1.contains("warehouse.dir")) foreach println
>>>   hc.getAllConfs filter (_._1.contains("shuffle.partitions")) foreach println
>>> ...
>>> ----------------------------------------------------------------------------------------------------------------
>>>
>>> Results:
>>> (hive.metastore.warehouse.dir,/user/hive/warehouse)
>>> (spark.sql.shuffle.partitions,10)
>>>
>>> You can see that I just permuted the two setConf calls, which leads to
>>> two different Hive configurations. It seems that HiveContext cannot set
>>> a new value for "hive.metastore.warehouse.dir" when it is the first
>>> setConf call; you need another setConf call before changing
>>> "hive.metastore.warehouse.dir". For example, set
>>> "hive.metastore.warehouse.dir" twice, as in snippet 3:
>>>
>>> snippet 3:
>>> ----------------------------------------------------------------------------------------------------------------
>>> ...
>>>   hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
>>>   hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
>>>   hc.getAllConfs filter (_._1.contains("warehouse.dir")) foreach println
>>> ...
>>> ----------------------------------------------------------------------------------------------------------------
>>>
>>> Results:
>>> (hive.metastore.warehouse.dir,/home/spark/hive/warehouse_test)
>>>
>>> You can reproduce this on the latest branch-1.3 (1.3.1-snapshot,
>>> htag = 7d029cb1eb6f1df1bce1a3f5784fb7ce2f981a33).
>>>
>>> I have also tested the released 1.3.0 (htag =
>>> 4aaf48d46d13129f0f9bdafd771dd80fe568a7dc). It has the same problem.
>>>
>>> Please tell me if I am missing something. Any help is highly
>>> appreciated.
>>>
>>> Hao
>>>
>>> --
>>> Hao Ren
>>>
>>> {Data, Software} Engineer @ ClaraVista
>>>
>>> Paris, France
>>
>
> --
> Hao Ren
>
> {Data, Software} Engineer @ ClaraVista
>
> Paris, France
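[Editor's note] For readers hitting the same issue on Spark 1.3.x, the workaround demonstrated by snippet 3 above (issue the problematic setConf a second time, then verify by reading the value back) can be sketched as follows. This is a minimal illustration, not the fix that eventually landed for SPARK-6675; `initHiveContext` and the warehouse path are just names taken from the thread.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

object HiveContextWorkaround {
  // Workaround sketch for SPARK-6675 on Spark 1.3.x: the first setConf on
  // "hive.metastore.warehouse.dir" may be silently lost, so set it twice
  // and verify the value by reading it back instead of trusting the setter.
  def initHiveContext(sc: SparkContext): HiveContext = {
    val warehouseDir = "/home/spark/hive/warehouse_test"
    val hc = new HiveContext(sc)
    hc.setConf("hive.metastore.warehouse.dir", warehouseDir)
    hc.setConf("hive.metastore.warehouse.dir", warehouseDir)
    // Fail fast if the setting still did not take effect.
    require(hc.getConf("hive.metastore.warehouse.dir") == warehouseDir,
      "hive.metastore.warehouse.dir was not applied")
    hc
  }
}
```

The same read-back check can be applied to the compression settings from the first message ("hive.exec.compress.output" etc.) to detect whether a setConf call was dropped.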