I think I'm encountering the same problem: I'm trying to turn on Hive output
compression.
I have the following code:
def initHiveContext(sc: SparkContext): HiveContext = {
    val hc: HiveContext = new HiveContext(sc)
    // Enable compressed Hive output, using block-level Snappy compression.
    hc.setConf("hive.exec.compress.output", "true")
    hc.setConf("mapreduce.output.fileoutputformat.compress.codec",
      "org.apache.hadoop.io.compress.SnappyCodec")
    hc.setConf("mapreduce.output.fileoutputformat.compress.type", "BLOCK")

    // Read the values back to verify that setConf took effect.
    logger.info(hc.getConf("hive.exec.compress.output"))
    logger.info(hc.getConf("mapreduce.output.fileoutputformat.compress.codec"))
    logger.info(hc.getConf("mapreduce.output.fileoutputformat.compress.type"))

    hc
  }
And here is the log from calling it twice. Note that the first call reads back
false for hive.exec.compress.output even though it was just set to true; only
the second call reads true:
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: false
15/04/21 08:37:39 INFO util.SchemaRDDUtils$:
org.apache.hadoop.io.compress.SnappyCodec
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: BLOCK
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: true
15/04/21 08:37:39 INFO util.SchemaRDDUtils$:
org.apache.hadoop.io.compress.SnappyCodec
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: BLOCK

BTW, it worked on 1.2.1...
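
In case it helps anyone else, below is a minimal sketch of the workaround
suggested by snippet 3 in the quoted thread: repeat the first setConf call so
that the value actually sticks. (That the first call gets swallowed while the
Hive state initializes lazily is my assumption; I have not verified it in the
Spark source, and the function name here is just illustrative.)

import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

def initHiveContextWithWorkaround(sc: SparkContext): HiveContext = {
  val hc: HiveContext = new HiveContext(sc)
  // Assumed behavior on 1.3.x: the very first setConf on a fresh HiveContext
  // is dropped, so issue it twice (cf. snippet 3 below, where setting a key
  // twice makes the value stick).
  hc.setConf("hive.exec.compress.output", "true")
  hc.setConf("hive.exec.compress.output", "true")
  hc.setConf("mapreduce.output.fileoutputformat.compress.codec",
    "org.apache.hadoop.io.compress.SnappyCodec")
  hc.setConf("mapreduce.output.fileoutputformat.compress.type", "BLOCK")
  hc
}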


On Thu, Apr 2, 2015 at 11:47 AM, Hao Ren <inv...@gmail.com> wrote:

> Hi,
>
> Jira created: https://issues.apache.org/jira/browse/SPARK-6675
>
> Thank you.
>
>
> On Wed, Apr 1, 2015 at 7:50 PM, Michael Armbrust <mich...@databricks.com>
> wrote:
>
>> Can you open a JIRA please?
>>
>> On Wed, Apr 1, 2015 at 9:38 AM, Hao Ren <inv...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I've found that HiveContext.setConf does not work correctly. Here are some
>>> code snippets showing the problem:
>>>
>>> snippet 1:
>>>
>>> ----------------------------------------------------------------------------------------------------------------
>>> import org.apache.spark.sql.hive.HiveContext
>>> import org.apache.spark.{SparkConf, SparkContext}
>>>
>>> object Main extends App {
>>>
>>>   val conf = new SparkConf()
>>>     .setAppName("context-test")
>>>     .setMaster("local[8]")
>>>   val sc = new SparkContext(conf)
>>>   val hc = new HiveContext(sc)
>>>
>>>   *hc.setConf("spark.sql.shuffle.partitions", "10")*
>>> *  hc.setConf("hive.metastore.warehouse.dir",
>>> "/home/spark/hive/warehouse_test")*
>>>   hc.getAllConfs filter(_._1.contains("warehouse.dir")) foreach println
>>>   hc.getAllConfs filter(_._1.contains("shuffle.partitions")) foreach
>>> println
>>> }
>>>
>>> ----------------------------------------------------------------------------------------------------------------
>>>
>>> Results:
>>> (hive.metastore.warehouse.dir,/home/spark/hive/warehouse_test)
>>> (spark.sql.shuffle.partitions,10)
>>>
>>> snippet 2:
>>>
>>> ----------------------------------------------------------------------------------------------------------------
>>> ...
>>>   *hc.setConf("hive.metastore.warehouse.dir",
>>> "/home/spark/hive/warehouse_test")*
>>> *  hc.setConf("spark.sql.shuffle.partitions", "10")*
>>>   hc.getAllConfs filter(_._1.contains("warehouse.dir")) foreach println
>>>   hc.getAllConfs filter(_._1.contains("shuffle.partitions")) foreach
>>> println
>>> ...
>>>
>>> ----------------------------------------------------------------------------------------------------------------
>>>
>>> Results:
>>> (hive.metastore.warehouse.dir,/user/hive/warehouse)
>>> (spark.sql.shuffle.partitions,10)
>>>
>>> You can see that I just permuted the two setConf calls, and that leads to
>>> two different Hive configurations.
>>> It seems that HiveContext cannot set a new value for the
>>> "hive.metastore.warehouse.dir" key when that is the first "setConf" call.
>>> You need another "setConf" call before changing
>>> "hive.metastore.warehouse.dir". For example, setting
>>> "hive.metastore.warehouse.dir" twice gives the same result as snippet 1:
>>>
>>> snippet 3:
>>>
>>> ----------------------------------------------------------------------------------------------------------------
>>> ...
>>>   hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
>>>   hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
>>>   hc.getAllConfs filter(_._1.contains("warehouse.dir")) foreach println
>>> ...
>>>
>>> ----------------------------------------------------------------------------------------------------------------
>>>
>>> Results:
>>> (hive.metastore.warehouse.dir,/home/spark/hive/warehouse_test)
>>>
>>>
>>> You can reproduce this on the latest branch-1.3 (1.3.1-snapshot,
>>> htag = 7d029cb1eb6f1df1bce1a3f5784fb7ce2f981a33).
>>>
>>> I have also tested the released 1.3.0 (htag =
>>> 4aaf48d46d13129f0f9bdafd771dd80fe568a7dc). It has the same problem.
>>>
>>> Please tell me if I am missing something. Any help is highly
>>> appreciated.
>>>
>>> Hao
>>>
>>> --
>>> Hao Ren
>>>
>>> {Data, Software} Engineer @ ClaraVista
>>>
>>> Paris, France
>>>
>>
>>
>
>
> --
> Hao Ren
>
> {Data, Software} Engineer @ ClaraVista
>
> Paris, France
>
