Hi,

Calling getConf doesn't solve the issue; many Hive-specific queries are broken as well. It seems no Hive configurations are being passed through properly.
Regards,
Madhukara Phatak
http://datamantra.io/

On Wed, Apr 22, 2015 at 2:19 AM, Michael Armbrust <mich...@databricks.com> wrote:

> As a workaround, can you call getConf first before any setConf?
>
> On Tue, Apr 21, 2015 at 1:58 AM, Ophir Cohen <oph...@gmail.com> wrote:
>
>> I think I've encountered the same problem while trying to turn on Hive
>> output compression. I have the following lines:
>>
>>   def initHiveContext(sc: SparkContext): HiveContext = {
>>     val hc: HiveContext = new HiveContext(sc)
>>     hc.setConf("hive.exec.compress.output", "true")
>>     hc.setConf("mapreduce.output.fileoutputformat.compress.codec",
>>       "org.apache.hadoop.io.compress.SnappyCodec")
>>     hc.setConf("mapreduce.output.fileoutputformat.compress.type", "BLOCK")
>>
>>     logger.info(hc.getConf("hive.exec.compress.output"))
>>     logger.info(hc.getConf("mapreduce.output.fileoutputformat.compress.codec"))
>>     logger.info(hc.getConf("mapreduce.output.fileoutputformat.compress.type"))
>>
>>     hc
>>   }
>>
>> And the log for calling it twice -- note that hive.exec.compress.output
>> is false after the first call and only becomes true after the second:
>>
>>   15/04/21 08:37:39 INFO util.SchemaRDDUtils$: false
>>   15/04/21 08:37:39 INFO util.SchemaRDDUtils$: org.apache.hadoop.io.compress.SnappyCodec
>>   15/04/21 08:37:39 INFO util.SchemaRDDUtils$: BLOCK
>>   15/04/21 08:37:39 INFO util.SchemaRDDUtils$: true
>>   15/04/21 08:37:39 INFO util.SchemaRDDUtils$: org.apache.hadoop.io.compress.SnappyCodec
>>   15/04/21 08:37:39 INFO util.SchemaRDDUtils$: BLOCK
>>
>> BTW, it worked on 1.2.1...
>>
>> On Thu, Apr 2, 2015 at 11:47 AM, Hao Ren <inv...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Jira created: https://issues.apache.org/jira/browse/SPARK-6675
>>>
>>> Thank you.
>>>
>>> On Wed, Apr 1, 2015 at 7:50 PM, Michael Armbrust <mich...@databricks.com> wrote:
>>>
>>>> Can you open a JIRA please?
>>>>
>>>> On Wed, Apr 1, 2015 at 9:38 AM, Hao Ren <inv...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I find that HiveContext.setConf does not work correctly.
>>>>> Here are some code snippets showing the problem.
>>>>>
>>>>> Snippet 1:
>>>>>
>>>>> ----------------------------------------------------------------
>>>>> import org.apache.spark.sql.hive.HiveContext
>>>>> import org.apache.spark.{SparkConf, SparkContext}
>>>>>
>>>>> object Main extends App {
>>>>>
>>>>>   val conf = new SparkConf()
>>>>>     .setAppName("context-test")
>>>>>     .setMaster("local[8]")
>>>>>   val sc = new SparkContext(conf)
>>>>>   val hc = new HiveContext(sc)
>>>>>
>>>>>   hc.setConf("spark.sql.shuffle.partitions", "10")
>>>>>   hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
>>>>>
>>>>>   hc.getAllConfs filter (_._1.contains("warehouse.dir")) foreach println
>>>>>   hc.getAllConfs filter (_._1.contains("shuffle.partitions")) foreach println
>>>>> }
>>>>> ----------------------------------------------------------------
>>>>>
>>>>> Results:
>>>>> (hive.metastore.warehouse.dir,/home/spark/hive/warehouse_test)
>>>>> (spark.sql.shuffle.partitions,10)
>>>>>
>>>>> Snippet 2 (the two setConf calls swapped):
>>>>>
>>>>> ----------------------------------------------------------------
>>>>> ...
>>>>>   hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
>>>>>   hc.setConf("spark.sql.shuffle.partitions", "10")
>>>>>
>>>>>   hc.getAllConfs filter (_._1.contains("warehouse.dir")) foreach println
>>>>>   hc.getAllConfs filter (_._1.contains("shuffle.partitions")) foreach println
>>>>> ...
>>>>> ----------------------------------------------------------------
>>>>>
>>>>> Results:
>>>>> (hive.metastore.warehouse.dir,/user/hive/warehouse)
>>>>> (spark.sql.shuffle.partitions,10)
>>>>>
>>>>> You can see that I only permuted the two setConf calls, yet this
>>>>> leads to two different Hive configurations: in snippet 2 the
>>>>> warehouse dir keeps its default value. It seems that HiveContext
>>>>> cannot set a new value for "hive.metastore.warehouse.dir" when that
>>>>> is the very first setConf call; another setConf call is needed
>>>>> before changing "hive.metastore.warehouse.dir". For example,
>>>>> setting "hive.metastore.warehouse.dir" twice in a row works:
>>>>>
>>>>> Snippet 3:
>>>>>
>>>>> ----------------------------------------------------------------
>>>>> ...
>>>>>   hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
>>>>>   hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
>>>>>
>>>>>   hc.getAllConfs filter (_._1.contains("warehouse.dir")) foreach println
>>>>> ...
>>>>> ----------------------------------------------------------------
>>>>>
>>>>> Results:
>>>>> (hive.metastore.warehouse.dir,/home/spark/hive/warehouse_test)
>>>>>
>>>>> You can reproduce this on the latest branch-1.3 (1.3.1-snapshot,
>>>>> htag = 7d029cb1eb6f1df1bce1a3f5784fb7ce2f981a33).
>>>>>
>>>>> I have also tested the released 1.3.0 (htag =
>>>>> 4aaf48d46d13129f0f9bdafd771dd80fe568a7dc). It has the same problem.
>>>>>
>>>>> Please tell me if I am missing something. Any help is highly
>>>>> appreciated.
>>>>>
>>>>> Hao
>>>>>
>>>>> --
>>>>> Hao Ren
>>>>>
>>>>> {Data, Software} Engineer @ ClaraVista
>>>>>
>>>>> Paris, France
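The symptom reported in this thread (the first setConf is silently lost, later ones stick, and a prior getConf avoids the problem) is consistent with the underlying configuration being initialized lazily in a way that clobbers anything written before the first load. The sketch below is a toy model of that failure mode in plain Scala, not Spark internals: `LazyConf`, `loadDefaults`, and the chosen semantics are all hypothetical, chosen only to reproduce the behavior observed in the snippets above.

```scala
object LazyConfDemo {

  // Hypothetical toy, NOT Spark code: a conf whose defaults are loaded
  // lazily on first use, wiping any value written before that first load.
  class LazyConf(defaults: Map[String, String]) {
    private var store = Map.empty[String, String]
    private var defaultsLoaded = false

    // Lazy initialization: the first call replaces the store with the
    // defaults; later calls are no-ops.
    private def loadDefaults(): Unit =
      if (!defaultsLoaded) { store = defaults; defaultsLoaded = true }

    def setConf(key: String, value: String): Unit = {
      store += (key -> value) // write the user's value first...
      loadDefaults()          // ...then the lazy load clobbers it (the bug)
    }

    def getConf(key: String): String = {
      loadDefaults()          // a read also forces initialization
      store.getOrElse(key, "<unset>")
    }
  }

  def main(args: Array[String]): Unit = {
    val defaults = Map("hive.metastore.warehouse.dir" -> "/user/hive/warehouse")

    // Without the workaround: the very first setConf is lost, exactly
    // like snippet 2 above.
    val c1 = new LazyConf(defaults)
    c1.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
    c1.setConf("spark.sql.shuffle.partitions", "10")
    println(c1.getConf("hive.metastore.warehouse.dir")) // /user/hive/warehouse

    // With Michael's getConf-first workaround: the read forces the lazy
    // load, so the subsequent setConf sticks.
    val c2 = new LazyConf(defaults)
    c2.getConf("hive.metastore.warehouse.dir")
    c2.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
    println(c2.getConf("hive.metastore.warehouse.dir")) // /home/spark/hive/warehouse_test
  }
}
```

Under this model, setting the same key twice (snippet 3) also works, because only the first setConf call triggers the destructive lazy load. Whether this matches the actual HiveContext implementation is a guess; the JIRA above (SPARK-6675) is the place to confirm the real cause.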