RE: hive1.2.1 on spark 1.5.2

2016-01-26 Thread Mich Talebzadeh
As far as I have worked this one out, Hive 1.2.1 works on Spark 1.3.1 for now. This means Hive will use the Spark engine. set spark.home=/usr/lib/spark-1.3.1-bin-hadoop2.6; set hive.execution.engine=spark; set spark.master=yarn-client; set hive.optimize.ppd=true; Beeline version 1.2.1 by

Re: Hive Bucketing

2016-01-26 Thread 谭成灶
I use Hive 0.13. src_tbl: non-partitioned table, stored as textfile. dst_tbl: partitioned table, stored as ORC. Here is the code: SET hive.exec.dynamic.partition=true; SET hive.exec.dynamic.partition.mode=nonstrict; set hive.enforce.sorting=true; set hive.exec.reducers.max=80; insert overwrite

Partition performance

2016-01-26 Thread Shubhvardhan Manjayya
Hi, see this Cloudera blog post at http://blog.cloudera.com/blog/2014/08/improving-query-performance-using-partitioning-in-apache-hive/ which says: "Do not over-partition the data. With too many small partitions, the task of recursively scanning the directories becomes more expensive than a full

RE: Partition performance

2016-01-26 Thread Mich Talebzadeh
Check the threads in hive user group under “Impact of partitioning on certain queries” HTH

Two results are inconsistent when i use Hive on Spark

2016-01-26 Thread Jone Zhang
, hand_suc_inuser, day_hand_suc_incnt, day_hand_suc_inuser from (select * from t_ed_soft_assist_useraction_stat where ds=20160126)t11 full outer join (select * from t_md_soft_lanmu_app_dload_detail where ds=20160126)t12 on t11.qua=t12.qua and t11.app_id=t12.appid and t11

Re: Two results are inconsistent when i use Hive on Spark

2016-01-26 Thread Jone Zhang
er, > evil_dload_cnt, > evil_dload_user, > update_dcnt, > update_duser, > hand_suc_incnt, > hand_suc_inuser, > day_hand_suc_incnt, > day_hand_suc_inuser > from > (select * from t_ed_soft_assist_useraction_stat where ds=20160126)t11 > full

hive1.2.1 on spark 1.5.2

2016-01-26 Thread kevin
Hi all, I tried Hive on Spark with Hive 1.2.1 and Spark 1.5.2. I built Spark without -Phive, and I tested the standalone Spark cluster with spark-submit and it is OK. But when I use Hive, I can see the Hive on Spark application on the Spark web UI, and finally I get an error: 16/01/26 16:23:42 INFO

Re: Hive Bucketing

2016-01-26 Thread Prasanth Jayachandran
Hi, can you try with hive.optimize.sort.dynamic.partition=true? See HIVE-6455 for more info. This should also avoid creating too many small files per partition. Thanks Prasanth On Tue, Jan 26, 2016 at 4:26 AM -0800, "谭成灶" wrote: I use hive 0.13.
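
As a rough illustration of why routing rows by partition key can cut down the number of small files, here is a toy Python model. It is not Hive's implementation: the function name count_output_files, the round-robin assignment used for the unoptimized case, and the CRC32-based routing are all assumptions made for this sketch. It only counts how many distinct (reducer, partition) pairs occur, treating each pair as one output file.

```python
import zlib


def count_output_files(partition_keys, num_reducers, route_by_partition):
    """Count distinct (reducer, partition) pairs; each pair stands for one output file."""
    files = set()
    for i, key in enumerate(partition_keys):
        if route_by_partition:
            # Assumption: all rows of a partition are sent to the same reducer.
            reducer = zlib.crc32(key.encode("utf-8")) % num_reducers
        else:
            # Assumption: rows are spread over reducers regardless of partition.
            reducer = i % num_reducers
        files.add((reducer, key))
    return len(files)


# Hypothetical input: 24 rows spread evenly over three daily partitions.
days = ["20160124", "20160125", "20160126"]
rows = [day for _ in range(8) for day in days]

print(count_output_files(rows, 4, route_by_partition=False))  # 12 files
print(count_output_files(rows, 4, route_by_partition=True))   # 3 files
```

In this model every reducer ends up writing to every partition when rows are spread blindly, while routing by partition key leaves one file per partition.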

RE: Hive Bucketing

2016-01-26 Thread Mich Talebzadeh
Ok so what is the resolution here? My understanding is that bucketing does not improve the performance. Is that correct?

Re: [VOTE] Hive 2.0 release plan

2016-01-26 Thread Sergey Shelukhin
Yeah I will send an update. There were a few more blockers and now again one remains.

Re: Hive Bucketing

2016-01-26 Thread Gopal Vijayaraghavan
> Ok so what is the resolution here? My understanding is that bucketing does not improve the performance. Is that correct? There are no right answers here - I spend a lot of time fixing over-zealous optimization attempts

Bucketing in Hive

2016-01-26 Thread Mich Talebzadeh
Hi, There are a number of questions brought up about Hive Bucketing. As I see it, it is another name for hash partitioning (assuming that Hive partitioning is effectively range partitioning). I borrow these terms (range and hash partitioning) from the industry standard as they are commonly used
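
The hash-partitioning reading of bucketing can be sketched in Python. This is a minimal illustration, not Hive's code: it assumes an integer bucketing column whose hash is the value itself and the rule (hash & Integer.MAX_VALUE) % num_buckets. The names bucket_for, rows and layout, and the columns ds and user_id, are made up for the example.

```python
INT_MAX = 0x7FFFFFFF  # Java's Integer.MAX_VALUE


def bucket_for(key: int, num_buckets: int) -> int:
    """Map an integer key to a bucket: mask off the sign bit, then take the modulo."""
    return (key & INT_MAX) % num_buckets


# Hypothetical rows: partitioned by the value of ds, bucketed by user_id into 4 buckets.
rows = [
    ("20160125", 8),
    ("20160126", 7),
    ("20160126", 11),
    ("20160126", 12),
]

# The partition is chosen by the column value itself, the bucket by a hash of the key.
layout = {}
for ds, user_id in rows:
    layout.setdefault((ds, bucket_for(user_id, 4)), []).append(user_id)

for (ds, bucket), user_ids in sorted(layout.items()):
    print(f"ds={ds} bucket={bucket} user_ids={user_ids}")
```

Rows with the same ds share a partition, and within it rows whose user_id hashes to the same remainder share a bucket, so the number of buckets is fixed up front while the number of partitions grows with the data.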

Re: Bucketing in Hive

2016-01-26 Thread Maciek
These two serve the same purpose and logically are very much alike. The difference is that partitioning may be explicit (partitioning, in pretty much all solid RDBMSs, Hive too) or implicit (hashing/bucketing, just Hive?). In Hive, for some reason, they come with different, mutually exclusive set

RE: Bucketing in Hive

2016-01-26 Thread Mich Talebzadeh
Thanks for the link Maciek. I read and quote: “Logically and functionally bucketing and partitioning are quite similar - both provide mechanism to segregate and separate the table's data based on its content. Thanks to that significant further optimisations like [partition] PRUNING or

Re: hive1.2.1 on spark 1.5.2

2016-01-26 Thread kevin
Thank you Mich Talebzadeh and Sofia Panagiotidi. I changed my Spark version to 1.4.1, and everything is OK. 2016-01-26 16:45 GMT+08:00 kevin : > hi,all >I tried hive on spark with version hive1.2.1 spark1.5.2. I build spark > witout -Phive . And I test spark

Re: hive1.2.1 on spark 1.5.2

2016-01-26 Thread Sofia Panagiotidi
I have also managed to use Hive 1.2.1 with Spark 1.4.1 > On 26 Jan 2016, at 10:20, Mich Talebzadeh wrote: > > As far as I have worked this one out Hive 1.2.1 works on Spark 1.3.1 for now. > This means Hive will use spark engine. > > > set

Re: hive1.2.1 on spark 1.5.2

2016-01-26 Thread Sofia Panagiotidi
Using the prebuilt versions from the apache website for both Hive and Spark > On 26 Jan 2016, at 14:00, Sofia Panagiotidi > wrote: > > I have also managed to use Hive 1.2.1 with Spark 1.4.1 > > >> On 26 Jan 2016, at 10:20, Mich Talebzadeh