So I did another test on this:

hive> create table test(foo int,bar string) clustered by(foo) sorted by (foo asc) into 2 buckets;
OK
Time taken: 0.097 seconds
hive> create table test2 (foo int,bar string) clustered by(foo) sorted by (foo asc) into 2 buckets;
OK
hive> LOAD DATA LOCAL INPATH 'hive/examples/files/kv1.txt' OVERWRITE INTO TABLE test;
hive> set hive.enforce.bucketing=true;
hive> set hive.enforce.sorting=true;
hive> insert into table test2 select * from test;
Total MapReduce jobs = 1
Launching Job 1 out of 1
…
hive> insert into table test2 select * from test2;
FAILED: Error in semantic analysis: Bucketized tables do not support INSERT INTO: Table: test2
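For anyone hitting the same failure: since only INSERT INTO seems to be rejected, a workaround sketch is to rebuild the table with INSERT OVERWRITE and a UNION ALL subquery (same test/test2 tables as above; I haven't verified this exact rewrite on 0.9, so treat it as an assumption):

set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
-- UNION ALL must live in a subquery in Hive; this rewrites test2 in place,
-- re-bucketing and re-sorting the combined rows
INSERT OVERWRITE TABLE test2
SELECT u.foo, u.bar
FROM (
  SELECT foo, bar FROM test2
  UNION ALL
  SELECT foo, bar FROM test
) u;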
Seems like the error"FAILED: Error in semantic analysis: Bucketized tables do not support INSERT INTO: Table: vt_new_data error is occurred " is only thrown when insert into a bucketized table from the same table? And when insert into a bucketized table multi-times, it will create a original_file_copy_n under the same bucket. -rw-r--r-- 3 wbian supergroup 2856 2012-05-31 22:03 /user/hive/warehouse/test2/000000_0 -rw-r--r-- 3 wbian supergroup 2856 2012-05-31 22:04 /user/hive/warehouse/test2/000000_0_copy_1 -rw-r--r-- 3 wbian supergroup 2956 2012-05-31 22:03 /user/hive/warehouse/test2/000001_0 -rw-r--r-- 3 wbian supergroup 2956 2012-05-31 22:04 /user/hive/warehouse/test2/000001_0_copy_1 And since what I want to do is SMB Map Join, the following triggered the SMB Map Join successfully set hive.optimize.bucketmapjoin= true; set hive.optimize.bucketmapjoin.sortedmerge = true; set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; select /*+mapjoin(test)*/ * from pokes join test on pokes.foo=test.foo; So what's the reason for throwing that error(i mean why not support insert into a bucketized table from the same table)?And isn't that error message kind of misleading? On Thu, May 31, 2012 at 6:43 PM, Bruce Bian <weidong....@gmail.com> wrote: > I'm using hive 0.9.0 > > On Thursday, May 31, 2012, Bruce Bian wrote: > >> Hi, >> I've got a table vt_new_data which is defined as follows: >> CREATE TABLE VT_NEW_DATA >> ( >> V_ACCOUNT_NUM string >> ,V_ACCOUNT_MODIFIER_NUM string >> ,V_DEPOSIT_TYPE_CD string >> ,V_DEPOSIT_TERM int >> ,V_LEDGER_SUBJECT_ID string >> ,V_ACCOUNTING_ORG_CD string >> ,V_OPEN_DT string >> ,V_CLOSE_DT string >> ,V_CURRENCY_CD string >> ,V_ACCOUNT_BAL float >> ,V_INNER_MONTH_DELAY_ACCUM float >> ) CLUSTERED BY (V_ACCOUNT_NUM,V_ACCOUNT_MODIFIER_NUM) SORTED BY >> (V_ACCOUNT_NUM,V_ACCOUNT_MODIFIER_NUM ASC) INTO 256 BUCKETS STORED AS >> RCFile; >> >> when I execute the following query >> explain insert into table vt_new_data select * from vt_new_data limit 1; >> (this is just a test) >> an FAILED: Error in semantic analysis: Bucketized tables do not support >> INSERT INTO: Table: vt_new_data error is occurred >> >> but when I execute the query: >> explain insert into table vt_new_data >> select /*+ MAPJOIN(T4) */ >> t1.account_num as v_account_num >> ,t1.account_modifier_num as v_account_modifier_num >> ,'3006' as v_deposit_type_cd >> ,0 as v_deposit_term >> ,'23201000' v_ledger_subject_id >> ,coalesce(t2.party_id,'') as v_accounting_org_cd >> ,coalesce(t3.card_begin_dt,'19000101') as v_open_dt >> ,coalesce(t3.card_live_dt,'19000101') as v_close_dt >> ,coalesce(t4.currency_cd,substr(t1.account_modifier_num,3,3)) as >> v_currency_cd >> ,coalesce(t4.agt_amt,0) as v_account_bal >> ,0 as v_inner_month_delay_accum >> from t03_e_cash_bucket t1 >> left outer join t03_agt_amount_h_bucket t4 >> on t1.account_num=t4.account_num >> and t1.account_modifier_num=t4.account_modifier_num >> and t4.agt_amt_type_cd = '001' >> and t4.start_date<='$TXNDATE' >> and t4.end_date>'$TXNDATE' >> left outer join t01_party_card_rela_h_bucket t2 >> on t1.card_no = t2.card_no >> and t2.party_card_rela_type_cd = '01' >> and t2.start_date<='$TXNDATE' >> and t2.end_date>'$TXNDATE' >> left outer join t03_card_bucket t3 >> on t1.card_no = t3.card_no; >> >> the execution plan is generated successfully and triggered an SMB Map >> Join, which is great. >> >> But I don't see the difference here? As both are inserting into a >> bucketized and sorted table? >> >