I copied Hadoop19Shims' implementation of getCombineFileInputFormat (HIVE-1121) into Hadoop18Shims and it worked, if anyone is interested.
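In case it helps anyone else check their own setup, something like the following in the CLI shows whether the merge actually makes it into the plan. This is just a sketch, reusing Sammy's query from further down the thread as the example; run his SET lines first, and note that SET with no value simply prints the current setting.

-- Print the client-side values before submitting the insert:
SET hive.merge.mapredfiles;
SET hive.merge.smallfiles.avgsize;
SET hive.input.format;

-- With the merge enabled, the plan should contain a ConditionalTask with a
-- merge job after the main map-reduce job, rather than just the map-reduce
-- job plus the move task (per Ning's note below):
EXPLAIN EXTENDED
INSERT OVERWRITE TABLE daily_conversions_without_rank_all_table
PARTITION(org_id, day)
SELECT session_id, permanent_id, first_date, last_date, week, month,
quarter, referral_type, search_engine, us_search_engine,
keyword, unnormalized_keyword, branded, conversion_meet, goals_meet,
pages_viewed, entry_page, page_types,
org_id, day
FROM daily_conversions_without_rank_table;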
And hopefully we can upgrade our Hadoop version soon :)

On Fri, Nov 12, 2010 at 12:44 PM, Dave Brondsema <dbronds...@geek.net> wrote:

> It seems that I can't use this with Hadoop 0.18, since
> Hadoop18Shims.getCombineFileInputFormat returns null and
> SemanticAnalyzer.java sets HIVEMERGEMAPREDFILES to false if
> CombineFileInputFormat is not supported. Is that right? Maybe I can copy
> the Hadoop19Shims implementation of getCombineFileInputFormat into
> Hadoop18Shims?
>
> On Wed, Nov 10, 2010 at 4:31 PM, yongqiang he <heyongqiang...@gmail.com> wrote:
>
>> I think the problem was solved in Hive trunk. You can just try Hive trunk.
>>
>> On Wed, Nov 10, 2010 at 10:05 AM, Dave Brondsema <dbronds...@geek.net> wrote:
>>
>> > Hi, has there been any resolution to this? I'm having the same trouble.
>> > With Hive 0.6, Hadoop 0.18, and a dynamic partition insert,
>> > hive.merge.mapredfiles doesn't work. It works fine for a static
>> > partition insert. What I'm seeing is that even when I set
>> > hive.merge.mapredfiles=true, the jobconf has it as false for the
>> > dynamic partition insert.
>> > I was reading https://issues.apache.org/jira/browse/HIVE-1307 and it
>> > looks like maybe Hadoop 0.20 is required for this?
>> > Thanks,
>> >
>> > On Sat, Oct 16, 2010 at 1:50 AM, Sammy Yu <s...@brightedge.com> wrote:
>> >
>> >> Hi guys,
>> >> Thanks for the response. I tried running without
>> >> hive.mergejob.maponly with the same result. I've attached the explain
>> >> extended output. I am running this query on EC2 boxes; however, it's
>> >> not running on EMR. Hive is running on top of a Hadoop 0.20.2 setup.
>> >>
>> >> Thanks,
>> >> Sammy
>> >>
>> >> On Fri, Oct 15, 2010 at 5:58 PM, Ning Zhang <nzh...@facebook.com> wrote:
>> >>
>> >> > The output file shows it only has 2 jobs (the mapreduce job and the
>> >> > move task). This indicates that the plan does not have merge enabled.
>> >> > Merge should consist of a ConditionalTask and 2 sub-tasks (an MR task
>> >> > and a move task). Can you send the plan of the query?
>> >> >
>> >> > One thing I noticed is that you are using Amazon EMR. I'm not sure if
>> >> > this is enabled, since SET hive.mergejob.maponly=true requires
>> >> > CombineHiveInputFormat (only available in Hadoop 0.20, and someone
>> >> > reported that some distributions of Hadoop don't support it). So an
>> >> > additional thing you can try is to remove this setting.
>> >> >
>> >> > On Oct 15, 2010, at 1:43 PM, Sammy Yu wrote:
>> >> >
>> >> >> Hi,
>> >> >> I have a dynamic partition query which generates quite a few small
>> >> >> files which I would like to merge:
>> >> >>
>> >> >> SET hive.exec.dynamic.partition.mode=nonstrict;
>> >> >> SET hive.exec.dynamic.partition=true;
>> >> >> SET hive.exec.compress.output=true;
>> >> >> SET io.seqfile.compression.type=BLOCK;
>> >> >> SET hive.merge.size.per.task=256000000;
>> >> >> SET hive.merge.smallfiles.avgsize=16000000000;
>> >> >> SET hive.merge.mapfiles=true;
>> >> >> SET hive.merge.mapredfiles=true;
>> >> >> SET hive.mergejob.maponly=true;
>> >> >>
>> >> >> INSERT OVERWRITE TABLE daily_conversions_without_rank_all_table
>> >> >> PARTITION(org_id, day)
>> >> >> SELECT session_id, permanent_id, first_date, last_date, week, month,
>> >> >> quarter, referral_type, search_engine, us_search_engine,
>> >> >> keyword, unnormalized_keyword, branded, conversion_meet, goals_meet,
>> >> >> pages_viewed, entry_page, page_types,
>> >> >> org_id, day
>> >> >> FROM daily_conversions_without_rank_table;
>> >> >>
>> >> >> I am running the latest version from trunk with HIVE-1622, but it
>> >> >> seems like I just can't get the post merge process to happen. I have
>> >> >> raised hive.merge.smallfiles.avgsize. I'm wondering if the filtering
>> >> >> at runtime is causing the merge process to be skipped. Attached are
>> >> >> the hive output and log files.
>> >> >>
>> >> >> Thanks,
>> >> >> Sammy
>> >> >> <hive_output.txt><hive_job_log_root_201010151114_2037492391.txt>
>> >>
>> >> --
>> >> Chief Architect, BrightEdge
>> >> email: s...@brightedge.com | mobile: 650.539.4867 | fax: 650.521.9678
>> >> address: 1850 Gateway Dr Suite 400, San Mateo, CA 94404

--
Dave Brondsema
Software Engineer
Geeknet
www.geek.net