I'm running CDH 5.3.3 (Hadoop 2.5.0 + CDH patches)... so those two JIRAs hopefully don't apply. I'll try the two configs suggested and report back.
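For reference, here's the session setup I'm planning to test (the 32MB minimum is just the suggested starting point, and the 128MB maximum matches my current setting):

```
set hive.hadoop.supports.splittable.combineinputformat = true;

-- 32MB minimum split size, as suggested
set mapreduce.input.fileinputformat.split.minsize = 33554432;

-- 128MB maximum split size, unchanged from my current config
set mapreduce.input.fileinputformat.split.maxsize = 134217728;
```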
Thanks!

On Wed, Sep 30, 2015 at 3:14 PM, Ryan Harris <ryan.har...@zionsbancorp.com> wrote:

> I would suggest trying:
>
> set hive.hadoop.supports.splittable.combineinputformat = true;
>
> You might also need to increase mapreduce.input.fileinputformat.split.minsize
> to something larger, like 32MB:
>
> set mapreduce.input.fileinputformat.split.minsize = 33554432;
>
> Depending on your Hadoop distro and version, be aware of
> https://issues.apache.org/jira/browse/MAPREDUCE-1597
> and
> https://issues.apache.org/jira/browse/MAPREDUCE-5537
>
> Test it and see...
>
> *From:* Pradeep Gollakota [mailto:pradeep...@gmail.com]
> *Sent:* Wednesday, September 30, 2015 3:33 PM
> *To:* user@hive.apache.org
> *Subject:* Re: CombineHiveInputFormat not working
>
> mapred.min.split.size = mapreduce.input.fileinputformat.split.minsize = 1
> mapred.max.split.size = mapreduce.input.fileinputformat.split.maxsize = 134217728
> hive.hadoop.supports.splittable.combineinputformat = false
>
> My average file size is pretty small... it's usually between 500KB and 20MB.
>
> So it looks like the splittable support is turned off? I've seen posts on
> the mailing list saying there are correctness problems when using this
> with LZO.
>
> Is this still the case? Can I turn this on with LZ4?
>
> Thanks!
>
> On Wed, Sep 30, 2015 at 1:38 PM, Ryan Harris <ryan.har...@zionsbancorp.com> wrote:
>
> Also...
>
> mapreduce.input.fileinputformat.split.maxsize
>
> And, what is the size of your input files?
>
> *From:* Ryan Harris
> *Sent:* Wednesday, September 30, 2015 2:37 PM
> *To:* 'user@hive.apache.org'
> *Subject:* RE: CombineHiveInputFormat not working
>
> What are your values for:
>
> mapred.min.split.size
> mapred.max.split.size
> hive.hadoop.supports.splittable.combineinputformat
>
> *From:* Pradeep Gollakota [mailto:pradeep...@gmail.com]
> *Sent:* Wednesday, September 30, 2015 2:20 PM
> *To:* user@hive.apache.org
> *Subject:* CombineHiveInputFormat not working
>
> Hi all,
>
> I have an external table with the following DDL:
>
> ```
> DROP TABLE IF EXISTS raw_events;
> CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
>   raw_event_string string)
> PARTITIONED BY (dc string, community string, dt string)
> STORED AS TEXTFILE
> LOCATION '/lithium/events/{dc}/{community}/events/{year}/{month}/{day}'
> ```
>
> The files are loaded externally and are LZ4-compressed. When I run a query
> on this table for a single day, I get one mapper per file even though the
> input format is set to CombineHiveInputFormat.
>
> Does anyone know if CombineHiveInputFormat does not work with LZ4-compressed
> files, or have any idea why split combination is not working?
>
> Thanks!
> Pradeep
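For anyone hitting the same symptom, a quick way to see where a session stands: in Hive, `SET <property>;` with no value prints the current setting, and the mapper count reported for a single-day query shows whether the small .lz4 files were actually combined. A minimal sketch (the partition values below are hypothetical):

```
-- Print the effective settings; if the splittable-combine flag is false,
-- CombineHiveInputFormat will not group compressed (non-splittable) files,
-- so you get one split, and one mapper, per .lz4 file.
set hive.input.format;
set hive.hadoop.supports.splittable.combineinputformat;
set mapreduce.input.fileinputformat.split.minsize;
set mapreduce.input.fileinputformat.split.maxsize;

-- Re-run a one-day query and watch the reported number of mappers; it
-- should drop sharply once combining kicks in.
-- (The dc/community/dt values here are made up for illustration.)
SELECT count(*) FROM raw_events
WHERE dc = 'dc1' AND community = 'c1' AND dt = '2015-09-30';
```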