In Hive I used SET mapreduce.input.fileinputformat.split.maxsize=134217728; but it had no effect. I also found that when I run

LOAD DATA INPATH '/data_split/data_rowkey.lzo' OVERWRITE INTO TABLE data_zh

HDFS moves the data file into the directory of the table I created with CREATE EXTERNAL TABLE, but data_rowkey.lzo.index is left behind in the HDFS /data_split/ directory. So the data file has moved to the Hive directory while the index file stays in the old HDFS directory; they are no longer in the same directory.

2013/8/22 Sanjay Subramanian <sanjay.subraman...@wizecommerce.com>

> Hi
>
> Try this setting in your Hive query:
>
> SET mapreduce.input.fileinputformat.split.maxsize=<some bytes>;
>
> If you set this value "low", the MR job will use it to split the input
> LZO files and you will get multiple mappers (and make sure the input
> LZO files are indexed, i.e. the .lzo.index files have been created).
>
> sanjay
>
>
> From: Edward Capriolo <edlinuxg...@gmail.com>
> Reply-To: "user@hive.apache.org" <user@hive.apache.org>
> Date: Wednesday, August 21, 2013 10:43 AM
> To: "user@hive.apache.org" <user@hive.apache.org>
> Subject: Re: only one mapper
>
> LZO files are only splittable if you index them. Sequence files
> compressed with LZO are splittable without being indexed.
>
> Snappy + SequenceFile is a better option than LZO.
>
>
> On Wed, Aug 21, 2013 at 1:39 PM, Igor Tatarinov <i...@decide.com> wrote:
>
>> LZO files are combinable, so check your max split setting.
>>
>> http://mail-archives.apache.org/mod_mbox/hive-user/201107.mbox/%3c4e328964.7000...@gmail.com%3E
>>
>> igor
>> decide.com
>>
>>
>> On Wed, Aug 21, 2013 at 2:17 AM, 闫昆 <yankunhad...@gmail.com> wrote:
>>
>>> Hi all. When I run a Hive query, the job launches only one mapper,
>>> even though my file actually splits into 18 blocks: my block size is
>>> 128 MB and the data size is 2 GB. I use LZO compression, created
>>> file.lzo, and built the index file.lzo.index. I am using Hive 0.10.0.
>>>
>>> Total MapReduce jobs = 1
>>> Launching Job 1 out of 1
>>> Number of reduce tasks is set to 0 since there's no reduce operator
>>> Cannot run job locally: Input Size (= 2304560827) is larger than
>>> hive.exec.mode.local.auto.inputbytes.max (= 134217728)
>>> Starting Job = job_1377071515613_0003, Tracking URL =
>>> http://hydra0001:8088/proxy/application_1377071515613_0003/
>>> Kill Command = /opt/module/hadoop-2.0.0-cdh4.3.0/bin/hadoop job -kill
>>> job_1377071515613_0003
>>> Hadoop job information for Stage-1: number of mappers: 1; number of
>>> reducers: 0
>>> 2013-08-21 16:44:30,237 Stage-1 map = 0%, reduce = 0%
>>> 2013-08-21 16:44:40,495 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 6.81 sec
>>> 2013-08-21 16:44:41,710 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 6.81 sec
>>> 2013-08-21 16:44:42,919 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 6.81 sec
>>> 2013-08-21 16:44:44,117 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 9.95 sec
>>> 2013-08-21 16:44:45,333 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 9.95 sec
>>> 2013-08-21 16:44:46,530 Stage-1 map = 5%, reduce = 0%, Cumulative CPU 13.0 sec
>>>
>>> --
>>> In the Hadoop world, I am just a novice exploring the entire Hadoop
>>> ecosystem; I hope one day I can contribute my own code.
>>>
>>> YanBit
>>> yankunhad...@gmail.com

--
In the Hadoop world, I am just a novice exploring the entire Hadoop
ecosystem; I hope one day I can contribute my own code.

YanBit
yankunhad...@gmail.com
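[Editor's note] The advice from Edward (index the LZO file so it is splittable) and Sanjay (cap the split size) can be sketched as a session like the following. This is only a sketch: the hadoop-lzo jar path is an assumption that varies by install, and the SELECT is just a placeholder query.

```shell
# Sketch only; the jar location is an assumption for a typical hadoop-lzo install.
# 1. Index the LZO file so the input format can split it (per Edward's note):
hadoop jar /usr/lib/hadoop/lib/hadoop-lzo.jar \
    com.hadoop.compression.lzo.DistributedLzoIndexer /data_split/data_rowkey.lzo

# 2. In the Hive session, cap the split size so the indexed file is divided
#    among several mappers (per Sanjay's suggestion); 134217728 bytes = 128 MB.
hive -e "
SET mapreduce.input.fileinputformat.split.maxsize=134217728;
SELECT COUNT(*) FROM data_zh;
"
```

Note that without the index, a single .lzo file still yields one mapper regardless of the split-size setting, which matches the single-mapper behavior reported in the original question.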
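[Editor's note] As for the stranded index file described at the top of the thread: LOAD DATA INPATH moves only the named data file, so the .lzo.index stays behind. One workaround (a sketch; the table directory shown is only a placeholder, since an external table's actual location is whatever its LOCATION clause specifies) is to move the index next to the relocated data file, or to re-index after the load:

```shell
# Assumption: TABLE_DIR is where LOAD DATA moved data_rowkey.lzo; for an
# external table this is the directory given in the CREATE TABLE LOCATION.
TABLE_DIR=/user/hive/warehouse/data_zh

# Option 1: move the index file next to the relocated data file.
hadoop fs -mv /data_split/data_rowkey.lzo.index "$TABLE_DIR/"

# Option 2: re-create the index in place after the load
# (jar path is a placeholder, as above).
hadoop jar /usr/lib/hadoop/lib/hadoop-lzo.jar \
    com.hadoop.compression.lzo.DistributedLzoIndexer "$TABLE_DIR"
```

Either way, the goal is the same: the .lzo.index must sit in the same directory as its .lzo file for the indexed-LZO input format to find it and produce multiple splits.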