In Hive I set

SET mapreduce.input.fileinputformat.split.maxsize=134217728;

but it had no effect. I also found that when I use

LOAD DATA INPATH '/data_split/data_rowkey.lzo'

OVERWRITE INTO TABLE data_zh

the HDFS data file is moved into the Hive table directory (the table was
created with CREATE EXTERNAL TABLE), but data_rowkey.lzo.index is left
behind in the HDFS /data_split/ directory. So the data file ends up in the
Hive directory while the index file stays in the original directory; the
two are no longer in the same place.
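A minimal fix for the situation above is to move the index next to the data file by hand, since LOAD DATA only moves the file named in the statement. The warehouse path below is an assumption (check the table's actual location with DESCRIBE FORMATTED data_zh), and this is a cluster-side sketch, not something runnable locally:

```shell
# LOAD DATA moved only data_rowkey.lzo; move its companion index so the
# two files sit in the same directory again.
# /user/hive/warehouse/data_zh is an ASSUMED location -- verify it first.
hdfs dfs -mv /data_split/data_rowkey.lzo.index /user/hive/warehouse/data_zh/
```

Alternatively, for an external table it can be simpler to leave both files in /data_split/ and point the table's LOCATION there, so nothing has to move at all.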


2013/8/22 Sanjay Subramanian <sanjay.subraman...@wizecommerce.com>

>  Hi
>
>  Try this setting in your Hive query:
>
>  SET mapreduce.input.fileinputformat.split.maxsize=<some bytes>;
>
>  If you set this value low, the MR job will use this size to split
> the input LZO files and you will get multiple mappers (and make sure the
> input LZO files are indexed, i.e. the .lzo.index files are created).
>
>  sanjay
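Sanjay's two steps (index, then query with a smaller max split) can be sketched as below. The jar path is an assumption that varies by install; DistributedLzoIndexer ships with the hadoop-lzo library. This needs a live cluster, so it is a sketch rather than a runnable example:

```shell
# Step 1: build the .lzo.index file with the hadoop-lzo indexer.
# The jar location is an ASSUMPTION -- adjust to your installation.
hadoop jar /usr/lib/hadoop/lib/hadoop-lzo.jar \
    com.hadoop.compression.lzo.DistributedLzoIndexer /data_split/data_rowkey.lzo

# Step 2: run the query with a 128 MB max split so the 2 GB input
# fans out over multiple mappers.
hive -e "
SET mapreduce.input.fileinputformat.split.maxsize=134217728;
SELECT COUNT(*) FROM data_zh;
"
```

Note that for the index to actually be consulted, the table typically has to be declared with com.hadoop.mapred.DeprecatedLzoTextInputFormat as its input format; a plain TEXTFILE table will ignore it.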
>
>
>   From: Edward Capriolo <edlinuxg...@gmail.com>
> Reply-To: "user@hive.apache.org" <user@hive.apache.org>
> Date: Wednesday, August 21, 2013 10:43 AM
> To: "user@hive.apache.org" <user@hive.apache.org>
> Subject: Re: only one mapper
>
>   LZO files are only splittable if you index them. Sequence files
> compressed with LZO are splittable without being indexed.
>
>  Snappy + SequenceFile is a better option than LZO.
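Edward's alternative can be sketched as a one-time conversion: rewrite the data into a Snappy-compressed SequenceFile table, which splits with no index file at all. The new table name is illustrative, and the property names below are the Hive 0.10-era ones; this requires a cluster, so treat it as a sketch:

```shell
# Convert the LZO text table into a Snappy block-compressed SequenceFile
# table (splittable without any .index companion file).
# Table name data_zh_seq is ILLUSTRATIVE.
hive -e "
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
CREATE TABLE data_zh_seq STORED AS SEQUENCEFILE AS SELECT * FROM data_zh;
"
```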
>
>
> On Wed, Aug 21, 2013 at 1:39 PM, Igor Tatarinov <i...@decide.com> wrote:
>
>>  LZO files are combinable so check your max split setting.
>>
>> http://mail-archives.apache.org/mod_mbox/hive-user/201107.mbox/%3c4e328964.7000...@gmail.com%3E
>>
>>  igor
>> decide.com
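Igor's point about combinable files is worth spelling out: Hive's default CombineHiveInputFormat can merge many small splits back into one mapper, and in this Hive/Hadoop vintage it honors the older property name, which may be why the mapreduce.* setting above appeared to have no effect. This is a hedged guess, sketched as:

```shell
# CombineHiveInputFormat in Hive 0.10 reads the old-style property name;
# setting both covers either code path. 134217728 bytes = 128 MB.
hive -e "
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
SET mapred.max.split.size=134217728;
SET mapreduce.input.fileinputformat.split.maxsize=134217728;
SELECT COUNT(*) FROM data_zh;
"
```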
>>
>>
>>
>> On Wed, Aug 21, 2013 at 2:17 AM, 闫昆 <yankunhad...@gmail.com> wrote:
>>
>>>  Hi all. When I use Hive, the job launches only one mapper, even though
>>> my file spans 18 blocks (my block size is 128 MB and the data size is 2 GB).
>>> I use LZO compression: I created file.lzo and built the index file.lzo.index.
>>> I am using Hive 0.10.0.
>>>
>>>  Total MapReduce jobs = 1
>>> Launching Job 1 out of 1
>>> Number of reduce tasks is set to 0 since there's no reduce operator
>>> Cannot run job locally: Input Size (= 2304560827) is larger than
>>> hive.exec.mode.local.auto.inputbytes.max (= 134217728)
>>> Starting Job = job_1377071515613_0003, Tracking URL =
>>> http://hydra0001:8088/proxy/application_1377071515613_0003/
>>> Kill Command = /opt/module/hadoop-2.0.0-cdh4.3.0/bin/hadoop job  -kill
>>> job_1377071515613_0003
>>> Hadoop job information for Stage-1: number of mappers: 1; number of
>>> reducers: 0
>>> 2013-08-21 16:44:30,237 Stage-1 map = 0%,  reduce = 0%
>>> 2013-08-21 16:44:40,495 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU
>>> 6.81 sec
>>> 2013-08-21 16:44:41,710 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU
>>> 6.81 sec
>>> 2013-08-21 16:44:42,919 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU
>>> 6.81 sec
>>> 2013-08-21 16:44:44,117 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU
>>> 9.95 sec
>>> 2013-08-21 16:44:45,333 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU
>>> 9.95 sec
>>> 2013-08-21 16:44:46,530 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU
>>> 13.0 sec
>>>
>>>  --
>>>
>>> In the Hadoop world I am just a novice exploring the entire Hadoop
>>> ecosystem; I hope one day I can contribute my own code.
>>>
>>> YanBit
>>> yankunhad...@gmail.com
>>>
>>>
>>
>


