> Unless my understanding is totally wrong, I don't see how reducing the block
> size will help in this case.
>
> Thanks
>
> Yong
>
> > Subject: Re: Loader for small files
> > From: davidlabarb...@localresponse.com
> > Date: Mon, 11 Feb 2013 15:38:54 -0500
> > CC: user@hadoop.apache.org
> > To: u...@pig.apache.org
> >
What process creates the data in HDFS? You should be able to set the block size
there and avoid the copy.
I would test dfs.block.size on the copy and see if you get the mapper splits
you want before worrying about further optimization.
David
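As a sanity check on the mapper-split counts being discussed, here is a rough model of how Hadoop's FileInputFormat turns file sizes into input splits. This is a sketch only: it ignores mapred.min.split.size / mapred.max.split.size and the split slop factor, and the file sizes below are made up for illustration.

```python
import math

def input_splits(file_sizes, block_size):
    """Approximate FileInputFormat behavior: each file is split
    independently (files never share a split), so a file smaller than
    one block always yields exactly one split, regardless of how small
    the block size is. Simplified model for illustration only."""
    return sum(max(1, math.ceil(size / block_size)) for size in file_sizes)

# 1000 files of 1 MB each with 64 MB blocks: one mapper per file.
small = [1 * 1024 * 1024] * 1000
print(input_splits(small, 64 * 1024 * 1024))  # 1000

# Only a block size below the file size changes the count:
print(input_splits(small, 256 * 1024))        # 4 splits per file -> 4000
```

This also illustrates Yong's point above: as long as each file is smaller than the block size, shrinking the block size leaves the split count unchanged.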
On Feb 11, 2013, at 2:10 PM, Something Something wrote:
David: Your suggestion would add an extra step of copying data from one
place to another. Not bad, but not ideal. Is there no way to avoid copying
the data?
BTW, we have tried changing the following options to no avail :(
set pig.splitCombination false;
& a few other 'dfs' options given b
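For context on what pig.splitCombination controls: when it is true (the default), Pig packs many small input splits into fewer combined ones, which reduces the mapper count. The sketch below is a hypothetical greedy model of that packing; the real combiner also considers data locality, and the combined size is bounded by pig.maxCombinedSplitSize.

```python
def combine_splits(split_sizes, max_combined):
    """Greedily pack small splits together until adding the next one
    would exceed max_combined. Simplified model of Pig's split
    combination for illustration; sizes here are in MB."""
    combined, current = [], 0
    for s in split_sizes:
        if current and current + s > max_combined:
            combined.append(current)
            current = 0
        current += s
    if current:
        combined.append(current)
    return combined

# 1000 x 1 MB splits with a 128 MB combined target: 8 mappers, not 1000.
groups = combine_splits([1] * 1000, 128)
print(len(groups))  # 8
```

Disabling pig.splitCombination (as tried above) does the opposite: every small split keeps its own mapper.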
You could store your data in smaller block sizes. Do something like

HADOOP_OPTS="-Ddfs.block.size=1048576 -Dfs.local.block.size=1048576" \
hadoop fs -cp /org-input /small-block-input

You might only need one of those parameters. You can verify the block size with
hadoop fsck /small-block-input
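To predict what fsck should report after such a copy, the block count is just the file size divided by the new block size, rounded up. A quick sketch (the 64 MB example file is made up for illustration):

```python
import math

def block_count(file_size, block_size):
    """Number of HDFS blocks a file occupies -- roughly what
    `hadoop fsck -blocks` reports for the file. The last block may be
    shorter than block_size. Illustrative model only."""
    return max(1, math.ceil(file_size / block_size))

# A 64 MB file recopied with dfs.block.size=1048576 (1 MB):
print(block_count(64 * 1024 * 1024, 1048576))  # 64 blocks -> up to 64 splits
```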
Sorry, moving the 'hbase' mailing list to BCC 'cause this is not related to
HBase. Adding the 'hadoop' user group.
On Mon, Feb 11, 2013 at 10:22 AM, Something Something <
mailinglist...@gmail.com> wrote:
> Hello,
>
> We are running into performance issues with Pig/Hadoop because our input
> files are s