Re: Loader for small files

2013-02-12 Thread Something Something
> Unless my understanding is totally wrong, I don't know how reducing block size will help in this case. > Thanks, Yong

RE: Loader for small files

2013-02-12 Thread java8964 java8964
Unless my understanding is totally wrong, I don't know how reducing block size will help in this case. Thanks, Yong

Re: Loader for small files

2013-02-11 Thread David LaBarbera
What process creates the data in HDFS? You should be able to set the block size there and avoid the copy. I would test dfs.block.size on the copy and see if you get the mapper splits you want before worrying about optimizing. David
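A minimal sketch of that suggestion, assuming the data is loaded with the FsShell (the local path is hypothetical; Hadoop 1.x calls the property dfs.block.size, 2.x+ calls it dfs.blocksize):

    # Set the block size when the data is first written, so no later copy is needed.
    hadoop fs -Ddfs.block.size=1048576 -put local-input/ /org-input

    # %o prints the block size in bytes that each file actually received.
    hadoop fs -stat "%o %n" /org-input/*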

Re: Loader for small files

2013-02-11 Thread Something Something
David: Your suggestion would add an extra step of copying data from one place to another. Not bad, but not ideal. Is there no way to avoid copying the data? BTW, we have tried changing the following options, to no avail :( set pig.splitCombination false; and a few other 'dfs' options given below…
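A sketch of the split-combination knobs mentioned here, passed as system properties on the pig command line (the script name is hypothetical; pig.maxCombinedSplitSize is available in Pig 0.8+ and can equally go in the script via a set statement):

    # Disable split combining entirely: roughly one mapper per input file.
    pig -Dpig.splitCombination=false myscript.pig

    # Or keep combining but cap each combined split (in bytes) to get more mappers.
    pig -Dpig.maxCombinedSplitSize=16777216 myscript.pig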

Re: Loader for small files

2013-02-11 Thread David LaBarbera
You could store your data in smaller block sizes. Do something like:

    HADOOP_OPTS="-Ddfs.block.size=1048576 -Dfs.local.block.size=1048576" hadoop fs -cp /org-input /small-block-input

You might only need one of those parameters. You can verify the block size with hadoop fsck /small-block-input. In yo…
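For reference, the FsShell also accepts -D generic options directly, which avoids relying on HADOOP_OPTS; a sketch using the same paths (1048576 bytes = 1 MB blocks; on Hadoop 2.x+ the property is dfs.blocksize):

    # Re-copy the input with 1 MB blocks so each file produces more splits.
    hadoop fs -Ddfs.block.size=1048576 -cp /org-input /small-block-input

    # Show per-file block counts to confirm the new layout.
    hadoop fsck /small-block-input -files -blocks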

Re: Loader for small files

2013-02-11 Thread Something Something
Sorry.. Moving 'hbase' mailing list to BCC 'cause this is not related to HBase. Adding 'hadoop' user group. On Mon, Feb 11, 2013 at 10:22 AM, Something Something <mailinglist...@gmail.com> wrote: > Hello, > We are running into performance issues with Pig/Hadoop because our input files are small…
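Before tuning anything, it can help to quantify the small-files problem; a hypothetical first check against the input path assumed above:

    # Directory count, file count, and total bytes under the input path.
    hadoop fs -count /org-input

    # Per-file sizes; many entries far below the 64/128 MB default block size
    # confirm the small-files pattern this thread is about.
    hadoop fs -du /org-input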