The purpose is not really clear, but if you are looking for how to specify multiple reducer tasks, it is well explained in the documentation:
http://pig.apache.org/docs/r0.11.1/perf.html#parallel
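As a rough sketch (the relation and path names are taken from the original store statement, the rest is illustrative): a 500MB result split into ~64MB files means about 8 reducers, which you can request with PARALLEL or default_parallel. Note that PARALLEL only affects reduce-side operators; a plain load-and-store script is map-only, so it needs an operator with a reduce phase for the setting to take effect.

```pig
-- Script-wide default number of reducers (500MB / 64MB ≈ 8).
SET default_parallel 8;

A = LOAD 'input' USING PigStorage('\t');

-- PARALLEL on a reduce-side operator (GROUP here is only an
-- example to force a reduce phase) overrides the default:
B = GROUP A BY $0 PARALLEL 8;

STORE A INTO 'result-australia-0' USING PigStorage('\t');
```

Each reducer writes one part file (part-r-00000, part-r-00001, ...), so 8 reducers over 500MB gives files of roughly the size you want.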
You will get one file per reducer. It is up to you to specify the right number, but be careful not to fall into the small files problem in the end:
http://blog.cloudera.com/blog/2009/02/the-small-files-problem/

If you have a specific question on HDFS itself or Pig optimisation, you should provide more explanation. (64MB is the default block size for HDFS.)

Regards,
Bertrand

On Mon, Jun 10, 2013 at 6:53 AM, Pedro Sá da Costa <psdc1...@gmail.com> wrote:

> I said 64MB, but it can be 128MB, or 5KB. The number doesn't matter. I
> just want to extract data and put it into several files of a specific
> size. Basically, I am doing a cat on a big txt file, and I want to split
> the content into multiple files with a fixed size.
>
>
> On 7 June 2013 10:14, Johnny Zhang <xiao...@cloudera.com> wrote:
>
> > Pedro, you can try Piggybank MultiStorage, which splits results into
> > different dirs/files by a specific index attribute. But I am not sure
> > how it can make sure the file size is 64MB. Why 64MB specifically?
> > What's the connection between your data and 64MB?
> >
> > Johnny
> >
> >
> > On Fri, Jun 7, 2013 at 12:56 AM, Pedro Sá da Costa <psdc1...@gmail.com> wrote:
> >
> > > I am using the instruction:
> > >
> > > store A into 'result-australia-0' using PigStorage('\t');
> > >
> > > to store the data in HDFS. But the problem is that this creates 1
> > > file with 500MB of size. Instead, I want to save several 64MB files.
> > > How do I do this?
> > >
> > > --
> > > Best regards,
>
> --
> Best regards,

--
Bertrand Dechoux