The purpose is not really clear, but if you are looking for how to specify multiple reducer tasks, it is well explained in the documentation:
http://pig.apache.org/docs/r0.11.1/perf.html#parallel
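As a rough sketch (the relation and path names are taken from the original store statement, the rest is illustrative): a 500MB result split into ~64MB files means about 8 reducers, which you can request with PARALLEL or default_parallel. Note that PARALLEL only affects reduce-side operators; a plain load-and-store script is map-only, so it needs an operator with a reduce phase for the setting to take effect.

```pig
-- Script-wide default number of reducers (500MB / 64MB ≈ 8).
SET default_parallel 8;

A = LOAD 'input' USING PigStorage('\t');

-- PARALLEL on a reduce-side operator (GROUP here is only an
-- example to force a reduce phase) overrides the default:
B = GROUP A BY $0 PARALLEL 8;

STORE A INTO 'result-australia-0' USING PigStorage('\t');
```

Each reducer writes one part file (part-r-00000, part-r-00001, ...), so 8 reducers over 500MB gives files of roughly the size you want.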
You will get one file per reducer. It is up to you to specify the right number, but be careful not to fall into the small files problem in the end:
http://blog.cloudera.com/blog/2009/02/the-small-files-problem/

If you have a specific question on HDFS itself or Pig optimisation, you should provide more explanation. (64MB is the default block size for HDFS.)

Regards,
Bertrand

On Mon, Jun 10, 2013 at 6:53 AM, Pedro Sá da Costa <psdc1...@gmail.com> wrote:

> I said 64MB, but it can be 128MB, or 5KB. The number doesn't matter. I
> just want to extract data and put it into several files of a specific
> size. Basically, I am doing a cat on a big txt file, and I want to split
> the content into multiple files with a fixed size.
>
>
> On 7 June 2013 10:14, Johnny Zhang <xiao...@cloudera.com> wrote:
>
> > Pedro, you can try Piggybank MultiStorage, which splits results into
> > different dirs/files by a specific index attribute. But I am not sure
> > how it can make sure the file size is 64MB. Why 64MB specifically?
> > What's the connection between your data and 64MB?
> >
> > Johnny
> >
> >
> > On Fri, Jun 7, 2013 at 12:56 AM, Pedro Sá da Costa <psdc1...@gmail.com> wrote:
> >
> > > I am using the instruction:
> > >
> > > store A into 'result-australia-0' using PigStorage('\t');
> > >
> > > to store the data in HDFS. But the problem is that this creates 1
> > > file with 500MB of size. Instead, I want to save several 64MB files.
> > > How do I do this?
> > >
> > > --
> > > Best regards,
>
> --
> Best regards,

--
Bertrand Dechoux