No, streams does not write indexes automatically. It does have the ability
to chunk files to near block size before writing to hadoop, and doing that
does not require indexing.

Indexing is a separate process that you'll need to run.
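
For example, with the hadoop-lzo indexer the commands look something like
the following (jar path and hdfs paths are placeholders, adjust to your
install):

  # index a single .lzo file in hdfs (runs in-process)
  hadoop jar /path/to/hadoop-lzo.jar \
      com.hadoop.compression.lzo.LzoIndexer /logs/myfile.lzo

  # or index a whole directory as a map-reduce job
  hadoop jar /path/to/hadoop-lzo.jar \
      com.hadoop.compression.lzo.DistributedLzoIndexer /logs/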

Cheers,
 Gerrit

On Tue, Apr 19, 2011 at 11:05 PM, Dmitriy Ryaboy <[email protected]> wrote:

> Scribe can also write lzo-compressed output.
>
> The indexing step still needs to be taken (Gerrit, does your bigstreams
> write out indexes automatically?).
>
> So our workflow is more like:
>
> 1) Scribe to hdfs with lzo compression
> 2) index
> 3) run pig queries over data with EB loaders.
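>
> For step 3, an EB load run from the shell looks roughly like this (jar
> and paths are placeholders, and the loader class name may differ between
> elephant-bird versions):
>
>   pig -e "REGISTER /path/to/elephant-bird.jar;
>           logs = LOAD '/logs/part-*.lzo'
>                  USING com.twitter.elephantbird.pig.load.LzoTextLoader();
>           DUMP logs;"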
>
> On Tue, Apr 19, 2011 at 12:48 PM, Gerrit Jansen van Vuuren
> <[email protected]> wrote:
>
> > Hi,
> >
> > Have a look at http://code.google.com/p/bigstreams/ and
> > http://code.google.com/p/hadoop-gpl-packing/.
> > If you configure bigstreams to use lzo, it will collect your log files
> > from your servers, write them out, and load them into hadoop in lzo
> > format.
> >
> > Cheers,
> >  Gerrit
> >
> > On Tue, Apr 19, 2011 at 9:44 PM, Chaitanya Sharma
> > <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I recently got Pig to work with Lzo compression, using the pig loaders
> > > from Elephant Bird.
> > >
> > > But from my understanding, my workflow is turning out to be:
> > > Step 1 :  lzo-compress the raw input file.
> > > Step 2 :  put the compressed.lzo file to hdfs.
> > > Step 3 :  execute pig jobs with loaders from elephant-bird.
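> > >
> > > In commands, that is roughly (file names here are just examples):
> > >
> > >   lzop access.log                        # step 1: compress
> > >   hadoop fs -put access.log.lzo /logs/   # step 2: copy to hdfs
> > >   pig -f my_query.pig                    # step 3: pig with EB loaders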
> > >
> > > Now, this looks to be an all-manual workflow; it needs a lot of
> > > babysitting.
> > >
> > > Please correct me if I'm wrong, but what I am wondering is whether EB
> > > or Hadoop-Lzo could automate Step #1 and Step #2 so that no manual
> > > intervention is needed.
> > >
> > >
> > > Thanks,
> > > Chaitanya
> > >
> >
>
