No, streams does not write indexes automatically. It does have the ability
to chunk files to near block size before writing to hadoop, and doing that
does not require indexing.

Indexing is a separate process that you'll need to run.
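
For example, with the hadoop-lzo indexer the commands look something like
the following (jar path and hdfs paths are placeholders, adjust to your
install):

  # index a single .lzo file in hdfs (runs in-process)
  hadoop jar /path/to/hadoop-lzo.jar \
      com.hadoop.compression.lzo.LzoIndexer /logs/myfile.lzo

  # or index a whole directory as a map-reduce job
  hadoop jar /path/to/hadoop-lzo.jar \
      com.hadoop.compression.lzo.DistributedLzoIndexer /logs/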

Cheers,
 Gerrit

On Tue, Apr 19, 2011 at 11:05 PM, Dmitriy Ryaboy <[email protected]> wrote:

> Scribe can also write lzo-compressed output.
>
> The indexing step still needs to be taken (Gerrit, does your bigstreams
> write out indexes automatically?).
>
> So our workflow is more like:
>
> 1) Scribe to hdfs with lzo compression
> 2) index
> 3) run pig queries over data with EB loaders.
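>
> For step 3, an EB load run from the shell looks roughly like this (jar
> and paths are placeholders, and the loader class name may differ between
> elephant-bird versions):
>
>   pig -e "REGISTER /path/to/elephant-bird.jar;
>           logs = LOAD '/logs/part-*.lzo'
>                  USING com.twitter.elephantbird.pig.load.LzoTextLoader();
>           DUMP logs;"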
>
> On Tue, Apr 19, 2011 at 12:48 PM, Gerrit Jansen van Vuuren
> <[email protected]> wrote:
>
> > Hi,
> >
> > Have a look at http://code.google.com/p/bigstreams/ and
> > http://code.google.com/p/hadoop-gpl-packing/.
> > If you configure bigstreams to use lzo, it will collect your log files
> > from your servers, write them out, and load them into hadoop in lzo
> > format.
> >
> > Cheers,
> >  Gerrit
> >
> > On Tue, Apr 19, 2011 at 9:44 PM, Chaitanya Sharma
> > <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I recently got Pig to work with Lzo compression, using the pig loaders
> > > from Elephant Bird.
> > >
> > > But from my understanding, my workflow is turning out to be:
> > > Step 1 :  lzo-compress the raw input file.
> > > Step 2 :  put the compressed.lzo file to hdfs.
> > > Step 3 :  execute pig jobs with loaders from elephant-bird.
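> > >
> > > In commands, that is roughly (file names here are just examples):
> > >
> > >   lzop access.log                        # step 1: compress
> > >   hadoop fs -put access.log.lzo /logs/   # step 2: copy to hdfs
> > >   pig -f my_query.pig                    # step 3: pig with EB loaders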
> > >
> > > Now, this looks to be an all-manual workflow; it needs a lot of
> > > babysitting.
> > >
> > > Please correct me if I'm wrong, but what I am wondering is whether EB
> > > or Hadoop-Lzo could automate Step #1 and Step #2 so that no manual
> > > intervention is needed.
> > >
> > >
> > > Thanks,
> > > Chaitanya
> > >
> >
>
