No, bigstreams does not create indexes automatically. It does have the ability to chunk files to near block size before writing to Hadoop, and doing this does not require indexing.
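For the indexing step, hadoop-lzo ships a MapReduce-based indexer that can be run over a whole directory. A minimal sketch (the jar path and HDFS path below are assumptions; adjust them to your installation):

```shell
# Index every .lzo file under /logs/raw so the files become splittable
# for later MapReduce/Pig jobs. DistributedLzoIndexer runs as a MR job;
# the jar location varies by install and is an assumption here.
hadoop jar /usr/lib/hadoop/lib/hadoop-lzo.jar \
  com.hadoop.compression.lzo.DistributedLzoIndexer \
  /logs/raw
```

This writes a small `.lzo.index` file next to each `.lzo` file; without the index, each LZO file is processed as a single unsplittable input.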
Indexing is a separate process that you'll need to run.

Cheers,
 Gerrit

On Tue, Apr 19, 2011 at 11:05 PM, Dmitriy Ryaboy <[email protected]> wrote:
> Scribe can also write LZO-compressed output.
>
> The indexing step still needs to be taken (Gerrit, does your bigstreams
> write out indexes automatically?).
>
> So our workflow is more like:
>
> 1) Scribe to HDFS with LZO compression
> 2) Index
> 3) Run Pig queries over the data with the EB loaders.
>
> On Tue, Apr 19, 2011 at 12:48 PM, Gerrit Jansen van Vuuren <
> [email protected]> wrote:
>
> > Hi,
> >
> > Have a look at http://code.google.com/p/bigstreams/ and
> > http://code.google.com/p/hadoop-gpl-packing/.
> > If you configure bigstreams to use LZO, it will collect your log files
> > from servers, write them out, and load them into Hadoop in LZO format.
> >
> > Cheers,
> > Gerrit
> >
> > On Tue, Apr 19, 2011 at 9:44 PM, Chaitanya Sharma <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > I recently got Pig to work with LZO compression, using the Pig loaders
> > > from Elephant Bird.
> > >
> > > But from my understanding, my workflow is turning out to be:
> > > Step 1: lzo-compress the raw input file.
> > > Step 2: put the compressed .lzo file into HDFS.
> > > Step 3: execute Pig jobs with loaders from Elephant Bird.
> > >
> > > Now, this looks to be an all-manual workflow; it needs a lot of
> > > babysitting.
> > >
> > > Please correct me if I'm wrong, but what I am wondering is: could EB or
> > > Hadoop-Lzo automate Step #1 and Step #2 so they would not need manual
> > > intervention?
> > >
> > > Thanks,
> > > Chaitanya
> > >
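The three manual steps in the original question can be sketched as a small shell script. This is only an illustration: the file names, HDFS paths, and jar location are assumptions, and `LzoTextLoader` is one of the Elephant Bird Pig loaders (use the loader matching your data format):

```shell
# Step 1: lzo-compress the raw input file (requires the lzop tool).
lzop -o access.log.lzo access.log

# Step 2: put the compressed file into HDFS, then index it so that
# MapReduce can split it. LzoIndexer indexes a single file locally;
# the jar path is an assumption for this sketch.
hadoop fs -put access.log.lzo /data/raw/access.log.lzo
hadoop jar /usr/lib/hadoop/lib/hadoop-lzo.jar \
  com.hadoop.compression.lzo.LzoIndexer /data/raw/access.log.lzo

# Step 3: run a Pig job that loads the LZO file with an Elephant Bird
# loader (hypothetical one-liner query for illustration).
pig -e "raw = LOAD '/data/raw/access.log.lzo' \
        USING com.twitter.elephantbird.pig.load.LzoTextLoader() \
        AS (line:chararray); DUMP raw;"
```

Tools like Scribe (with LZO output) or bigstreams replace Steps 1 and 2 by compressing and loading continuously, leaving only the indexing and the Pig jobs to schedule.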
