Re: compressed/encrypted file

2008-06-05 Thread John Heidemann
>parallel manner. However once we get bzip2 to work we could
>split up the files as you are describing...

We are actually working on a bzip2 codec, hopefully with split support, so something should be here by the end of summer. -John Heidemann

extracting input to a task from a (streaming) job?

2008-08-07 Thread John Heidemann
that actually does it? Or are there instructions for poking around on the compute nodes' local disks to assemble it by hand? Or better suggestions? It would be a real boon for people developing map and reduce user code. Thanks for any pointers. -John Heidemann

Re: extracting input to a task from a (streaming) job?

2008-08-07 Thread John Heidemann
On Thu, 07 Aug 2008 19:42:05 +0200, "Leon Mergen" wrote:
>Hello John,
>
>On Thu, Aug 7, 2008 at 6:30 PM, John Heidemann <[EMAIL PROTECTED]> wrote:
>
>>
>> I have a large Hadoop streaming job that generally works fine,
>> but a few (2-4) of the ~3

Re: Hadoop and .tgz files

2008-12-02 Thread John Heidemann
12 I don't believe splitting of .tgz files is possible; something compressed with gzip can only be uncompressed from the beginning. -John Heidemann
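
To illustrate the point about gzip, here is a minimal Python sketch. It is not from the thread: the helper name `can_decompress_from` and the sample data are made up for illustration. It uses the standard `zlib` module to try decoding a gzip stream starting at a given byte offset. A decoder that starts anywhere other than offset 0 no longer sees an intact gzip header, and even past the header the deflate data refers back to earlier state, so a mid-file split has nothing to start from.

```python
import gzip
import zlib


def can_decompress_from(blob: bytes, offset: int) -> bool:
    """Return True if a gzip decoder accepts blob when started at offset."""
    # 31 = 16 + 15: tell zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(31)
    try:
        decoder.decompress(blob[offset:])
        return True
    except zlib.error:
        return False


# Hypothetical sample data standing in for a large input file.
data = b"hadoop streaming record\n" * 10000
blob = gzip.compress(data)

print(can_decompress_from(blob, 0))               # whole stream, from the start
print(can_decompress_from(blob, 1))               # gzip header no longer intact
print(can_decompress_from(blob, len(blob) // 2))  # somewhere in the middle
```

This only shows that a plain decoder cannot begin mid-stream; it says nothing about how Hadoop's input formats handle such files.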

Re: Strange behavior with bzip2 input files w/release 0.19.0

2008-12-05 Thread John Heidemann
is (I believe) fundamental to gzip where the decompression state is never checkpointed. This limitation is what prompted us to add support for bzip2 and bzip2 splitting, although splitting support is only in progress as Abdul said. -John Heidemann
>
>Alex
>
>On Thu, Dec 4, 2008

Re: bzip2 input format

2009-04-14 Thread John Heidemann
g/jira/browse/HADOOP-4012 ? -John Heidemann

Re: Are .bz2 extensions supported in Hadoop 18.3

2009-06-24 Thread John Heidemann
man > Not AFAIK, but we have added bzip2 support as of 0.19 (see JIRA HADOOP-3646), and have splitting support working (see JIRA HADOOP-4012) as a patch. Getting HADOOP-4012 committed has been painful, but it seems close. -John Heidemann
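
As background on why bzip2 can be split at all, here is a rough Python sketch. It is not the HADOOP-4012 code, and the function name `find_block_starts` and the random sample data are assumptions made for illustration. A bzip2 stream is a sequence of independently compressed blocks, each introduced by the 48-bit marker 0x314159265359. Blocks are bit-aligned rather than byte-aligned, so the scan below walks the stream one bit at a time.

```python
import bz2
import random

BLOCK_MAGIC = 0x314159265359  # 48-bit marker that starts every bzip2 block


def find_block_starts(blob: bytes) -> list:
    """Return the bit offsets at which the block marker appears in blob."""
    mask = (1 << 48) - 1
    window = 0
    bitpos = 0
    starts = []
    for byte in blob:
        for shift in range(7, -1, -1):  # bzip2 packs bits most significant first
            window = ((window << 1) | ((byte >> shift) & 1)) & mask
            bitpos += 1
            if bitpos >= 48 and window == BLOCK_MAGIC:
                starts.append(bitpos - 48)
    return starts


# Hypothetical sample: incompressible data, larger than one 100k block.
rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(150_000))
blob = bz2.compress(data, 1)  # compresslevel 1 means roughly 100k-byte blocks

starts = find_block_starts(blob)
print(starts)  # first marker follows the 4-byte "BZh1" stream header
```

A real splitter would also have to cope with a marker pattern occurring by chance inside compressed data and with records that straddle block boundaries; this sketch ignores both.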

Re: Hadoop summit / workshop at Yahoo!

2008-02-21 Thread John Heidemann
On Wed, 20 Feb 2008 12:10:09 PST, "Ajay Anand" wrote:
>The registration page for the Hadoop summit is now up:
>http://developer.yahoo.com/hadoop/summit/
>...
>Agenda:

Ajay, when we talked about the summit on the phone, you were considering having a poster session. I don't see that listed. Shoul