Re: AW: KeyValueTextInputFormat and Hadoop 0.20.1

2009-11-27 Thread Rekha Joshi
https://issues.apache.org/jira/browse/MAPREDUCE-655 (fixed in version 0.21.0). On 11/26/09 9:43 PM, Matthias Scherer matthias.sche...@1und1.de wrote: Sorry, but I can't find it in the version control system for release 0.20.1:

RE: Hadoop 0.20 map/reduce Failing for old API

2009-11-27 Thread Arv Mistry
Thanks Rekha, I was missing the new library (hadoop-0.20.1-hdfs-core.jar) in my client. It seems to run a little further but I'm now getting a ClassCastException returned by the mapper. Note, this worked with the 0.19 load, so I'm assuming there's something additional in the configuration that

Re: Hadoop 0.20 map/reduce Failing for old API

2009-11-27 Thread Edward Capriolo
On Fri, Nov 27, 2009 at 10:46 AM, Arv Mistry a...@kindsight.net wrote: Thanks Rekha, I was missing the new library (hadoop-0.20.1-hdfs-core.jar) in my client. It seems to run a little further but I'm now getting a ClassCastException returned by the mapper. Note, this worked with the 0.19

Re: Re: Doubt in Hadoop

2009-11-27 Thread Aaron Kimball
When you set up the Job object, do you call job.setJarByClass(Map.class)? That will tell Hadoop which jar file to ship with the job and to use for classloading in your code. - Aaron On Thu, Nov 26, 2009 at 11:56 PM, aa...@buffalo.edu wrote: Hi, I am running the job from command line. The

Re: part-00000.deflate as output

2009-11-27 Thread Aaron Kimball
You are always free to run with compression disabled. But in many production situations, space or performance concerns dictate that all data sets are stored compressed, so I think Tim was assuming that you might be operating in such an environment -- in which case, you'd only need things to appear

Re: part-00000.deflate as output

2009-11-27 Thread Patrick Angeles
You can always do: hadoop fs -text filename. This will 'cat' the file for you, and decompress it if necessary. On Thu, Nov 26, 2009 at 7:59 PM, Mark Kerzner markkerz...@gmail.com wrote: It worked! But why is it for testing? I only have one job, so I need by related as text, can I use this
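
For readers who want to see what "decompress it if necessary" amounts to, here is a minimal sketch in plain Java, with no Hadoop classes involved. It assumes that a part-00000.deflate file is an ordinary zlib stream, which is what Hadoop's default codec is generally understood to write; that is an assumption, not something stated in this thread. The class name DeflateText and the sample records are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper, not part of Hadoop.
class DeflateText {
    /** Compresses text into a zlib stream; stands in for a real .deflate part file. */
    static byte[] compress(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the DeflaterOutputStream finishes the compressed stream.
        try (DeflaterOutputStream out = new DeflaterOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Inflates a zlib stream back into text (assumes UTF-8 content). */
    static String decompress(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(data))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Made-up tab-separated records, compressed and then read back as text.
        byte[] packed = compress("key1\tvalue1\nkey2\tvalue2\n");
        System.out.print(decompress(packed));
    }
}
```

To read a file copied out of HDFS, the ByteArrayInputStream would be replaced with a FileInputStream on the local copy; on the cluster itself, hadoop fs -text remains the simpler route.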

Re: part-00000.deflate as output

2009-11-27 Thread Mark Kerzner
Thank you, guys, for your very useful answers. Mark On Fri, Nov 27, 2009 at 12:44 PM, Aaron Kimball aa...@cloudera.com wrote: You are always free to run with compression disabled. But in many production situations, space or performance concerns dictate that all data sets are stored

Re: Processing 10MB files in Hadoop

2009-11-27 Thread CubicDesign
Ok. I have set the number of maps to about 1760 (11 nodes * 16 cores/node * 10, as recommended by the Hadoop documentation) and my job still takes several hours to run instead of one. Can the overhead added by Hadoop be that big? I mean I have over 3 small tasks (about one minute), each one

Re: Processing 10MB files in Hadoop

2009-11-27 Thread Patrick Angeles
What does the data look like? You mention 30k records, is that for 10MB or for 600MB, or do you have a constant 30k records with vastly varying file sizes? If the data is 10MB and you have 30k records, and it takes ~2 mins to process each record, I'd suggest using map to distribute the data

Re: Processing 10MB files in Hadoop

2009-11-27 Thread CubicDesign
3 records in 10MB files. Files can vary and the number of records also can vary. If the data is 10MB and you have 30k records, and it takes ~2 mins to process each record, I'd suggest using map to distribute the data across several reducers then do the actual processing on reduce.
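
The suggestion quoted above (let the map step spread records over many reducers and do the expensive work in reduce) can be illustrated with a small plain-Java simulation. This is only a sketch: it does not use the Hadoop API, the class SpreadRecords is hypothetical, the count of 176 reducers (11 nodes * 16 cores) is an assumption taken from the cluster size mentioned in this thread, and assigning a record by its non-negative hash modulo the reducer count is used here as a simple stand-in for hash partitioning.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation, not Hadoop code.
class SpreadRecords {
    /** Picks a reducer index in [0, numReducers) from the key's hash. */
    static int reducerFor(String key, int numReducers) {
        // Mask the sign bit so the result of % is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Groups records into one bucket per reducer. */
    static List<List<String>> distribute(List<String> records, int numReducers) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String record : records) {
            buckets.get(reducerFor(record, numReducers)).add(record);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // 30k made-up records, matching the record count discussed in the thread.
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 30000; i++) {
            records.add("record-" + i);
        }
        List<List<String>> buckets = distribute(records, 176);
        int min = Integer.MAX_VALUE;
        int max = 0;
        for (List<String> bucket : buckets) {
            min = Math.min(min, bucket.size());
            max = Math.max(max, bucket.size());
        }
        System.out.println("reducers=" + buckets.size() + " min=" + min + " max=" + max);
    }
}
```

The point of the sketch is that the unit of parallelism becomes the record rather than the input file, so a single 10MB file no longer limits how many cores are busy.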

Re: Processing 10MB files in Hadoop

2009-11-27 Thread CubicDesign
Aaron Kimball wrote: (Note: this is a tasktracker setting, not a job setting. You'll need to set this on every node, then restart the mapreduce cluster for it to take effect.) Ok. And here is my mistake. I set this to 16 only on the main node, not on the data nodes as well. Thanks a lot!! Of

Creating Sequence File in C++

2009-11-27 Thread Saptarshi Guha
Hello, Let my Key-Value be something like BinaryWritables (my own class, but something like this). Is there a way to create the Sequence File composed of several such key - values, without using Java? Background: I create objects using protocol buffers, my key and values are serialized

Re: Creating Sequence File in C++

2009-11-27 Thread Owen O'Malley
On Fri, Nov 27, 2009 at 7:07 PM, Saptarshi Guha saptarshi.g...@gmail.com wrote: Let my Key-Value be something like BinaryWritables (my own class, but something like this). Is there a way to create the Sequence File composed of several such key - values, without using Java? There is not a C++