https://issues.apache.org/jira/browse/MAPREDUCE-655 fixed in version 0.21.0
On 11/26/09 9:43 PM, Matthias Scherer matthias.sche...@1und1.de wrote:
Sorry, but I can't find it in the version control system for release 0.20.1:
Thanks Rekha, I was missing the new library
(hadoop-0.20.1-hdfs-core.jar) in my client.
It seems to run a little further, but I'm now getting a
ClassCastException returned by the mapper. Note, this worked with the
0.19 load, so I'm assuming there's something additional in the
configuration that
On Fri, Nov 27, 2009 at 10:46 AM, Arv Mistry a...@kindsight.net wrote:
Thanks Rekha, I was missing the new library
(hadoop-0.20.1-hdfs-core.jar) in my client.
It seems to run a little further, but I'm now getting a
ClassCastException returned by the mapper. Note, this worked with the
0.19 load.
When you set up the Job object, do you call job.setJarByClass(Map.class)?
That will tell Hadoop which jar file to ship with the job and to use for
classloading in your code.
- Aaron
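For anyone following along, a minimal sketch of that call with the 0.20
Job API; the Map class name here is just a placeholder, not something
from the thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    Configuration conf = new Configuration();
    Job job = new Job(conf, "my job");   // 0.20-era constructor
    // Ship the jar that contains Map.class with the job and use it for
    // classloading on the task nodes, so tasks see the same class
    // definitions the client was compiled against.
    job.setJarByClass(Map.class);        // Map is a placeholder name
    job.setMapperClass(Map.class);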
On Thu, Nov 26, 2009 at 11:56 PM, aa...@buffalo.edu wrote:
Hi,
I am running the job from the command line. The
You are always free to run with compression disabled. But in many production
situations, space or performance concerns dictate that all data sets are
stored compressed, so I think Tim was assuming that you might be operating
in such an environment -- in which case, you'd only need things to appear
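A rough sketch of how that toggle looks with the 0.20-era JobConf and
the pre-0.21 property names; none of this is from the thread itself:

    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf();
    // Leave job output uncompressed (e.g. plain text via TextOutputFormat):
    conf.setBoolean("mapred.output.compress", false);
    // Or keep output compressed, as a production cluster might require:
    // conf.setBoolean("mapred.output.compress", true);
    // conf.setClass("mapred.output.compression.codec",
    //               GzipCodec.class, CompressionCodec.class);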
You can always do
hadoop fs -text filename
This will 'cat' the file for you, and decompress it if necessary.
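If you ever need the same thing from code rather than the shell, a rough
sketch using the 0.20-era SequenceFile.Reader, assuming the file is a
SequenceFile; the TextDump class name is made up:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    public class TextDump {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // The reader decompresses transparently using the codec
        // recorded in the file header.
        SequenceFile.Reader reader =
            new SequenceFile.Reader(fs, new Path(args[0]), conf);
        Writable key =
            (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
        Writable value =
            (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
        while (reader.next(key, value)) {
          System.out.println(key + "\t" + value);
        }
        reader.close();
      }
    }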
On Thu, Nov 26, 2009 at 7:59 PM, Mark Kerzner markkerz...@gmail.com wrote:
It worked!
But why is it for testing? I only have one job, so I need my results as
text; can I use this?
Thank you, guys, for your very useful answers
Mark
On Fri, Nov 27, 2009 at 12:44 PM, Aaron Kimball aa...@cloudera.com wrote:
You are always free to run with compression disabled. But in many
production situations, space or performance concerns dictate that all
data sets are stored compressed.
Ok. I have set the number of maps to about 1760 (11 nodes * 16
cores/node * 10, as recommended by the Hadoop documentation) and my job
still takes several hours to run instead of one.
Can the overhead added by Hadoop be that big? I mean I have over 30k
small tasks (about one minute), each one
What does the data look like?
You mention 30k records; is that for 10MB or for 600MB, or do you have a
constant 30k records with vastly varying file sizes?
If the data is 10MB and you have 30k records, and it takes ~2 mins to
process each record, I'd suggest using map to distribute the data across
several reducers, then do the actual processing on reduce.
30k records in 10MB files.
Files can vary, and the number of records can also vary.
If the data is 10MB and you have 30k records, and it takes ~2 mins to
process each record, I'd suggest using map to distribute the data across
several reducers, then do the actual processing on reduce.
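A rough sketch of that pattern with the 0.20 mapreduce API; the class
names and reducer count are made up. The map side only assigns a
spreading key, and the slow per-record work happens on the reduce side:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Mapper: emit (n mod R, record) so records spread over R reducers.
    class SpreadMapper
        extends Mapper<LongWritable, Text, IntWritable, Text> {
      private static final int R = 64;  // assumed number of reducers
      private int n = 0;
      @Override
      protected void map(LongWritable offset, Text record, Context ctx)
          throws IOException, InterruptedException {
        ctx.write(new IntWritable(n++ % R), record);
      }
    }

    // Reducer: the expensive ~2-minute-per-record processing goes here.
    class ProcessReducer
        extends Reducer<IntWritable, Text, Text, Text> {
      @Override
      protected void reduce(IntWritable bucket, Iterable<Text> records,
          Context ctx) throws IOException, InterruptedException {
        for (Text r : records) {
          ctx.write(r, new Text("done"));  // placeholder for real work
        }
      }
    }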
Aaron Kimball wrote:
(Note: this is a tasktracker setting, not a job setting. You'll need to
set this on every node, then restart the MapReduce cluster for it to
take effect.)
Ok. And here is my mistake. I set this to 16 only on the main node, not
also on the data nodes. Thanks a lot!!
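Assuming the setting under discussion is
mapred.tasktracker.map.tasks.maximum, which the thread never names
explicitly, the entry that has to appear in the config file on every
tasktracker node would look like:

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>16</value>
    </property>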
Hello,
Let my key-value be something like BinaryWritables (my own class, but
something like this). Is there a way to create a SequenceFile composed
of several such key-values without using Java?
Background:
I create objects using protocol buffers, my key and values are
serialized
On Fri, Nov 27, 2009 at 7:07 PM, Saptarshi Guha saptarshi.g...@gmail.comwrote:
Let my key-value be something like BinaryWritables (my own class, but
something like this). Is there a way to create a SequenceFile composed
of several such key-values without using Java?
There is not a C++
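For context, the Java-side writer that any non-Java producer would have
to emulate is roughly the following; the SeqWrite name and the
placeholder byte arrays are made up, with BytesWritable standing in for
a BinaryWritable-style class holding protobuf-serialized bytes:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;

    public class SeqWrite {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // 0.20-era factory; key/value classes get recorded in the header.
        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, new Path(args[0]),
            BytesWritable.class, BytesWritable.class);
        byte[] key = new byte[] {1};    // stand-in for serialized protobuf key
        byte[] value = new byte[] {2};  // stand-in for serialized protobuf value
        writer.append(new BytesWritable(key), new BytesWritable(value));
        writer.close();
      }
    }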