The mapper output is sorted using quick-sort on the mapper side
(actually the sort algorithm is pluggable). The reducer then only needs
a merge sort, which reduces the number of files.
Right now Hadoop always runs a sorter on the mapper side to sort the map output.
One interesting point is to
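The point above — that pre-sorted map outputs let the reducer get away with a merge rather than a full re-sort — can be illustrated with a small stand-alone sketch (this is not Hadoop's actual merger, just the underlying idea): a k-way merge of already-sorted runs using a priority queue, where each element passes through the heap exactly once.

```java
import java.util.*;

public class KWayMerge {
    // Merge k sorted runs into one sorted list, the way a reducer-side
    // merge combines sorted map-output segments. Each heap entry is
    // {value, runIndex, indexWithinRun}.
    static List<Integer> merge(List<List<Integer>> runs) {
        PriorityQueue<int[]> heap =
            new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[]{runs.get(r).get(0), r, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            int r = top[1], i = top[2] + 1;
            if (i < runs.get(r).size()) {
                heap.add(new int[]{runs.get(r).get(i), r, i});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three pre-sorted "map outputs"
        List<List<Integer>> runs = Arrays.asList(
            Arrays.asList(1, 4, 7),
            Arrays.asList(2, 5, 8),
            Arrays.asList(3, 6, 9));
        System.out.println(merge(runs)); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

With k runs and n total records this costs O(n log k) comparisons instead of O(n log n) for a full sort, which is why sorting on the map side pays off.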
Hi,
Does anybody know whether sorted Mapper output will decrease the Sort in
the reduce phase?
I'm teaching a class, and am curious to know how much of a difference
sorted vs. unsorted mapper output makes. If the merge sort is
implemented to deal with already sorted input, then I guess it
Hi all,
I've installed hadoop 0.20.1 and Hbase 0.20.2. When I try to create a table
using the hbase shell I get the following exception. The table gets
created anyway. Can this be prevented? How do I get rid of this exception?
hbase(main):004:0> create 'blogposts1', 'post', 'image'
10/01/07 17
As for your
2) to deploy small clusters
I think that is outside the scope of packaging. Take mysql for
example: you could configure mysql to be master/slave or even
master/master. That is all done through the configuration, not through
the package.
The hadoop configuration is very dependent on host
On Mon, Jan 04, 2010 at 12:37:48PM +, Steve Loughran wrote:
> If you want "official" as in can say "Apache Hadoop" on it, then it
> will need to be managed and released as an apache project. That
> means somewhere in ASF SVN. If you want to cut your own, please give
> it a different name to avo
That's great news! Thanks guys!
On Thu, Jan 7, 2010 at 11:11 AM, Aaron Kimball wrote:
> Note that org.apache.hadoop.mapreduce.lib.output.MultipleOutputs is
> scheduled for the next CDH 0.20 release -- ready "soon."
> - Aaron
>
> 2010/1/6 Amareshwari Sri Ramadasu
>
> > No. It is part of branch 0
http://old.nabble.com/file/p27068391/ReachSum.java ReachSum.java
Hi
I have a problem printing one of my hadoop outputs. I have a class in which I
hold the output data; see below. It gets collected at the reducer:
SummaryOut so; ...
output.collect(new StringArrayWritable(key.content), so);
But when th
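The message is cut off before the symptom is shown, but a common cause of unreadable output from a custom value class under TextOutputFormat is a missing toString() override: TextOutputFormat writes values by calling toString(), so a Writable without one prints as the default Object representation. A hedged sketch, with invented fields standing in for the poster's actual SummaryOut:

```java
import org.apache.hadoop.io.Writable;
import java.io.*;

// Hypothetical version of the poster's SummaryOut class. The fields
// are invented for illustration; the point is the toString() override,
// which is what TextOutputFormat uses to render the value.
public class SummaryOut implements Writable {
    private long count;
    private double sum;

    public void write(DataOutput out) throws IOException {
        out.writeLong(count);
        out.writeDouble(sum);
    }

    public void readFields(DataInput in) throws IOException {
        count = in.readLong();
        sum = in.readDouble();
    }

    @Override
    public String toString() {
        return count + "\t" + sum; // what ends up in the output file
    }
}
```

Without the override you would see something like `SummaryOut@1a2b3c` in the part files.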
According to:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/TextInputFormat.html#isSplitable%28org.apache.hadoop.fs.FileSystem,%20org.apache.hadoop.fs.Path%29
isSplitable() is deprecated.
Which method should I use to replace it?
Thanks
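The answer isn't in this snippet, but for context: the whole org.apache.hadoop.mapred API was deprecated in 0.20 in favour of org.apache.hadoop.mapreduce, and the new-API counterpart is FileInputFormat.isSplitable(JobContext, Path), overridden the same way. A minimal sketch, assuming the new API is available in your version:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// New-API replacement for the deprecated isSplitable(FileSystem, Path):
// subclass the new TextInputFormat and override isSplitable(JobContext, Path).
public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // example policy: treat each file as a single split
    }
}
```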
I'm seeing this error when a job runs:
Shuffling 35338524 bytes (35338524 raw bytes) into RAM from
attempt_201001051549_0036_m_03_0
Map output copy failure: java.lang.OutOfMemoryError: Java heap space
at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(Reduce
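No reply is included here, but the usual knobs for heap exhaustion during the in-memory shuffle in 0.20 are the child JVM heap and the fraction of reducer heap reserved for holding map outputs. A hedged mapred-site.xml sketch — the values are illustrative, not recommendations:

```xml
<!-- Illustrative values only -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value> <!-- heap size for map/reduce child JVMs -->
</property>
<property>
  <name>mapred.job.shuffle.input.buffer.percent</name>
  <value>0.50</value> <!-- fraction of reducer heap used to buffer map outputs during shuffle -->
</property>
```

Raising the child heap or lowering the shuffle buffer fraction trades spill-to-disk I/O against the risk of OutOfMemoryError in shuffleInMemory.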
I am trying to develop some hadoop programs, and I see that most of the examples
included in the distribution use deprecated classes and methods. Are
there any other sources for learning the api besides the javadocs, which,
for beginners trying to write hadoop programs, are not the best
Note that org.apache.hadoop.mapreduce.lib.output.MultipleOutputs is
scheduled for the next CDH 0.20 release -- ready "soon."
- Aaron
2010/1/6 Amareshwari Sri Ramadasu
> No. It is part of branch 0.21 onwards. For 0.20*, people can use old api
> only, though JobConf is deprecated.
>
> -Amareshwari
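Since 0.20 users are limited to the old API, the old-API MultipleOutputs (org.apache.hadoop.mapred.lib.MultipleOutputs) is the stop-gap. A minimal sketch of wiring it up — the named output "summary" and the key/value types are invented for illustration:

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class MultipleOutputsSetup {
    public static void configure(JobConf conf) {
        // Register a named output "summary" alongside the job's main output.
        // In the reducer:
        //   mos = new MultipleOutputs(conf);                   // in configure()
        //   mos.getCollector("summary", reporter).collect(k, v);
        //   mos.close();                                       // in close()
        MultipleOutputs.addNamedOutput(conf, "summary",
            TextOutputFormat.class, Text.class, Text.class);
    }
}
```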
On 1/7/10 1:48 AM, "Thomas Koch" wrote:
> I wrote my own ivysettings.xml[1] so that ivy won't go online, but rather uses
> what's already installed as debian packages.
Did build.properties with offline=true not work in this instance?
(I'm not an ivy guy)
Hi:
I am trying to run a MR job to output HFiles directly, containing 10
million records (very simple: 1 column family and very small). The job
completes with some mention of killed tasks (reduce Failed/Killed
Task Attempts > 0). Then I use the script loadtable.rb to load my
hfiles into hbase a
Hi Todd,
> > The SNN shows:
> > java.io.FileNotFoundException:
> > http://192.168.122.166:50070/getimage?putimage=1&port=50090&machine=127.0.1.1&token=-18:737152035:0:126219599:1262194649873
> > at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.jav
Hi,
I started the debian packaging at
http://git.debian.org/?p=users/thkoch-guest/hadoop.git
I wrote my own ivysettings.xml[1] so that ivy won't go online, but rather uses
what's already installed as debian packages. However ivy spits some errors at
me like:
commons-logging#commons-logging;wo