Re: Will already sorted Mapper output improve speed of Sort in reducer?

2010-01-07 Thread Yongqiang He
The mapper output is sorted using quick-sort on the mapper side (the sort algorithm is actually pluggable). The reducer only needs to do a merge sort to reduce the number of files. Right now Hadoop always runs a sorter on the mapper side to sort map output. One interesting point is to
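
Because every map output segment arrives already sorted, the reduce side only has to merge. Below is a minimal sketch of that idea, assuming a k-way merge over in-memory string runs with a priority queue; it is an illustration of the technique, not Hadoop's actual merge code.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import java.util.PriorityQueue;

    // Sketch only: merging k already-sorted runs with a priority queue,
    // analogous to the reduce-side merge of sorted map output segments.
    public class KWayMerge {
      public static List<String> merge(List<List<String>> sortedRuns) {
        // Each heap entry is {runIndex, positionInRun}, ordered by the element it points at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            Comparator.comparing((int[] e) -> sortedRuns.get(e[0]).get(e[1])));
        for (int i = 0; i < sortedRuns.size(); i++) {
          if (!sortedRuns.get(i).isEmpty()) heap.add(new int[] { i, 0 });
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
          int[] top = heap.poll();
          List<String> run = sortedRuns.get(top[0]);
          merged.add(run.get(top[1]));
          // Advance within the same run; the runs themselves never need re-sorting.
          if (top[1] + 1 < run.size()) heap.add(new int[] { top[0], top[1] + 1 });
        }
        return merged; // O(total elements * log k), because each run is already sorted
      }
    }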

Will already sorted Mapper output improve speed of Sort in reducer?

2010-01-07 Thread Le Zhao
Hi, Does anybody know whether sorted Mapper output will speed up the Sort in the reduce phase? I'm teaching a class, and am curious to know how much of a difference sorted vs. unsorted mapper output makes. If the merge sort is implemented to deal with already sorted input, then I guess it

HBASE 0.20.2 and HADOOP 0.20.1

2010-01-07 Thread Mithila Nagendra
Hi all, I've installed hadoop 0.20.1 and Hbase 0.20.2. When I try to create a table using the hbase shell I get the following exception. The table gets created anyway. Can this be prevented? How do I get rid of this exception? hbase(main):004:0> create 'blogposts1', 'post', 'image' 10/01/07 17

Re: debian package of hadoop

2010-01-07 Thread Edward Capriolo
As for your 2), deploying small clusters: I think that is outside the scope of packaging. Take mysql for example: you could configure mysql to be master/slave or even master/master. That is all done through the configuration, not through the package. The hadoop configuration is very dependent on host

Re: debian package of hadoop

2010-01-07 Thread Jordà Polo
On Mon, Jan 04, 2010 at 12:37:48PM +, Steve Loughran wrote: > If you want "official" as in can say "Apache Hadoop" on it, then it > will need to be managed and released as an apache project. That > means somewhere in ASF SVN. If you want to cut your own, please give > it a different name to avo

Re: Multiple file output

2010-01-07 Thread Vijay
That's great news! Thanks guys! On Thu, Jan 7, 2010 at 11:11 AM, Aaron Kimball wrote: > Note that org.apache.hadoop.mapreduce.lib.output.MultipleOutputs is > scheduled for the next CDH 0.20 release -- ready "soon." > - Aaron > > 2010/1/6 Amareshwari Sri Ramadasu > > > No. It is part of branch 0

Custom Writer truncates output after some big number

2010-01-07 Thread bora
http://old.nabble.com/file/p27068391/ReachSum.java ReachSum.java Hi, I have a problem printing one of my hadoop outputs. I have a class where I hold the output data, see below. It gets collected at the reducer: SummaryOut so; ... output.collect(new StringArrayWritable(key.content), so); But when th
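
The poster's StringArrayWritable and SummaryOut classes aren't shown in the preview; as a point of reference, truncated or garbled output from a custom Writable is often a write()/readFields() mismatch. Here is a hedged sketch of what a string-array Writable might look like (the field and class names are assumptions, not the code from ReachSum.java):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.io.WritableUtils;

    // Hypothetical sketch: write() and readFields() must serialize exactly
    // the same fields, in the same order, or downstream records get corrupted.
    public class StringArrayWritable implements Writable {
      public String[] content = new String[0];

      public StringArrayWritable() {}                        // no-arg constructor required by Hadoop
      public StringArrayWritable(String[] content) { this.content = content; }

      @Override
      public void write(DataOutput out) throws IOException {
        WritableUtils.writeVInt(out, content.length);        // length first...
        for (String s : content) Text.writeString(out, s);   // ...then each element
      }

      @Override
      public void readFields(DataInput in) throws IOException {
        int len = WritableUtils.readVInt(in);                 // read back in the same order
        content = new String[len];
        for (int i = 0; i < len; i++) content[i] = Text.readString(in);
      }
    }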

isSplitable() deprecated

2010-01-07 Thread Ted Yu
According to: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/TextInputFormat.html#isSplitable%28org.apache.hadoop.fs.FileSystem,%20org.apache.hadoop.fs.Path%29 isSplitable() is deprecated. Which method should I use to replace it? Thanks
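
For reference, the new-API equivalent lives on org.apache.hadoop.mapreduce.lib.input.FileInputFormat as isSplitable(JobContext, Path). A minimal sketch of overriding it, assuming the goal is to keep each file in a single split:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    // Sketch: with the new-API TextInputFormat, override isSplitable(JobContext, Path)
    // instead of the deprecated old-API isSplitable(FileSystem, Path).
    public class WholeFileTextInputFormat extends TextInputFormat {
      @Override
      protected boolean isSplitable(JobContext context, Path file) {
        return false; // never split: each file becomes exactly one input split
      }
    }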

What can cause: Map output copy failure

2010-01-07 Thread Mayuran Yogarajah
I'm seeing this error when a job runs: Shuffling 35338524 bytes (35338524 raw bytes) into RAM from attempt_201001051549_0036_m_03_0 Map output copy failure: java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(Reduce
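
The OutOfMemoryError is thrown while a map output is being copied into the reduce task's in-memory shuffle buffer. A hedged sketch of two 0.20 settings that are commonly adjusted in this situation (the values below are illustrative, not recommendations):

    import org.apache.hadoop.mapred.JobConf;

    public class ShuffleTuning {
      // Sketch: settings sometimes adjusted when reducers OOM during the shuffle copy phase.
      public static JobConf tune(JobConf conf) {
        // Give the child (reduce) JVM more heap; the value is illustrative.
        conf.set("mapred.child.java.opts", "-Xmx1024m");
        // Fraction of the reducer heap used to hold shuffled map outputs in memory
        // (default 0.70 in 0.20); lowering it makes segments spill to disk sooner.
        conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.50f);
        return conf;
      }
    }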

Other sources for hadoop api help

2010-01-07 Thread Raymond Jennings III
I am trying to develop some hadoop programs and I see that most of the examples included in the distribution are using deprecated classes and methods. Are there any other sources to learn about the api other than the javadocs, which, for beginners trying to write hadoop programs, are not the best

Re: Multiple file output

2010-01-07 Thread Aaron Kimball
Note that org.apache.hadoop.mapreduce.lib.output.MultipleOutputs is scheduled for the next CDH 0.20 release -- ready "soon." - Aaron 2010/1/6 Amareshwari Sri Ramadasu > No. It is part of branch 0.21 onwards. For 0.20*, people can use old api > only, though JobConf is deprecated. > > -Amareshwari
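
For reference, the old-API class that is usable on 0.20 is org.apache.hadoop.mapred.lib.MultipleOutputs. A hedged sketch of how it is typically wired up; the named output "errors" and the key/value types are illustrative, not from this thread:

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.TextOutputFormat;
    import org.apache.hadoop.mapred.lib.MultipleOutputs;

    // Sketch of old-API MultipleOutputs usage, available in 0.20.
    public class MultiOutReducer extends MapReduceBase
        implements Reducer<Text, LongWritable, Text, LongWritable> {

      private MultipleOutputs mos;

      // Driver-side setup (normally in the job-submission code):
      public static void setupNamedOutputs(JobConf conf) {
        MultipleOutputs.addNamedOutput(conf, "errors",
            TextOutputFormat.class, Text.class, LongWritable.class);
      }

      @Override
      public void configure(JobConf conf) {
        mos = new MultipleOutputs(conf);
      }

      @Override
      public void reduce(Text key, Iterator<LongWritable> values,
          OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException {
        long sum = 0;
        while (values.hasNext()) sum += values.next().get();
        // Write to the extra named output instead of (or in addition to) the default one.
        mos.getCollector("errors", reporter).collect(key, new LongWritable(sum));
      }

      @Override
      public void close() throws IOException {
        mos.close(); // flushes the extra outputs
      }
    }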

Re: debian package, ivy problems

2010-01-07 Thread Allen Wittenauer
On 1/7/10 1:48 AM, "Thomas Koch" wrote: > I wrote my own ivysettings.xml[1] so that ivy won't go online, but rather uses > what's already installed as debian packages. Did build.properties with offline=true not work in this instance? (I'm not an ivy guy)

Hbase loading error -- Trailer 'header' is wrong; does the trailer size match content

2010-01-07 Thread Sriram Muthuswamy Chittathoor
Hi: I am trying to run a MR job to output HFiles directly containing 10 million records (very simple, 1 column family and very small). The job completes with some mention of killed jobs (reduce Failed/Killed Task Attempts > 0). Then I use the script loadtable.rb to load my hfiles into hbase a

hostname requirement, was: debian package of hadoop

2010-01-07 Thread Thomas Koch
Hi Todd, > > The SNN shows: > > java.io.FileNotFoundException: > > http://192.168.122.166:50070/getimage?putimage=1&port=50090&machine=127.0.1.1&token=-18:737152035:0:126219599:1262194649873 at > > sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.jav
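
For reference, machine=127.0.1.1 in that URL is the usual Debian/Ubuntu symptom: /etc/hosts maps the node's hostname to 127.0.1.1, so the daemon advertises a loopback address to its peers. A hedged example of the kind of /etc/hosts change usually suggested; the hostname and address below are placeholders:

    # /etc/hosts on the node running the secondary namenode
    127.0.0.1      localhost
    # 127.0.1.1    hdpnode1        <- Debian's default line, which causes the loopback address above
    192.168.1.20   hdpnode1        # map the hostname to the node's real interface instead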

debian package, ivy problems

2010-01-07 Thread Thomas Koch
Hi, I started the debian packaging at http://git.debian.org/?p=users/thkoch-guest/hadoop.git I wrote my own ivysettings.xml[1] so that ivy won't go online, but rather uses what's already installed as debian packages. However ivy spits some errors at me like: commons-logging#commons-logging;wo
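
For reference, an offline setup of this kind usually points Ivy at the locally installed jars with a filesystem resolver. A hedged sketch follows; the resolver name and artifact pattern are assumptions, not the contents of the referenced ivysettings.xml:

    <ivysettings>
      <settings defaultResolver="debian-local"/>
      <resolvers>
        <!-- Resolve artifacts from jars installed by Debian packages instead of going online. -->
        <filesystem name="debian-local">
          <artifact pattern="/usr/share/java/[artifact].jar"/>
        </filesystem>
      </resolvers>
    </ivysettings>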