Re: Reduce not completing

2008-12-23 Thread Amareshwari Sriramadasu
I couldn't get much from the logs about why it is so. For reporting status, you can write to stderr from your script. The format should be reporter:status:&lt;message&gt;. If the program emits such lines on stderr, the framework will treat them as status reports. Hope this clarifies. Thanks Amareshwari Rick Hangartne

Re: How to coordinate nodes of different computing powers in the same cluster?

2008-12-23 Thread Devaraj Das
You can enable speculative execution for your jobs. On 12/24/08 10:25 AM, "Jeremy Chow" wrote: > Hi list, > I've come up against a scenario like this, to finish a same task, one of my > hadoop cluster only needs 5 seconds, and another one needs more than 2 > minutes. > It's a common phenomenon
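Devaraj's suggestion maps onto two job properties. A sketch of the relevant hadoop-site.xml (or per-job JobConf) entries, using the property names from the 0.18/0.19-era configuration:

```xml
<!-- Enable speculative execution: slow "straggler" tasks are
     re-launched on idle nodes and the first copy to finish wins. -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>true</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>true</value>
</property>
```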

How to coordinate nodes of different computing powers in the same cluster?

2008-12-23 Thread Jeremy Chow
Hi list, I've come up against a scenario like this: to finish the same task, one of my hadoop cluster nodes needs only 5 seconds, and another needs more than 2 minutes. It's a common phenomenon that will decrease the parallelism of our system, because the faster one will wait for the slower one. How to coor

Re: Reduce not completing

2008-12-23 Thread Rick Hangartner
Hi Amareshwari, This may or may not be helpful. Here's an example of three runs in rapid succession. The first and last completed without any problems. The middle one completed in this case, but the log has three exceptions of the type we reported (We think a fourth exception would have

A question about MultipleOutputFormat

2008-12-23 Thread Saptarshi Guha
Hello, MultipleOutputFormat is a very good idea. Thanks. I have a question about this line from the web page: "The reducer wants to write data to different files depending on the actual keys" .. and values. Examining TestMultipleTextOutputFormat, class KeyBasedMultipleTextOutputFormat extends MultipleTextOutpu

Re: Reduce not completing

2008-12-23 Thread Rick Hangartner
Hi Amareshwari, The "stream.non.zero.exit.status.is.failure" setting is at the default (which the docs indicate is 'true'). We don't think the problem is the reducer script per se: under one circumstance we are investigating further, it arises when the reducer script does nothing but copy stdin to st
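For reference, a hedged sketch of how that property is typically set on the 0.18-era streaming command line (the jar name and the other arguments are placeholders):

```shell
hadoop jar hadoop-streaming.jar \
  -jobconf stream.non.zero.exit.status.is.failure=true \
  -input in -output out \
  -mapper map.py -reducer reduce.py
```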

Re: Run Map-Reduce multiple times

2008-12-23 Thread Jason Venner
In 0.19 there is a chaining facility; I haven't looked at it yet, but it may provide an alternative to the rather standard pattern of looping. You may also want to check what Mahout is doing, as this is a common problem in that space. Delip Rao wrote: Thanks Chris! I ended up doing something simi
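The "standard pattern of looping" mentioned here can be sketched generically. run_round and converged are hypothetical callables (for example, wrapping JobClient.runJob and a counter check); this is not Hadoop API:

```python
# Driver-side loop for iterative MapReduce: run one job round per
# iteration until a convergence test passes or a round cap is hit.
def iterate_until_converged(run_round, converged, max_rounds=10):
    for round_no in range(max_rounds):
        metric = run_round(round_no)   # launch round `round_no`, return a metric
        if converged(metric):          # e.g. delta below a threshold
            return round_no + 1        # rounds actually executed
    return max_rounds                  # gave up at the cap
```

Each round would typically write its output where the next round expects its input, which is the looping the thread is describing.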

Re: Copy-rate of reducers decreases over time

2008-12-23 Thread Jason Venner
The copy rate for the reducers is throttled by the availability of the data from the maps. If the map data is not available yet, the effective copy rate goes toward 0. patek tek wrote: Hello, I have been running experiments with Hadoop and noticed that the copy-rate of reducers decreases over

hadoop 0.18 version install on single node with two disks producing createBlockOutputStream java.io.IOException - help needed

2008-12-23 Thread Akturan, Cagdas
Hi all, I am new to this list so please forgive me if this is not the right way or format to ask for help. I installed hadoop version hadoop-0.18.1 with single node and single disk and everything was working fine. When I added one more disk I started getting the errors copied below. I tried bala

Re: Classes Not Found even when classpath is mentioned (Starting mapreduce from another app)

2008-12-23 Thread Jason Venner
Yes, this will work. You will need to configure the classpath to include that directory. The TaskTrackers really only have the classpath as set up by conf/hadoop-env.sh, and the TaskTracker child JVMs have that classpath plus the unpacked distributed cache directory. Saptarshi Guha wrote: Hello, Wh
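Concretely, the daemon classpath Jason describes is extended in conf/hadoop-env.sh. A sketch, with the extra directory being a placeholder:

```shell
# conf/hadoop-env.sh: directories/jars listed here end up on the
# classpath of every daemon, including the TaskTrackers.
export HADOOP_CLASSPATH=/path/to/extra/classes:${HADOOP_CLASSPATH}
```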

Data is not getting written into DFS in 0.18.2

2008-12-23 Thread Fuchs, Michael
Hi, I ran into some issues with Hadoop 0.18.2 on my Linux box: the jobs execute without any complaints and they are listed in the succeeded list, but there is no output data besides the "_logs" directory. The same code works with 0.17.2.1. Here are some sections of the logs: [logfile] had...@

Infinite loop in a DataNode

2008-12-23 Thread Jean-Adrien
Hello, I'm testing a cluster with Hadoop 0.18.1 / HBase 0.18.0. In the last few days a problem has arisen with my hdfs. My topology is 4 nodes: 3 nodes run DataNode and RegionServer, and one runs the HBase master, NameNode and Secondary NameNode. The cluster works for some hours, then one of the DataNodes fr

Re: how to pass an object to mapper

2008-12-23 Thread Enis Soztutar
There are several ways you can pass static information to tasks in Hadoop. The first is to store it in the conf via DefaultStringifier, which needs the object to be serialized through either the Writable or Serializable interface. A second way would be to save/serialize the data to a file and send it vi
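A language-agnostic sketch of the first approach: serialize the object into a plain string, stash it under a configuration key, and recover it inside the task. (In Java this is what DefaultStringifier does over Writable/Serializable; base64 + pickle stands in here purely for illustration, and the dict plays the role of the job conf.)

```python
import base64
import pickle

# Serialize an object into a plain string so it can ride along in
# the job configuration, then recover it inside the task.
def store_in_conf(conf, key, obj):
    conf[key] = base64.b64encode(pickle.dumps(obj)).decode("ascii")

def load_from_conf(conf, key):
    return pickle.loads(base64.b64decode(conf[key].encode("ascii")))
```

The encoding step matters because configuration values must be text-safe; raw serialized bytes generally are not.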

Re: Reduce not completing

2008-12-23 Thread Amareshwari Sriramadasu
You can report status from a streaming job by emitting reporter:status:&lt;message&gt; on stderr. See the documentation @ http://hadoop.apache.org/core/docs/r0.18.2/streaming.html#How+do+I+update+status+in+streaming+applications%3F But from the exception trace, it doesn't look like a lack of reporting (timeout). The tr

how to pass an object to mapper

2008-12-23 Thread forbbs forbbs
It seems that JobConf doesn't help. Do I have to write the object into DFS?