Re: Hadoop streaming job issue

2009-11-16 Thread Jason Venner
There is a very clear picture in chapter 8 of Pro Hadoop of all of the separators for streaming jobs. On Tue, Nov 10, 2009 at 6:53 AM, wd wrote: > You mean the ^A ? > I tried \u0001 and \x01, the streaming job recognises it as a string, not > ^A.. > > :( > > 2009/11/10 Amogh Vasekar > > Hi, >
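
For anyone hitting the same escaping problem: the separator has to reach the job configuration as a real U+0001 control character, not as the four-character string "\u0001". A minimal Java sketch of the idea (the property names are the 0.19/0.20-era streaming ones; streaming jobs normally pass these as -D options on the command line instead):

    import org.apache.hadoop.mapred.JobConf;

    public class SeparatorDemo {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // in Java source, "\u0001" denotes the single control character ^A,
        // not a literal backslash-u sequence
        conf.set("stream.map.output.field.separator", "\u0001");
        conf.set("stream.num.map.output.key.fields", "1");
        System.out.println(conf.get("stream.map.output.field.separator").length()); // prints 1
      }
    }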

Re: MapReduce Child don't exit?

2009-11-16 Thread Jason Venner
The DFS client code waits until all of the datanodes that are going to hold a replica of your output's blocks have ack'd. If you are pausing there, most likely something is wrong in your HDFS cluster. On Thu, Nov 12, 2009 at 7:06 AM, Ted Xu wrote: > hi all, > > We are using hadoop-0.19.1 on
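
To see where that wait surfaces, here is a minimal sketch of the client side; the path and data are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CloseWait {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataOutputStream out = fs.create(new Path("/tmp/close-wait-demo")); // hypothetical path
        out.write("some output".getBytes());
        // close() does not return until every datanode in the write pipeline
        // has ack'd the last block, so a hang here usually points at a sick
        // datanode or network problem rather than at the task itself
        out.close();
      }
    }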

Re: Hadoop Streaming overhead

2009-11-16 Thread Jason Venner
All of your data has to be converted back and forth to strings, and passed through pipes from the JVM to your task and back from the task to the JVM. On Thu, Nov 12, 2009 at 10:06 PM, Alexey Tigarev wrote: > Hi All! > > How much overhead does using Hadoop Streaming vs. native Java steps add? > >
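
By contrast, a native Java map task hands Writable objects straight to the framework, with no string conversion or pipe hop; a minimal sketch using the old mapred API of that era:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class PassThroughMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, LongWritable, Text> {
      public void map(LongWritable key, Text value,
          OutputCollector<LongWritable, Text> output, Reporter reporter)
          throws IOException {
        // records stay as Writables end to end; streaming would instead
        // serialize each record to a line of text, push it down a pipe to an
        // external process, and parse that process's stdout back into records
        output.collect(key, value);
      }
    }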

Re: Why doesnt my mapper speak to me :(

2009-11-16 Thread Jason Venner
Your log messages to stdout, stderr, and syslog will end up in the logs/userlogs directory of your task tracker. If the job is still visible via the web UI for the job tracker host (usually port 50030), you can select the individual tasks that were run for your job, and if you click through enough s
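
As a concrete illustration, anything a task writes to its standard streams is captured per attempt; a trivial sketch:

    public class LogDemo {
      public static void main(String[] args) {
        // inside a map or reduce task, these land under
        // logs/userlogs/<task-attempt-id>/{stdout,stderr} on the tasktracker node
        System.out.println("this line goes to the stdout log");
        System.err.println("this line goes to the stderr log");
      }
    }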

Re: noobie question on hadoop's NoClassDefFoundError

2009-11-16 Thread Jason Venner
Your Eclipse instance doesn't have the jar files from the lib directory of your Hadoop installation on its class path. On Sat, Nov 14, 2009 at 7:51 PM, felix gao wrote: > I wrote a simple code in my eclipse as > > Text t = new Text("hadoop"); > System.out.println((char)t.charAt(2)); > > when I tr
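
For completeness, the snippet from the question runs once the Hadoop jars are on the build path; a self-contained version:

    import org.apache.hadoop.io.Text;

    public class TextDemo {
      public static void main(String[] args) {
        Text t = new Text("hadoop");
        // Text.charAt(int) returns the Unicode code point at that byte offset
        // (or -1 if the position is invalid), so it must be cast for display
        System.out.println((char) t.charAt(2)); // prints: d
      }
    }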

Re: Map output compression leads to JVM crash (0.20.0)

2009-10-27 Thread Jason Venner
The failure appears to occur in code in the system dynamic linker, which implies a shared-library compatibility problem or a heap shortfall. On Mon, Oct 26, 2009 at 2:25 PM, Ed Mazur wrote: > Err, disregard that. > > $ cat /proc/version > Linux version 2.6.9-89.0.9.plus.c4smp (mockbu...@builder10
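
One way to test the shared-library theory is to ask Hadoop whether its native compression libraries loaded at all; a small sketch:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.zlib.ZlibFactory;
    import org.apache.hadoop.util.NativeCodeLoader;

    public class NativeCheck {
      public static void main(String[] args) {
        // if either of these prints false, the native code path that map
        // output compression exercises is suspect on this node
        System.out.println("native hadoop lib loaded: "
            + NativeCodeLoader.isNativeCodeLoaded());
        System.out.println("native zlib usable: "
            + ZlibFactory.isNativeZlibLoaded(new Configuration()));
      }
    }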

Re: MultipleTextOutputFormat giving "Bad connect ack with firstBadLink"

2009-10-27 Thread Jason Venner
This error is very common in applications that run out of file descriptors, or that simply open vast numbers of files on an HDFS cluster with a very high block density per datanode. It is quite easy to open hundreds of thousands of files with the Multi*OutputFormat classes. If you can collect your output in
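
One way to bound the file count is to fold keys into a fixed set of output names rather than one file per key; a hypothetical sketch against the old mapred API (the bucket count of 64 is arbitrary):

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

    public class BucketedTextOutputFormat
        extends MultipleTextOutputFormat<Text, Text> {
      private static final int BUCKETS = 64; // hypothetical cap on files per task

      @Override
      protected String generateFileNameForKeyValue(Text key, Text value, String name) {
        // hash each key into one of BUCKETS output names instead of
        // creating a separate HDFS file for every distinct key
        int bucket = (key.hashCode() & Integer.MAX_VALUE) % BUCKETS;
        return "bucket-" + bucket + "-" + name;
      }
    }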

Re: MapRed Job Completes; Output Ceases Mid-Job

2009-10-08 Thread Jason Venner
Are you perhaps creating large numbers of files, and running out of file descriptors in your tasks? On Wed, Oct 7, 2009 at 1:52 PM, Geoffry Roberts wrote: > All, > > I have a MapRed job that ceases to produce output about halfway through. > The obvious question is why? > > This job reads a file a
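
A crude way to check that from inside a task (a Linux-only sketch):

    import java.io.File;

    public class FdCount {
      public static void main(String[] args) {
        // on Linux, /proc/self/fd holds one entry per descriptor this process
        // has open; compare the count against the limit shown by `ulimit -n`
        String[] fds = new File("/proc/self/fd").list();
        System.err.println("open file descriptors: "
            + (fds == null ? "unknown" : fds.length));
      }
    }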

Re: What does MAX_FAILED_UNIQUE_FETCHES mean?

2009-07-28 Thread Jason Venner
I have seen this happen when there are inconsistent hostname-to-IP-address lookups across the cluster, and a node running a reducer is not connecting to the host that actually has the map output because it gets a different IP address for the node name. On Mon, Jul 27, 2009 at 9:46 AM, Geoffry Robert
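
A quick consistency check to run on each node (a sketch; pass the other nodes' hostnames as arguments and compare the output across the cluster):

    import java.net.InetAddress;

    public class HostCheck {
      public static void main(String[] args) throws Exception {
        // every node must resolve a tasktracker's registered hostname to the
        // same address, or reducers will fetch map output from the wrong place
        InetAddress local = InetAddress.getLocalHost();
        System.out.println(local.getHostName() + " -> " + local.getHostAddress());
        for (String host : args) {
          System.out.println(host + " -> "
              + InetAddress.getByName(host).getHostAddress());
        }
      }
    }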