Client hanging 20 seconds after job's over (WAS: Re: Can I run HBase 0.20.6 on Hadoop 0.21?)

2010-09-27 Thread Jean-Daniel Cryans
(adding mapreduce-user@ and re-scoping title) Can you jstack the client while it's waiting 20 seconds? Is it still waiting for the job to come back, or is it something else? Is the job itself done cleaning 20 seconds before the call returns on the client side (check the web UI)? J-D On Mon, Sep 27

Limiting the number of data records processed per reduce process

2010-09-27 Thread George P. Stathis
Possible beginner's question here but I can't find an obvious answer in the docs. Is there a way to configure a job such that it imposes a cap on the number of records each reduce process receives at a time, regardless of how the data was partitioned or how many reducers were configured for the job

Re: JobClient using deprecated JobConf

2010-09-27 Thread David Rosenstrauch
On 09/25/2010 10:24 AM, Martin Becker wrote: Hello David, thanks a lot. Yet I want java code to submit my application. I do not want to mess with any kind of command line arguments or an executable, neither Java nor Hadoop. I want to write a method that can set up and submit a job to an arbitrar

FixedLengthInputFormat: Patch 1176

2010-09-27 Thread Ratner, Alan S (IS)
I am trying to read in fixed-width records using the code from Patch 1176 (FixedLengthInputFormat and FixedLengthRecordReader) to use in Hadoop 0.21.0. (Rather than rebuild Hadoop I simply copied the code into my Eclipse package.) Eclipse recognizes FixedLengthInputFormat but finds it incompatib

starting hadoop fails

2010-09-27 Thread Johannes.Lichtenberger
Hi, I'm trying to run the Cloudera hadoop distribution, but it seems it always fails. The log of DataNode: 2010-09-27 15:49:07,081 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG: / STARTUP_MSG: Starting DataNode STARTU

Re: setXIncludeAware(true) exception

2010-09-27 Thread Johannes.Lichtenberger
Sorry, my fault, I had to include it in my setup of the TestCase. But nonetheless, some output when starting: 10/09/27 14:47:12 ERROR mapred.MiniMRCluster: Job tracker crashed java.lang.NullPointerException at java.io.File.<init>(File.java:222) at org.apache.hadoop.mapred.JobHistory.in

setXIncludeAware(true) exception

2010-09-27 Thread Johannes.Lichtenberger
I'm getting an exception when running Hadoop: 10/09/27 14:31:34 ERROR conf.Configuration: Failed to set setXIncludeAware(true) for parser org.apache.xerces.jaxp.documentbuilderfactoryi...@406754d6:java.lang.UnsupportedOperationException: This parser does not support specification "null" version "n

Re: what does job tracker status reduce > copy mean?

2010-09-27 Thread Vitaliy Semochkin
Thank you very much, in my case reduce > copy (998 of 1777 at 4.55 MB/s) froze at number 998 and did nothing until it was aborted as a frozen task. Does 4.55 MB/s mean that data was transferred between cluster nodes at such a low speed (i.e. a network problem)? Once task fall with out of memory excepti

Re: Remote connection bottleneck?

2010-09-27 Thread Martin Kuhn
Hi Mario, In the ssh I can't execute local files while my session is open... Of course, you can refer to local files in the hadoop command, but if you're in the ssh window, the files on your PC are remote ;-) It should work fine if you put your jar via (win)scp on the remote computer. With an