Re: Getting jobTracker startTime from the JobClient

2008-05-01 Thread Steve Loughran
Pete Wyckoff wrote: I'm looking for the actual #of seconds since (or absolute time) the job tracker was ready to start accepting jobs. I'm writing a utility to (attempt :)) more robustly run hadoop jobs and one thing I want to detect is if the JobTracker has gone down while a job that failed was

Re: One-node cluster with DFS on Debian

2008-05-01 Thread Steve Loughran
Richard Crowley wrote: Problem fixed. My machine's /etc/hostname file came without a fully-qualified domain name. Why does Hadoop (or perhaps just java.net.InetAddress) rely on reverse DNS lookups? Richard Java networking is a mess. There are some implicit assumptions "well managed netw
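
A quick way to see what java.net.InetAddress reports for the local machine is sketched below. This is an illustration only, not code from the thread: the class name HostnameCheck and the helper looksFullyQualified are hypothetical, and the "has a dot and a letter" test is a rough heuristic rather than anything Hadoop itself does.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostnameCheck {

    // Rough heuristic (an assumption for this sketch): a fully-qualified name
    // contains a dot after the first character and at least one letter, so
    // bare names like "node1" and IPv4 literals like "127.0.0.1" are rejected.
    static boolean looksFullyQualified(String name) {
        boolean hasDot = name.indexOf('.') > 0;
        boolean hasLetter = name.chars().anyMatch(Character::isLetter);
        return hasDot && hasLetter;
    }

    public static void main(String[] args) {
        try {
            // Resolves the local host name to an address.
            InetAddress local = InetAddress.getLocalHost();
            // getCanonicalHostName() may perform a reverse lookup; if that
            // lookup fails it returns the textual IP address instead.
            String canonical = local.getCanonicalHostName();
            System.out.println("host name:       " + local.getHostName());
            System.out.println("canonical name:  " + canonical);
            System.out.println("address:         " + local.getHostAddress());
            System.out.println("fully qualified: " + looksFullyQualified(canonical));
        } catch (UnknownHostException e) {
            // Thrown when the local host name cannot be resolved at all.
            System.out.println("local host name does not resolve: " + e.getMessage());
        }
    }
}
```

If the canonical name printed here is a bare host name or an IP address, the machine's hostname or DNS configuration is a likely place to look first.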

Re: Inconsistency while running in eclipse and cygwin

2008-05-01 Thread Sridhar Raman
I managed to fix both the problems. 1) The one in Eclipse was happening because of the df task which wasn't possible in Windows. Once I added cygwin/bin to the Path, this started working. 2) This one occurred because the output value class of my Combiner and Reducer was different. On Wed, Apr

Re: User accounts in Master and Slaves

2008-05-01 Thread Sridhar Raman
Though I am able to run MapReduce tasks without errors, I am still not able to get stop-all to work. It still says, "no tasktracker to stop, no datanode to stop, ...". And also, there are a lot of java processes running in my Task Manager which I need to forcibly shut down. Are these two problem

Re: Input and Output types?

2008-05-01 Thread Sridhar Raman
Thanks. On Fri, Apr 18, 2008 at 9:14 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote: > > On Apr 17, 2008, at 11:20 PM, Sridhar Raman wrote: > > I am new to MapReduce and Hadoop, and I have managed to find my way > > through > > with a few programs. But I still have some doubts that are constantly

Re: Hadoop Cluster Administration Tools?

2008-05-01 Thread Steve Loughran
Khalil Honsali wrote: Thanks Mr. Steve, and everyone.. I actually have just 16 machines (normal P4 PCs), so in case I need to do things manually it takes half an hour (for example when installing sun-java, I had to type that 'yes' for each .bin install) but for now i'm ok with pssh or just a sim

ClassNotFoundException while running jar file

2008-05-01 Thread chaitanya krishna
Hi, I wanted to run my own java code in hadoop. The following are the commands that I executed and the errors that occurred. mkdir temp javac -Xlint -classpath hadoop-0.16.0-core.jar -d temp GetFeatures.java (GetFeatures.java is the code) jar -cvf temp.jar temp bin/hadoop jar

Re: Block reports: memory vs. file system, and Dividing offerService into 2 threads

2008-05-01 Thread Cagdas Gerede
As far as I understand, the current focus is on how to reduce namenode's CPU time to process block reports from a lot of datanodes. Don't we miss another issue? Doesn't the way a block report is computed delay the master startup time? I have to make sure the master is up as quickly as possible for

RE: OOM error with large # of map tasks

2008-05-01 Thread Devaraj Das
Hi Lili, sorry that I missed one important detail in my last response - tasks that complete successfully on tasktrackers are marked as COMMIT_PENDING by the tasktracker itself. The JobTracker takes those COMMIT_PENDING tasks, promotes their output (if applicable), and then marks them as SUCCEEDED.

Re: Hadoop-On-Demand (HOD) newbie: question and curious about user experience

2008-05-01 Thread Alvin AuYoung
Hi Vinod, thanks a lot for the reply. The updated description answers a lot of my questions -- I apologize for not finding it earlier. On Wed, 30 Apr 2008, Vinod KV wrote: On a first note, can you please tell which documentation you are looking at. 'Coz hod interface is cleaned up and -o op

RE: OOM error with large # of map tasks

2008-05-01 Thread Devaraj Das
Long term we need to see how we can minimize the memory consumption by objects corresponding to completed tasks in the tasktracker. > -Original Message- > From: Devaraj Das [mailto:[EMAIL PROTECTED] > Sent: Friday, May 02, 2008 1:29 AM > To: 'core-user@hadoop.apache.org' > Subject: RE: O

Re: JobConf: How to pass List/Map

2008-05-01 Thread Jason Venner
We have been serializing to a ByteArrayOutputStream, then base64 encoding the underlying byte array and passing that string in the conf. It is ugly but it works well until 0.17. Enis Soztutar wrote: Yes Stringifier was committed in 0.17. What you can do in 0.16 is to simulate DefaultStringifier.
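
The approach described above (serialize to a byte array, base64 encode it, pass the resulting string through the configuration) might look roughly like the following sketch. Several things here are assumptions, not details from the thread: it uses standard Java object serialization, it uses java.util.Base64 (Java 8 and later, so not what was available in 2008), and the class name ConfStringCodec and the property key in the comment are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;

public class ConfStringCodec {

    // Serialize any Serializable value into a base64 string that is safe to
    // store as a plain configuration property value.
    static String encode(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverse of encode(): base64 decode, then deserialize the object.
    static Object decode(String encoded) throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<String> items = new ArrayList<>(Arrays.asList("alpha", "beta"));
        String encoded = encode(items);
        // In a job this string would be stored with something like
        // conf.set("my.app.items", encoded); the key name is hypothetical.
        Object restored = decode(encoded);
        System.out.println(restored.equals(items));
    }
}
```

On the task side, the same string would be read back from the configuration and passed to decode(). From 0.17 on, the thread points to DefaultStringifier as the built-in alternative.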

Re: OOM error with large # of map tasks

2008-05-01 Thread Jason Venner
We have a problem with this in our application, in particular sometimes threads started by the map/reduce class block the tasktracker$child process from exiting when the map/reduce is done. JMX is the number 1 cause of this for us, badly behaving JNI tasks are #2, and MINA is #3. We modify the task

Re: ClassNotFoundException while running jar file

2008-05-01 Thread Jason Venner
You need to add your class or a class in your jar to the constructor for your JobConf object. JobConf(Class exampleClass) Construct a map/reduce job configuration.

Re: OOM error with large # of map tasks

2008-05-01 Thread sam rash
Hi, In fact we verified it is our jobconf--we have about 800k in input paths (11k files for a few TB of data). We'll indeed up the heap size to about 2048m and we can also do some significant optimizations on the file paths (use wildcards and others). Is there any plan to make the storage of the J

Workaround for Hadoop 0.16.3 org.xml.sax.SAXParseException on Mac OS X?

2008-05-01 Thread Craig E. Ward
I successfully used Hadoop 0.15.3 on Mac OS X 10.4 with Java 5, but I get a strange error when trying to upgrade to 0.16.x: $ hadoop jar hadoop-*examples.jar grep input output 'dfs[a-z.]+' 08/05/01 15:22:21 INFO mapred.FileInputFormat: Total input paths to process : 11 org.apache.hadoop.ipc.Remo

Re: Hadoop Cluster Administration Tools?

2008-05-01 Thread Bradford Stephens
*Very* cool information. As someone who's leading the transition to open-source and cluster-orientation at a company of about 50 people, finding good tools for the IT staff to use is essential. Thanks so much for the continued feedback. On Thu, May 1, 2008 at 6:10 AM, Steve Loughran <[EMAIL PROTE

Re: Hadoop Cluster Administration Tools?

2008-05-01 Thread Allen Wittenauer
On 5/1/08 5:00 PM, "Bradford Stephens" <[EMAIL PROTECTED]> wrote: > *Very* cool information. As someone who's leading the transition to > open-source and cluster-orientation at a company of about 50 people, > finding good tools for the IT staff to use is essential. Thanks so much for > the continu

RE: How do I copy files from my linux file system to HDFS using a java prog?

2008-05-01 Thread Babu, Suresh
Try this program. Modify the HDFS configuration, if it is different from the default. import java.io.File; import java.io.IOException; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.FileStatus; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.FSDataIn