map-side join - Join.java - Where exactly joining happens

2010-05-12 Thread Dhadoop
Hi, I am curious to know in which file exactly the joining process happens. I have been trying to trace it and have not been able to. Thanks, Dhana
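
The joining happens on the read path rather than in map(): with the old mapred API, the record readers in org.apache.hadoop.mapred.join (CompositeRecordReader and its JoinRecordReader subclasses) merge the sorted inputs key by key before the mapper ever runs. A minimal configuration sketch, assuming two pre-sorted, identically partitioned inputs; the paths and the "inner" op are placeholders, not from the thread:

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.join.CompositeInputFormat;

public class MapSideJoinConfig {
    public static JobConf configure() {
        JobConf conf = new JobConf(MapSideJoinConfig.class);
        // The composite format performs the actual join: its record readers
        // merge the inputs and hand map() a key plus a TupleWritable with
        // one slot per joined source.
        conf.setInputFormat(CompositeInputFormat.class);
        conf.set("mapred.join.expr", CompositeInputFormat.compose(
            "inner", SequenceFileInputFormat.class,
            "/data/left", "/data/right"));  // placeholder paths
        return conf;
    }
}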

where can we get the details of history jobs?

2010-05-12 Thread Eason.Lee
We can see the details (the counters) of recent jobs from the web UI, but I can't find the details (the counters) of history jobs. Is there any way to see them?

Directory /tmp/hadoop-root/dfs/name is in an inconsistent state: storage directory DOES NOT exist or is NOT accessible

2010-05-12 Thread Michael Robinson
Please help!!! I just downloaded and installed Hadoop-0.20.2 on Ubuntu following the instructions in http://hadoop.apache.org/common/docs/r0.20.2/quickstart.html. I did NOT get any errors during the installation; however, when I try to run the example programs I get the following error in t

Reducing the retry connections from task tracker to job tracker

2010-05-12 Thread Balanagireddy Mudiam
Hi, I am launching a hadoop cluster on amazon ec2. For a period of 30-40 minutes, hdfs is in safemode, recovering the blocks. Until then the job tracker doesn't accept connections from the task trackers. I can see from the logs that the task trackers are unsuccessfully trying to establish a connection with the job tr

Backup of job history

2010-05-12 Thread Balanagireddy Mudiam
Hi, We are launching a hadoop cluster on amazon ec2. After successful/unsuccessful completion of the job, we terminate the cluster and we lose the job statistics data. I want to take a backup of these statistics. Is there a script which backs up these statistics and a utility to visualize these

submit multiple jobs simultaneously

2010-05-12 Thread Gang Luo
Hi, when we call JobClient.runJob(jobConf) to submit a job, will the program block until that job finishes? How can I submit multiple jobs simultaneously? Multithreading? thanks, -Gang

RE: submit multiple jobs simultaneously

2010-05-12 Thread Oded Rotem
Run JobClient.submitJob instead of runJob.
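
A sketch of the difference, assuming two already-built JobConf objects (confA and confB are placeholder names): the static runJob() blocks until the job finishes, while submitJob() on a JobClient instance returns immediately, so one thread can have several jobs in flight:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class ParallelSubmit {
    public static void submitBoth(JobConf confA, JobConf confB)
            throws Exception {
        JobClient client = new JobClient(confA);
        RunningJob jobA = client.submitJob(confA);  // returns at once
        RunningJob jobB = client.submitJob(confB);  // both now running
        jobA.waitForCompletion();  // optionally block before exiting
        jobB.waitForCompletion();
    }
}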

Re: Can a Partitioner access the Reporter?

2010-05-12 Thread Eric Sammer
I don't believe there's any way to get a reference to a Reporter in the Partitioner. Using JobConf to pass a complex object like the Reporter isn't a good idea (and may not even work) because the Reporter can't be serialized, and JobConf / Configuration do not take arbitrary objects in their get() /

question about hadoop

2010-05-12 Thread dechao bu
hello, I want to deploy Hadoop on a cluster. In this cluster, different nodes share the same file system: if I make changes to files on node1, then the other nodes will see the same changes. (The file system of this cluster is probably NFS.) I don't know whether this cluster is fit for de

Re: submit multiple jobs simultaneously

2010-05-12 Thread Gang Luo
Thanks Oded, it works. But there is no output showing the consumed time, counters, etc. How can I get that information? thanks, -Gang

Re: submit multiple jobs simultaneously

2010-05-12 Thread Jeff Zhang
The returned object of JobClient.submitJob(JobConf job) is a RunningJob; you can get the job status through this object.
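
A sketch of that, using old-API (0.20-era) methods on the RunningJob handle returned by submitJob(); it reproduces roughly what runJob() would have printed:

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.RunningJob;

public class JobReport {
    public static void report(RunningJob job) throws Exception {
        while (!job.isComplete()) {  // poll until the job finishes
            Thread.sleep(5000);
        }
        System.out.println("succeeded: " + job.isSuccessful());
        Counters counters = job.getCounters();
        for (Counters.Group group : counters) {   // every counter group
            for (Counters.Counter c : group) {    // every counter in it
                System.out.println(group.getDisplayName() + "."
                    + c.getDisplayName() + " = " + c.getCounter());
            }
        }
    }
}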

Re: question about hadoop

2010-05-12 Thread abhishek sharma
If HDFS uses NFS to store files, then all I/O during the execution of map and reduce tasks will use NFS instead of the local disks on each machine in the cluster (if they have one). This can become a bottleneck if you have lots of tasks running simultaneously. However, even with the NF

Re: Directory /tmp/hadoop-root/dfs/name is in an inconsistent state: storage directory DOES NOT exist or is NOT accessible

2010-05-12 Thread Eric Sammer
Normally this is due to the machine having been rebooted and /tmp being cleared out. You do not want to leave the Hadoop name node or data node storage in /tmp for this reason. Make sure you properly configure dfs.name.dir and dfs.data.dir to point to directories outside of /tmp and other directori
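
A sketch of the fix in hdfs-site.xml; the /var/hadoop paths are placeholders, and after moving dfs.name.dir you would reformat the name node before first use:

<configuration>
  <!-- keep name node metadata out of /tmp so reboots don't wipe it -->
  <property>
    <name>dfs.name.dir</name>
    <value>/var/hadoop/dfs/name</value>
  </property>
  <!-- same for the data node block storage -->
  <property>
    <name>dfs.data.dir</name>
    <value>/var/hadoop/dfs/data</value>
  </property>
</configuration>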

Exception while running the sample

2010-05-12 Thread ankit bhatnagar
I recently set up a hadoop cluster with 10 nodes. When I run the example to test the functionality it throws exceptions - xxx.xxx.xxx.xxx: is the correct IP, which I am able to ping. Error initializing attempt_201005121158_0001_m_01_0: java.lang.IllegalArgumentExcepti

Re: Can a Partitioner access the Reporter?

2010-05-12 Thread Owen O'Malley
On May 11, 2010, at 11:06 PM, gmar wrote: > I'd like to be able to have my customised Partitioner update counters in the Reporter, i.e. so that I know how many keys have been sent to each partition. > So, is it possible for the partitioner to obtain a reference to the reporter? No, even in t
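
One workaround, not from the thread: since each reduce task processes exactly one partition, per-partition key counts can be gathered on the reduce side instead. A sketch against the old mapred API; it reads the task's partition number from "mapred.task.partition" and bumps a counter per key:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class CountingReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {
    private int partition;

    @Override
    public void configure(JobConf conf) {
        // the reduce task's partition number
        partition = conf.getInt("mapred.task.partition", -1);
    }

    public void reduce(Text key, Iterator<Text> values,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // one increment per key that landed in this partition
        reporter.incrCounter("KeysPerPartition",
            "partition-" + partition, 1);
        while (values.hasNext()) {
            output.collect(key, values.next());  // identity pass-through
        }
    }
}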

Re: question about hadoop

2010-05-12 Thread Edson Ramiro
Dechao bu, Pay attention that running Hadoop on NFS you can get problems with locks. And if you're looking to process large files, your network will probably be a bottleneck. -- Edson Ramiro Lucas Filho http://www.inf.ufpr.br/erlf07/

Re: question about hadoop

2010-05-12 Thread abhishek sharma
Edson, > Pay attention that running Hadoop on NFS you can get problems with locks. What locks are you referring to? I ran Hadoop on NFS and never ran into any problems. I had a small cluster with 10 servers all connected to the same switch.

Re: question about hadoop

2010-05-12 Thread Edson Ramiro
Abhishek, I'm running Hadoop here and the cluster admin had mounted the NFS share with the nolock option, so I was getting "No locks available" messages. I mentioned it just so people pay attention to this kind of config ; ) -- Edson Ramiro Lucas Filho http://www.inf.ufpr.br/erlf07/

Setting up a second cluster and getting a weird issue

2010-05-12 Thread Andrew Nguyen
I'm working on bringing up a second test cluster and am getting these intermittent errors on the DataNodes: 2010-05-12 17:17:15,094 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.FileNotFoundException: /srv/hadoop/dfs/1/current/VERSION (No such file or directory) at java

Re: Context needed by mapper

2010-05-12 Thread DNMILNE
OK, that was a dumb question, sorry. If I had worked to the end of the tutorial instead of immediately trying to solve my problem, I would have found out about the DistributedCache. DNMILNE wrote: > Hi, > I am very new to the MapReduce paradigm so this could be a dumb question. > What do you
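
A sketch of that approach with the old API; the lookup-file path is a placeholder. The file is registered at job-setup time and each task then reads its local copy:

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobConf;

public class CacheSetup {
    public static void addLookupFile(JobConf conf) throws IOException {
        // ships an HDFS file to every task node before tasks start
        DistributedCache.addCacheFile(
            URI.create("/shared/lookup.txt"), conf);
    }

    // Inside a mapper's configure(JobConf conf):
    //   Path[] local = DistributedCache.getLocalCacheFiles(conf);
    //   ... open local[0] with java.io.FileReader and load the data ...
}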

Re: Setting up a second cluster and getting a weird issue

2010-05-12 Thread Jeff Zhang
Do these 4 nodes share NFS?