Why is the getTracker() method in the JobTracker class no longer in the 0.15.1 release?

2008-01-02 Thread Taeho Kang
Dear Hadoop Users and Developers, It looks like the getTracker() method in the JobTracker class (used to get hold of a running JobTracker instance) no longer exists in the 0.15.1 release. The reason I want an instance of JobTracker is to get some information about current and old job status. Is there any o
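For reference, one workaround is to query the running JobTracker over RPC through JobClient rather than holding a JobTracker instance. A minimal sketch against the old org.apache.hadoop.mapred API; method names should be verified against the 0.15.1 javadocs:

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobStatus;

    public class ListJobs {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();            // picks up hadoop-site.xml
        JobClient client = new JobClient(conf);  // RPC proxy to the JobTracker
        // Jobs currently running or queued; per-job progress comes from the
        // JobStatus objects the JobTracker hands back.
        for (JobStatus status : client.jobsToComplete()) {
          System.out.println(status.getJobId()
              + " map=" + status.mapProgress()
              + " reduce=" + status.reduceProgress());
        }
      }
    }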

Re: Datanode Problem

2008-01-02 Thread Ted Dunning
/etc/hosts may be buggered as well. What is the entry for localhost? On 1/2/08 3:48 PM, "Billy Pearson" <[EMAIL PROTECTED]> wrote: > > >> localhost: ssh: localhost: Name or service not known > > that error looks like ssh is not running > > make sure it's running and working > try to ssh to
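The "Name or service not known" part is a resolver failure rather than an ssh failure, so it can be reproduced outside ssh. A throwaway check assuming only a stock JDK; the /etc/hosts line in the comment is the usual default, not taken from the poster's machine:

    import java.net.InetAddress;
    import java.net.UnknownHostException;

    public class CheckLocalhost {
      public static void main(String[] args) {
        try {
          InetAddress addr = InetAddress.getByName("localhost");
          System.out.println("localhost resolves to " + addr.getHostAddress());
        } catch (UnknownHostException e) {
          // Same failure ssh hits; a typical /etc/hosts fix is the line:
          //   127.0.0.1   localhost
          System.out.println("localhost does not resolve; check /etc/hosts");
        }
      }
    }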

Re: mapred.tasktracker.map.tasks.maximum

2008-01-02 Thread Billy
I think the best option would be to be able to set the max per node in its config file. I think someone is working, or has worked, on this; I saw something in Jira for the new option. I would think a job override would work something like this: 1) Check the node config; if the job override is lower than the node's, then use the j
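That override rule might reduce to taking the smaller of the two values. A sketch of the logic only; the names are illustrative and none of this is Hadoop API:

    public class TaskCap {
      // Honor a per-job max only when it asks for LESS than the node allows;
      // the node's own configured cap is always the upper bound.
      static int effectiveMaxMaps(int nodeConfigMax, Integer jobOverride) {
        if (jobOverride != null && jobOverride < nodeConfigMax) {
          return jobOverride;
        }
        return nodeConfigMax;
      }

      public static void main(String[] args) {
        System.out.println(effectiveMaxMaps(4, 1));    // 1: override honored
        System.out.println(effectiveMaxMaps(4, 8));    // 4: override too high
        System.out.println(effectiveMaxMaps(4, null)); // 4: no override set
      }
    }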

Re: mapred.tasktracker.map.tasks.maximum

2008-01-02 Thread Arun C Murthy
On Thu, Jan 03, 2008 at 10:12:04AM +0530, Arun C Murthy wrote: >On Wed, Jan 02, 2008 at 12:08:53PM -0800, Jason Venner wrote: >>In our case, we have specific jobs that due to resource constraints can >>only be run serially (ie: 1 instance per machine). > >I see, at this point there isn't anything

Re: mapred.tasktracker.map.tasks.maximum

2008-01-02 Thread Arun C Murthy
On Wed, Jan 02, 2008 at 12:08:53PM -0800, Jason Venner wrote: >In our case, we have specific jobs that due to resource constraints can >only be run serially (ie: 1 instance per machine). I see, at this point there isn't anything in Hadoop which can help you out here... Having said that, could y

Re: Datanode Problem

2008-01-02 Thread Billy Pearson
localhost: ssh: localhost: Name or service not known. That error looks like ssh is not running. Make sure it's running and working; try to ssh to localhost from the server (ssh localhost) and see if it works. Billy - Original Message - From: "Natarajan, Senthil" <[EMAIL PROTECTED]>

Re: Nutch crawl problem

2008-01-02 Thread jibjoice
i crawl "http://lucene.apache.org"; and in conf/crawl-urlfilter.txt i set that "+^http://([a-z0-9]*\.)*apache.org/" when i use command "bin/nutch crawl urls -dir crawled -depth 3" have error that - crawl started in: crawled - rootUrlDir = urls - threads = 10 - depth = 3 - Injector: starting

Re: mapred.tasktracker.map.tasks.maximum

2008-01-02 Thread Billy
Some of the tasks I have will overrun the servers if I run, say, 2 of them per node, but there are other tasks where I can run 4 on a server, so I was looking to configure it on the command line to better spread the work the way we want to. Billy "Arun C Murthy" <[EMAIL PROTECTED]> wrote in message news

Re: Datanode Problem

2008-01-02 Thread charles du
If you run the hadoop process under the account 'hadoop' and have set the hadoop data directory to a particular directory, you need to make sure that the hadoop account can write to that directory. On Jan 2, 2008 2:06 PM, Natarajan, Senthil <[EMAIL PROTECTED]> wrote: > I just uncommented and changed the JAVA_H

RE: Datanode Problem

2008-01-02 Thread Natarajan, Senthil
I just uncommented and changed JAVA_HOME; that's all I did in hadoop-env.sh. Do I need to configure anything else? Here is the hadoop-env.sh: # Set Hadoop-specific environment variables here. # The only required environment variable is JAVA_HOME. All others are # optional. When running a d

Re: Datanode Problem

2008-01-02 Thread Ted Dunning
Well, you have something very strange going on in your scripts. Have you looked at hadoop-env.sh? On 1/2/08 1:58 PM, "Natarajan, Senthil" <[EMAIL PROTECTED]> wrote: >> /bin/bash: /root/.bashrc: Permission denied >> localhost: ssh: localhost: Name or service not known >> /bin/bash: /root/.bashr

RE: Datanode Problem

2008-01-02 Thread Natarajan, Senthil
No, I am running the processes as user "hadoop"; I created a separate user for running the hadoop daemons. -Original Message- From: Ted Dunning [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 02, 2008 4:55 PM To: hadoop-user@lucene.apache.org Subject: Re: Datanode Problem I don't know wh

Re: Datanode Problem

2008-01-02 Thread Ted Dunning
I don't know what your problem is, but I note that you appear to be running processes as root. This is a REALLY bad idea. It may also be related to your problem. On 1/2/08 1:33 PM, "Natarajan, Senthil" <[EMAIL PROTECTED]> wrote: > Hi, > I am new to Hadoop. I just downloaded release 0.14.4 (ha

RE: Is there an rsyncd for HDFS

2008-01-02 Thread Greg Connor
> From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED] > > hdfs doesn't allow random overwrites or appends. so even if > hdfs were mountable - i am guessing we couldn't just do a > rsync to a dfs mount (never looked at rsync code - but > assuming it does appends/random-writes). any emulation of > rsyn

Datanode Problem

2008-01-02 Thread Natarajan, Senthil
Hi, I am new to Hadoop. I just downloaded release 0.14.4 (hadoop-0.14.4.tar.gz) and am trying to set up Hadoop on a single machine (RedHat Linux 9) by following the link http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29 Looks like the datanode is not starting; seems,

Re: mapred.tasktracker.map.tasks.maximum

2008-01-02 Thread Jason Venner
In our case, we have specific jobs that, due to resource constraints, can only be run serially (i.e., 1 instance per machine). Most of our jobs are more normal and can be run in parallel on the machines. Arun C Murthy wrote: Billy, On Wed, Jan 02, 2008 at 01:38:06PM -0600, Billy wrote: If I a

Re: mapred.tasktracker.map.tasks.maximum

2008-01-02 Thread Jason Venner
I believe you get this ability around 0.16.0. As of 0.15.1 this is a per-cluster value set at start time. Billy wrote: If I add this to a command line as a -jobconf should it be enforced? Say I have a job that I want to run only 1 map at a time per server I have tried this and look in the job.x

Re: mapred.tasktracker.map.tasks.maximum

2008-01-02 Thread Arun C Murthy
Billy, On Wed, Jan 02, 2008 at 01:38:06PM -0600, Billy wrote: >If I add this to a command line as a -jobconf should it be enforced? > This is a property of the TaskTracker and hence cannot be set on a per-job basis... >Say I have a job that I want to run only 1 map at a time per server > Coul
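A short sketch of why the value shows up in job.xml yet changes nothing: the JobConf happily stores it, but only each TaskTracker's own config, read once at daemon startup, is ever consulted. JobConf calls are per the old mapred API; verify against your release:

    import org.apache.hadoop.mapred.JobConf;

    public class PerJobCap {
      public static void main(String[] args) {
        JobConf job = new JobConf();
        // Lands in job.xml as expected...
        job.setInt("mapred.tasktracker.map.tasks.maximum", 1);
        // ...but TaskTrackers never read it from there; they use whatever
        // their local hadoop-site.xml said when the daemon was started.
        System.out.println(
            job.getInt("mapred.tasktracker.map.tasks.maximum", 2)); // prints 1
      }
    }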

RE: Is there an rsyncd for HDFS

2008-01-02 Thread Joydeep Sen Sarma
HDFS doesn't allow random overwrites or appends. So even if HDFS were mountable, I am guessing we couldn't just do an rsync to a DFS mount (never looked at the rsync code, but assuming it does appends/random writes). Any emulation of rsync would end up having to delete and recreate changed files in
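An emulation along those lines might look like the following sketch against the FileSystem API. Method names are per roughly 0.16-era Hadoop, and older releases spell some of them differently, so treat this as pseudocode rather than a drop-in tool:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DfsSync {
      // One-way sync of a single file into HDFS: since we cannot patch a
      // changed file in place, delete the old copy and re-upload it whole.
      public static void syncFile(FileSystem localFs, FileSystem dfs,
                                  Path src, Path dst) throws Exception {
        FileStatus srcStat = localFs.getFileStatus(src);
        if (dfs.exists(dst)) {
          FileStatus dstStat = dfs.getFileStatus(dst);
          if (dstStat.getLen() == srcStat.getLen()
              && dstStat.getModificationTime() >= srcStat.getModificationTime()) {
            return;          // same length, not older: assume unchanged
          }
          dfs.delete(dst);   // changed: delete...
        }
        dfs.copyFromLocalFile(src, dst); // ...and recreate the whole file
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        syncFile(FileSystem.getLocal(conf), FileSystem.get(conf),
                 new Path(args[0]), new Path(args[1]));
      }
    }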

mapred.tasktracker.map.tasks.maximum

2008-01-02 Thread Billy
If I add this to the command line as a -jobconf, should it be enforced? Say I have a job that I want to run only 1 map at a time per server. I have tried this and looked in the job.xml file, and it's set correctly but not enforced. Billy

Re: Is there an rsyncd for HDFS

2008-01-02 Thread Ted Dunning
That is a good idea. I currently use a shell script that does the rough equivalent of rsync -av, but it wouldn't be bad to have a one-liner that solves the same problem. One (slight) benefit to the scripted approach is that I get a list of directories to which files have been moved. That lets m

RE: HBase implementation question

2008-01-02 Thread Jim Kellerman
> -Original Message- > From: Stefan Groschupf [mailto:[EMAIL PROTECTED] > Sent: Wednesday, January 02, 2008 3:46 AM > To: hadoop-user@lucene.apache.org > Subject: Re: HBase implementation question > > Hi, > > Reads are probably a bit more complicated than writes. A read > > operation first

Is there an rsyncd for HDFS

2008-01-02 Thread Greg Connor
Hello, Does anyone know of a modified "rsync" that gets/puts files to/from the dfs instead of the normal, mounted filesystems? I'm guessing since the dfs can't be mounted like a "normal" filesystem that rsync would need to be modified in order to access it, as with any other program. We use r

Re: Not able to start Data Node

2008-01-02 Thread Dhaya007
Arun C Murthy wrote: > What version of Hadoop are you running? Dhaya007: hadoop-0.15.1 > http://wiki.apache.org/lucene-hadoop/Help > Dhaya007 wrote: >> ..datanode-slave.log >> 2007-12-19 19:30:55,579 WARN org.apache.hadoop.dfs.DataNode: Invalid directory in dfs.data.dir: directory

Re: Not able to start Data Node

2008-01-02 Thread Arun C Murthy
What version of Hadoop are you running? http://wiki.apache.org/lucene-hadoop/Help Dhaya007 wrote: > ..datanode-slave.log 2007-12-19 19:30:55,579 WARN org.apache.hadoop.dfs.DataNode: Invalid directory in dfs.data.dir: directory is not writable: /tmp/hadoop-hdpusr/dfs/data 2007-12-19 19:30:55,579
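The warning quoted above is the DataNode's startup validation of each dfs.data.dir entry. A plain-JDK sketch of the same writability test; the chown hint in the comment is inferred from the path in the log, not stated in the thread:

    import java.io.File;

    public class CheckDataDir {
      public static void main(String[] args) {
        File dir = new File("/tmp/hadoop-hdpusr/dfs/data"); // path from the log
        System.out.println("exists=" + dir.exists()
            + " isDir=" + dir.isDirectory()
            + " writable=" + dir.canWrite());
        // If writable=false for the daemon's user, something like
        //   chown -R hdpusr /tmp/hadoop-hdpusr
        // (run as root) is the usual fix.
      }
    }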

Re: HBase implementation question

2008-01-02 Thread Stefan Groschupf
Hi, Reads are probably a bit more complicated than writes. A read operation first checks the cache and may satisfy the request directly from the cache. If not, the operation checks the newest MapFile for the data, then the next-to-newest, ..., to the oldest, stopping when the requested data has be
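That read path, reduced to a sketch; the types here are stand-ins (plain Maps), not HBase's actual cache and MapFile classes:

    import java.util.List;
    import java.util.Map;

    public class ReadPath {
      // 1. try the in-memory cache; 2. otherwise scan on-disk MapFiles from
      // newest to oldest, returning on the first hit; 3. null if absent.
      static byte[] read(String key, Map<String, byte[]> memcache,
                         List<Map<String, byte[]>> mapFilesNewestFirst) {
        byte[] hit = memcache.get(key);
        if (hit != null) return hit;
        for (Map<String, byte[]> mapFile : mapFilesNewestFirst) {
          hit = mapFile.get(key);
          if (hit != null) return hit;
        }
        return null;
      }
    }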

Re: Not able to start Data Node

2008-01-02 Thread Dhaya007
Thanks for your reply. I am using passwordless ssh from master to slave, and the following are the logs (slave): ..datanode-slave.log 2007-12-19 19:30:55,237 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG: / STARTUP_MSG: Starting DataNode STARTUP
