Re: code changes not reflecting when deployed on hadoop

2012-12-27 Thread Sourajit Basak
Maybe on hadoop 1.1, any job submitted via ToolRunner is stored in the distributed cache. Will keep the thread updated. On Thu, Dec 27, 2012 at 8:24 PM, Sourajit Basak wrote: > This is what I did. > > Our nutch directory only contains the following structure. Basically the > script does what I wa

Re: code changes not reflecting when deployed on hadoop

2012-12-27 Thread Sourajit Basak
This is what I did. Our nutch directory only contains the following structure. Basically the script does what I was doing previously. apache-nutch-1.5.1.job +bin nutch Even in this case, I deleted the entire fetcher package. The fetch command still worked!!! Is anyone in a position to repeat this

Re: code changes not reflecting when deployed on hadoop

2012-12-27 Thread Sourajit Basak
Are you saying that I should put the hadoop binary on the path and use the nutch script as in local mode? On Thu, Dec 27, 2012 at 7:35 PM, Sourajit Basak wrote: > Didn't understand. > Let's say I put the job file in HADOOP_HOME/bin. What commands do I fire? > > > > On Thu, Dec 27, 2012 at 7:27 PM, Markus Jelsm

Re: code changes not reflecting when deployed on hadoop

2012-12-27 Thread Sourajit Basak
Didn't understand. Let's say I put the job file in HADOOP_HOME/bin. What commands do I fire? On Thu, Dec 27, 2012 at 7:27 PM, Markus Jelsma wrote: > CWD

RE: code changes not reflecting when deployed on hadoop

2012-12-27 Thread Markus Jelsma
It works the same as in local mode; just have the job file in the CWD. -Original message- > From: Sourajit Basak > Sent: Thu 27-Dec-2012 14:51 > To: user@nutch.apache.org > Subject: Re: code changes not reflecting when deployed on hadoop > > We are using hadoop 1.1 > > On Thu, Dec 27,
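The advice above (have the job file where the script can find it) can be illustrated with a minimal sketch of how a launcher script might choose between local and deployed mode. The function name `detect_mode` and the scratch directory are hypothetical and only for illustration; this is not a copy of the actual nutch script, whose exact lookup rules should be checked in your own installation.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of Nutch): prints
# "deploy <jobfile>" if a *.job file exists in the given directory,
# otherwise prints "local".
detect_mode() {
    dir=$1
    for f in "$dir"/*.job; do
        # If the glob matched nothing, $f is the literal pattern and -f fails.
        if [ -f "$f" ]; then
            echo "deploy $f"
            return 0
        fi
    done
    echo "local"
}

# Try it in a scratch directory, first without and then with a job file.
scratch=$(mktemp -d)
mode_without_job=$(detect_mode "$scratch")
: > "$scratch/apache-nutch-1.5.1.job"
mode_with_job=$(detect_mode "$scratch")

echo "$mode_without_job"
echo "$mode_with_job"
rm -rf "$scratch"
```

In deployed mode a script of this kind would then hand the detected job file to `hadoop jar`, which is why the hadoop binary has to be on the path.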

Re: code changes not reflecting when deployed on hadoop

2012-12-27 Thread Sourajit Basak
We are using hadoop 1.1 On Thu, Dec 27, 2012 at 7:13 PM, Sourajit Basak wrote: > How do you use the nutch script on a cluster ? > > > On Thu, Dec 27, 2012 at 6:25 PM, Markus Jelsma > wrote: > >> Can you try using the nutch script to run your fetcher? > > >

Re: code changes not reflecting when deployed on hadoop

2012-12-27 Thread Sourajit Basak
How do you use the nutch script on a cluster? On Thu, Dec 27, 2012 at 6:25 PM, Markus Jelsma wrote: > Can you try using the nutch script to run your fetcher?

RE: code changes not reflecting when deployed on hadoop

2012-12-27 Thread Markus Jelsma
It seems the job file is not deployed to all task trackers, and I'm not sure why. Can you try using the nutch script to run your fetcher? -Original message- > From: Sourajit Basak > Sent: Thu 27-Dec-2012 13:29 > To: user@nutch.apache.org > Subject: code changes not reflecting when deployed

code changes not reflecting when deployed on hadoop

2012-12-27 Thread Sourajit Basak
We have made some changes to Fetcher (v1.5). However, when we build a .job (jar) and deploy it on hadoop, it doesn't seem to pick up any of the changes. This is how we are running it. >> ./hadoop jar ../nutch/apache-nutch-1.5.1.job org.apache.nutch.fetcher.Fetcher -threads 4 However, if we modify any of

Re: Site being crawled even when the URL is removed from seed.txt

2012-12-27 Thread Rajani Maski
Hi Tejas, Right, this is because of the backup files. Thank you very much for the support. On Thu, Dec 27, 2012 at 3:27 PM, Tejas Patil wrote: > This might be the reason: You are using GEdit to edit the seeds file. It > creates a backup of the old version of the file when changes are made to > it.

Re: Site being crawled even when the URL is removed from seed.txt

2012-12-27 Thread Tejas Patil
This might be the reason: You are using GEdit to edit the seeds file. It creates a backup of the old version of the file when changes are made to it. The backup file is hidden. Check the contents of the urls directory using this command: *ls -a urls* (to be executed from NUTCH_HOME. In your setup
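A minimal sketch of the check described above, run in a scratch directory rather than a real NUTCH_HOME. It assumes the editor names its backups with a trailing `~` (for example `seed.txt~`); the example URLs and the scratch path are made up for illustration.

```shell
#!/bin/sh
# Reproduce the situation in a scratch directory (assumed layout).
workdir=$(mktemp -d)
mkdir "$workdir/urls"

# Current seed list, with one URL already removed by the user.
printf 'http://example.org/\n' > "$workdir/urls/seed.txt"
# Editor backup that still holds the removed URL (assumed '~' suffix).
printf 'http://example.org/\nhttp://removed.example.com/\n' > "$workdir/urls/seed.txt~"

# List everything in the urls directory, including dot files.
ls -a "$workdir/urls"

# Find backup files, report them, then delete them.
backups=$(find "$workdir/urls" -type f -name '*~')
echo "backup files found: $backups"
find "$workdir/urls" -type f -name '*~' -exec rm -f {} +

# Only the real seed file should be left.
remaining=$(ls -A "$workdir/urls")
echo "remaining: $remaining"
rm -rf "$workdir"
```

Anything left in the urls directory besides the intended seed file is worth inspecting before injecting again.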