Re: Please Help - Patch install

2006-07-25 Thread Ronny
Hello, thanks for your reply. Now I tried it and it is not working. I just put the lines with + into the source code. The lines are as follows: +public static final boolean CRAWL_IGNORE_EXTERNAL_LINKS = +NutchConf.get().getBoolean("crawl.ignore.external.links", false); and +

Re: Please Help - Patch install

2006-07-25 Thread Stefan Neufeind
You should use the patch-utility to integrate the patch rather than doing it by hand. That line you mention is a kind of meta-data that is interpreted by the patch-utility. It's nothing you need to add to the source files! Good luck, Stefan Ronny wrote: Hello, thanks for your reply. Now I tried it

Re: Please Help - Patch install

2006-07-25 Thread Ronny
Hi Stefan, which utility do I need, and after installing it, how do I apply the patch? Sorry for these questions, but I am a beginner in Java and Nutch... Thanks for your help Ronny - Original Message - From: Stefan Neufeind [EMAIL PROTECTED] To: nutch-user@lucene.apache.org Sent: Tuesday,

Re: Please Help - Patch install

2006-07-25 Thread Stefan Neufeind
You'd use the patch-utility, which is generally available on every Linux installation I know. It's nothing Java-specific. Various development IDEs also offer patch/merge functionality. Regards, Stefan Ronny wrote: Hi Stefan, which utility I need and after installing how
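
To illustrate Stefan's point, here is a minimal sketch of applying a unified diff with the patch utility. It is not the NUTCH-173 patch itself: Example.txt, example.patch and the temporary directory are hypothetical stand-ins. It shows that the leading "+" only marks an added line and never ends up in the patched file.

```shell
#!/bin/sh
# Self-contained demo with hypothetical file names (not the real NUTCH-173 patch).
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for an original source file.
printf 'line one\nline three\n' > Example.txt

# A unified diff: "+" marks an added line, a leading space marks context.
# These markers are read by patch and are not part of the file content.
cat > example.patch <<'EOF'
--- Example.txt
+++ Example.txt
@@ -1,2 +1,3 @@
 line one
+line two
 line three
EOF

# Check that the patch applies cleanly without changing anything yet.
patch -p0 --dry-run < example.patch

# Apply it for real, then show the result.
patch -p0 < example.patch
cat Example.txt
```

For a real Nutch patch, the same patch command would presumably be run from the top of the source tree, followed by a rebuild with ant. The right -p value depends on how the paths inside the patch file were written, so a dry run first is a cheap safeguard.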

Re: Please Help - Patch install

2006-07-25 Thread Ronny
Hi Stefan, now it works. Thanks for your help. Kind regards Ronny - Original Message - From: Stefan Neufeind [EMAIL PROTECTED] To: nutch-user@lucene.apache.org Sent: Tuesday, July 25, 2006 2:44 PM Subject: Re: Please Help - Patch install You'd use the patch-utility, which is

Links

2006-07-25 Thread termopro
Is there any way to find out which web pages on a specific domain have been crawled by Nutch? In other words, is there any way to get the list of URLs that were downloaded and processed by Nutch?

Please Help - Patch not working - external links still crawled

2006-07-25 Thread Ronny
Hi all, after installing the patch http://issues.apache.org/jira/browse/NUTCH-173 and running a whole-web crawl, external links are still crawled. I modified the nutch-site.xml as follows: <property> <name>crawl.ignore.external.links</name> <value>true</value> <description>not crawling external

Two Errors in Nutch 0.8 Tutorial?

2006-07-25 Thread Bryan Woliner
I am certainly far from a nutch expert, but it appears to me that there are two errors in the current Nutch 0.8 tutorial. First off, here is the version of Nutch 0.8 that I am using, in case changes made in newer versions invalidate my comments: -bash-2.05b$ svn info Path: .

RE: Lucene question

2006-07-25 Thread Rajan, Renuka
Thank you Howie. Sounds like we need to normalize the data to be indexed in Lucene as well as the input. -Original Message- From: Howie Wang [mailto:[EMAIL PROTECTED] Sent: Monday, July 24, 2006 6:53 PM To: nutch-user@lucene.apache.org Subject: RE: Lucene question I'm not sure if

Re: Please Help - Patch not working - external links still crawled

2006-07-25 Thread Stefan Neufeind
Ronny wrote: Hi all, after installing the patch http://issues.apache.org/jira/browse/NUTCH-173 and running a whole-web crawl, external links are still crawled. I modified the nutch-site.xml as follows: <property> <name>crawl.ignore.external.links</name> <value>true</value> <description>not

RE: Lucene question

2006-07-25 Thread Rajan, Renuka
Thanks Renaud. I will post my question in the right forum. -Original Message- From: Renaud Richardet [mailto:[EMAIL PROTECTED] Sent: Monday, July 24, 2006 5:18 PM To: nutch-user@lucene.apache.org Subject: Re: Lucene question Hello Rajan, Please have a look at

Re: Please Help - Patch not working - external links still crawled

2006-07-25 Thread Ronny
Hi Stefan, I didn't do that. What do I have to do now? Rebuild with ant? Can you please tell me how to do that? As I said before, I am very new to Nutch and Java. Kind regards and many thanks Ronny - Original Message - From: Stefan Neufeind [EMAIL PROTECTED] To:

Re: Two Errors in Nutch 0.8 Tutorial?

2006-07-25 Thread Matthew Holt
If you download the latest trunk copy of 0.8, bin/nutch will not even be available.. is this supposed to be this way? Matt Bryan Woliner wrote: I am certainly far from a nutch expert, but it appears to me that there are two errors in the current Nutch 0.8 tutorial. First off, here is the

Re: Two Errors in Nutch 0.8 Tutorial?

2006-07-25 Thread Matthew Holt
n/m it's there now.. Matt Matthew Holt wrote: If you download the latest trunk copy of 0.8, bin/nutch will not even be available.. is this supposed to be this way? Matt Bryan Woliner wrote: I am certainly far from a nutch expert, but it appears to me that there are two errors in the current

Problem with logging of Fetcher output in 0.8-dev

2006-07-25 Thread e w
Logging of the Fetcher output in 0.8-dev used to work (writing to the corresponding tasktracker output log) but doesn't appear to any more with the nightly build from a couple of weeks ago and also the one from last night. I've enabled DEBUG for the first 4 logging properties in

Re: Best performance approach for single MP machine?

2006-07-25 Thread Thomas Delnoij
Hi Doug, is it possible you could post your hadoop-site.xml? I would like to accomplish the same. Rgrds. Thomas On 7/21/06, Doug Cook [EMAIL PROTECTED] wrote: Thanks, Håvard (and Doug, in the original email). Those pointers, plus a few other tips from elsewhere, did the trick. I'm now up

Re: Null pointer error when perform search

2006-07-25 Thread Thomas Delnoij
Eric, you should set up the searcher.dir property in nutch-site.xml to point to the crawl directory. See nutch-default.xml for an explanation of this config property. Rgrds, Thomas On 7/22/06, Eric Wu [EMAIL PROTECTED] wrote: Hi, I am new to Nutch and I got a null pointer exception when I try

Re: Why would a record be in the database but not show up in the results?

2006-07-25 Thread Thomas Delnoij
Matt, it's the index that is used for searching, not the webdb. What is the status of these pages in the webdb? Most likely they have not been fetched yet (DB_UNFETCHED), and thus cannot be in your index. These articles give a very nice basic explanation of the different concepts:

Re: Links

2006-07-25 Thread Thomas Delnoij
There's the 'nutch readdb' command: [EMAIL PROTECTED]:~ nutch readdb Usage: CrawlDbReader crawldb (-stats | -dump out_dir | -topN out_dir [min] | -url url) crawldb directory name where crawldb is located -stats print overall statistics to System.out -dump out_dir

Re: Injecting Into Intranet Crawl

2006-07-25 Thread Thomas Delnoij
For stuff like this, it is best to use the whole-web concepts explained in the tutorial. Rgrds, Thomas On 7/25/06, Robert Sanford [EMAIL PROTECTED] wrote: I'm running version 0.7.2 and I'm using the Intranet crawl where I specify a list of site root URIs in a text file along with a list of regex for

Re: Recrawl script for 0.8.0 completed...

2006-07-25 Thread Lourival Júnior
Do you mean that this error occurs only on Windows? I haven't tested on Linux yet. Does anyone have a solution for this problem on Windows/Tomcat? On 7/25/06, Thomas Delnoij [EMAIL PROTECTED] wrote: Lourival. I have typically seen the same issues on a cygwin/windows setup. The only thing that

RE: Injecting Into Intranet Crawl

2006-07-25 Thread Robert Sanford
-Original Message- From: Thomas Delnoij [mailto:[EMAIL PROTECTED] Sent: Tuesday, July 25, 2006 2:53 PM To: nutch-user@lucene.apache.org Subject: Re: Injecting Into Intranet Crawl For stuff like this best use whole web concepts as explained in the tutorial. Rgrds, Thomas The

Nutch 0.8 – Spell Check

2006-07-25 Thread BDalton
I understand that a patch exists, but I cannot figure out how to get it or how to install it. Can someone help me out? Do patches require compiling from source? -- View this message in context: http://www.nabble.com/Nutch-0.8--Spell-Check-tf2000928.html#a5494331 Sent from the Nutch - User forum at