Hello,
thanks for your reply. Now I tried it and it is not working.
I just put the lines with + into the source code. The lines are as follows:
+public static final boolean CRAWL_IGNORE_EXTERNAL_LINKS =
+    NutchConf.get().getBoolean("crawl.ignore.external.links", false);
and
+
You should use the patch utility to integrate the patch, not apply it
by hand.
The line you mention is metadata that is interpreted by the
patch utility. It's nothing you need to add to the source files!
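To make the mechanics concrete, here is a self-contained demonstration of how the patch utility applies a unified diff. The file names are placeholders; for NUTCH-173 you would run `patch -p0 < NUTCH-173.patch` (or whatever name you saved the attachment under) from the top of the Nutch source tree.

```shell
# Self-contained demo of diff/patch round-tripping.
printf 'old line\n' > file.txt
printf 'new line\n' > file.new
# diff exits with status 1 when the files differ, hence the "|| true".
diff -u file.txt file.new > change.patch || true
# patch reads the diff's metadata lines (---, +++, @@) and applies
# only the +/- hunks to the target file.
patch file.txt < change.patch
cat file.txt   # now contains "new line"
```

The `---`/`+++`/`@@` lines are exactly the "metadata" kind of lines that belong in the patch file, not in your source code.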
Good luck,
Stefan
Ronny wrote:
Hello,
thanks for your reply. Now I tried it
Hi Stefan,
which utility I need and after installing how do I install the patch?
Sorry for this questions but I am a beginner in Java and nutch...
Thanks for your help
Ronny
- Original Message -
From: Stefan Neufeind [EMAIL PROTECTED]
To: nutch-user@lucene.apache.org
Sent: Tuesday,
You'd use the patch utility, which is generally available on every
Linux installation I know of. It's nothing Java-specific. Various
development IDEs offer patch/merge functionality as well.
Regards,
Stefan
Ronny wrote:
Hi Stefan,
which utility I need and after installing how
Hi Stefan,
now it works. Thanks for your help.
Kind regards
Ronny
- Original Message -
From: Stefan Neufeind [EMAIL PROTECTED]
To: nutch-user@lucene.apache.org
Sent: Tuesday, July 25, 2006 2:44 PM
Subject: Re: Please Help - Patch install
You'd use the patch-utility, which is
Is there any way to find out what web pages on a specific domain have
been crawled by Nutch ?
In other words is there any way to get the list of urls that were
downloaded and processed by Nutch ?
Hi all,
after installing the patch http://issues.apache.org/jira/browse/NUTCH-173, external
links are still being crawled during a whole-web crawl.
I modified the nutch-site.xml as follows:
<property>
  <name>crawl.ignore.external.links</name>
  <value>true</value>
  <description>not crawling external links</description>
</property>
I am certainly far from a nutch expert, but it appears to me that there are
two errors in the current Nutch 0.8 tutorial.
First off, here is the version of Nutch 0.8 that I am using, in case
changes have been made in newer versions that invalidate my comments:
-bash-2.05b$ svn info
Path: .
Thank you Howie. Sounds like we need to normalize the data to be indexed
in Lucene as well as the input.
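The point about normalizing both the indexed data and the input can be sketched in plain Java (this is an illustration only, not the Lucene API; the trim-and-lowercase rule here is a stand-in for whatever analyzer/normalization you actually use):

```java
import java.util.Locale;

// Illustrative sketch: run the exact same normalization over documents
// at index time and over user queries at search time, so the two sides
// always agree on the stored form of a term.
public class Normalize {

    // Hypothetical normalization rule: trim whitespace and lowercase.
    static String normalize(String text) {
        return text.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String indexedTerm = normalize("  Apache Nutch  "); // index side
        String queryTerm   = normalize("apache nutch");     // query side
        // Because both sides share one rule, they match.
        System.out.println(indexedTerm.equals(queryTerm));  // prints "true"
    }
}
```

If only one side is normalized, exact-match lookups silently fail, which is the usual symptom that prompts this kind of fix.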
-Original Message-
From: Howie Wang [mailto:[EMAIL PROTECTED]
Sent: Monday, July 24, 2006 6:53 PM
To: nutch-user@lucene.apache.org
Subject: RE: Lucene question
I'm not sure if
Ronny wrote:
Hi all,
after installing the patch http://issues.apache.org/jira/browse/NUTCH-173,
external links are still being crawled during a whole-web crawl.
I modified the nutch-site.xml as follows:
<property>
  <name>crawl.ignore.external.links</name>
  <value>true</value>
  <description>not crawling external links</description>
</property>
Thanks Renaud. I will post my question in the right forum.
-Original Message-
From: Renaud Richardet [mailto:[EMAIL PROTECTED]
Sent: Monday, July 24, 2006 5:18 PM
To: nutch-user@lucene.apache.org
Subject: Re: Lucene question
Hello Rajan,
Please have a look at
Hi Stefan,
I didn't do that. What do I have to do now? Rebuild with ant? Can you
please tell me how to do that?
As I said before, I am very new to Nutch and Java.
Kind regards and many thanks
Ronny
- Original Message -
From: Stefan Neufeind [EMAIL PROTECTED]
To:
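For reference, rebuilding after applying a patch is usually just a matter of running Ant from the top of the source tree. The target names below are what the stock Nutch build file uses; check build.xml in your checkout if they differ:

```
ant          # default target: compile the core classes and build the jar
ant war      # rebuild the web application archive for deployment to Tomcat
```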
If you download the latest trunk copy of 0.8, bin/nutch will not even be
available. Is this supposed to be this way?
Matt
Bryan Woliner wrote:
I am certainly far from a nutch expert, but it appears to me that
there are
two errors in the current Nutch 0.8 tutorial.
First off, here is the
n/m, it's there now.
Matt
Matthew Holt wrote:
If you download the latest trunk copy of 0.8, bin/nutch will not even
be available. Is this supposed to be this way?
Matt
Bryan Woliner wrote:
I am certainly far from a nutch expert, but it appears to me that
there are
two errors in the current
Logging of the Fetcher output in 0.8-dev used to work (writing to the
corresponding tasktracker output log), but it no longer seems to with
the nightly build from a couple of weeks ago, or with the one from last
night.
I've enabled DEBUG for the first 4 logging properties in
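For context, fetcher debug logging in 0.8 is normally switched on in conf/log4j.properties with lines along these lines (the exact logger names are my assumption; match them against the class names in your build):

```
log4j.logger.org.apache.nutch.fetcher.Fetcher=DEBUG
log4j.logger.org.apache.nutch=INFO
```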
Hi Doug,
is it possible you could post your hadoop-site.xml? I would like to
accomplish the same.
Rgrds. Thomas
On 7/21/06, Doug Cook [EMAIL PROTECTED] wrote:
Thanks, Håvard (and Doug, in the original email).
Those pointers, plus a few other tips from elsewhere, did the trick. I'm now
up
Eric,
you should set the searcher.dir property in nutch-site.xml to point
to the crawl directory. See nutch-default.xml for an explanation of
this config property.
Rgrds, Thomas
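A minimal nutch-site.xml entry for this would look something like the following (the path is a placeholder for your own crawl directory):

```xml
<property>
  <name>searcher.dir</name>
  <value>/path/to/your/crawl</value>
  <description>Crawl directory the search webapp reads its index from.</description>
</property>
```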
On 7/22/06, Eric Wu [EMAIL PROTECTED] wrote:
Hi,
I am new to Nutch and I got a null pointer exception when I try
Matt,
it's the index that is used for searching, not the webdb.
What is the status of these pages in webdb? Likely they are not
fetched yet (DB_UNFETCHED), and thus can never be in your index.
These articles give a very nice basic explanation of different concepts:
There's the 'nutch readdb' command:
[EMAIL PROTECTED]:~$ nutch readdb
Usage: CrawlDbReader <crawldb> (-stats | -dump <out_dir> | -topN <nnnn>
<out_dir> [<min>] | -url <url>)
	<crawldb>	directory name where crawldb is located
	-stats	print overall statistics to System.out
	-dump <out_dir>
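Concretely, answering the "which URLs were crawled" question above might look like this (the crawl/crawldb path is an assumption; substitute the location of your own crawldb):

```
bin/nutch readdb crawl/crawldb -stats          # overall fetched/unfetched counts
bin/nutch readdb crawl/crawldb -dump urldump   # write the full URL list to urldump/
```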
For stuff like this, it's best to use the whole-web concepts as explained in the tutorial.
Rgrds, Thomas
On 7/25/06, Robert Sanford [EMAIL PROTECTED] wrote:
I'm running version 0.7.2 and I'm using the Intranet crawl where I
specify a list of site root URIs in a text file along with a list of
regex for
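For reference, the regex filter file for a 0.7-style intranet crawl (conf/crawl-urlfilter.txt) typically looks like this; the domain is a placeholder:

```
# accept anything under the listed site
+^http://([a-z0-9]*\.)*example.com/
# skip everything else
-.
```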
Are you saying this error occurs only on Windows? I haven't tested on
Linux yet. Does anyone have a solution for this problem on Windows/Tomcat?
On 7/25/06, Thomas Delnoij [EMAIL PROTECTED] wrote:
Lourival.
I have typically seen the same issues on a cygwin/windows setup. The
only thing that
-Original Message-
From: Thomas Delnoij [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 25, 2006 2:53 PM
To: nutch-user@lucene.apache.org
Subject: Re: Injecting Into Intranet Crawl
For stuff like this, it's best to use the whole-web concepts as explained
in the tutorial.
Rgrds, Thomas
The
I understand that a patch exists, but I cannot figure out how to get it or
how to install it. Can someone help me out … do patches require compiling from
source?
--
View this message in context:
http://www.nabble.com/Nutch-0.8--Spell-Check-tf2000928.html#a5494331
Sent from the Nutch - User forum at Nabble.com.