Subject: Re: Nutch not indexing full collection
Has this been solved?
If http.content.limit has not been increased in nutch-site.xml, Nutch will
truncate the fetched content, so you will not be able to store this data in
full and index it with Solr.
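
A quick way to confirm which limit is actually in effect is to read the property straight out of the config file. The following is only a minimal sketch: the helper names (read_properties, content_limit_status) and the SAMPLE config are made up for illustration, and the fallback of 65536 bytes is an assumption about the stock nutch-default.xml, so check your own copy. It also assumes that a negative value disables truncation.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return {name: value} from Hadoop-style <configuration> XML text."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def content_limit_status(props, default=65536):
    """Describe the effective http.content.limit.

    `default` is an assumed fallback (the value believed to ship in
    nutch-default.xml); a negative limit is treated as "no truncation".
    """
    raw = props.get("http.content.limit")
    limit = int(raw) if raw else default
    if limit < 0:
        return "unlimited"
    return f"truncates at {limit} bytes"


# Hypothetical nutch-site.xml content, for illustration only.
SAMPLE = """<configuration>
  <property>
    <name>http.content.limit</name>
    <value>-1</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    print(content_limit_status(read_properties(SAMPLE)))
```

To check a real install, pass in the text of the nutch-site.xml that the running Nutch actually reads (for a local runtime, the copy under runtime/local/conf) instead of SAMPLE.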
On Mon, Jul 25, 2011 at 6:18 PM, Chip Calhoun ccalh...@aip.org wrote:
I'm still
To: user@nutch.apache.org
Cc: Chip Calhoun
Subject: Re: Nutch not indexing full collection
Nutch truncates content longer than configured and Solr truncates content
exceeding max field length. Maybe check your limits.
I'm still having trouble with this. In addition to the nutch-site.xml
posted
-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Monday, August 01, 2011 3:45 PM
To: user@nutch.apache.org
Cc: Chip Calhoun
Subject: Re: Nutch not indexing full collection
Nutch truncates content longer than configured and Solr truncates content
exceeding max field length. Maybe check your limits.
And I'm still indexing with this command:
bin/nutch crawl urls -dir crawl -depth 15 -topN 50
-Original Message-
From: lewis john mcgibbney [mailto:lewis.mcgibb...@gmail.com]
Sent: Wednesday, July 27, 2011 12:18 PM
To: user@nutch.apache.org
Subject: Re: Nutch not indexing full collection
Hi Chip,
I would try running your scripts after setting the environment variable
$NUTCH_HOME to nutch/runtime/local/NUTCH_HOME
On Wed, Jul 20, 2011 at 4:01 PM, Chip Calhoun ccalh...@aip.org wrote:
I've been working with $NUTCH_HOME/runtime/local/conf/nutch
, July 20, 2011 5:23 PM
To: user@nutch.apache.org
Subject: Re: Nutch not indexing full collection
Hi Chip,
I would try running your scripts after setting the environment variable
$NUTCH_HOME to nutch/runtime/local/NUTCH_HOME
On Wed, Jul 20, 2011 at 4:01 PM, Chip Calhoun ccalh...@aip.org wrote:
Hi,
I'm using Nutch 1.3 to crawl a section of our website, and it doesn't seem to
crawl the entire thing. I'm probably missing something simple, so I hope
somebody can help me.
My urls/nutch file contains a single URL:
http://www.aip.org/history/ohilist/transcripts.html , which is an
I'd have suspected db.max.outlinks.per.page, but you seem to have set it up
correctly. Are you running Nutch in runtime/local? If so, you modified
nutch-site.xml in runtime/local/conf, right?
nutch readdb -stats will give you the total number of pages known, etc.
Julien
On 20 July 2011
Subject: Re: Nutch not indexing full collection
I'd have suspected db.max.outlinks.per.page, but you seem to have set it up
correctly. Are you running Nutch in runtime/local? If so, you modified
nutch-site.xml in runtime/local/conf, right?
nutch readdb -stats will give you the total number of pages known, etc.
, but I can no longer find any reference to
it.
Chip
-Original Message-
From: Julien Nioche [mailto:lists.digitalpeb...@gmail.com]
Sent: Wednesday, July 20, 2011 10:06 AM
To: user@nutch.apache.org
Subject: Re: Nutch not indexing full collection
I'd have suspected