Re: Nutch not indexing full collection

2011-08-01 Thread Markus Jelsma
Subject: Re: Nutch not indexing full collection. Has this been solved? If http.content.limit has not been increased in nutch-site.xml, then Nutch will not be able to store this data and index it with Solr. On Mon, Jul 25, 2011 at 6:18 PM, Chip Calhoun ccalh...@aip.org wrote: I'm still
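
For readers following along, the limit mentioned above is set as a property in nutch-site.xml. The fragment below is only a sketch: the value -1 is an assumption chosen for illustration (in Nutch 1.x a negative value disables truncation, and the shipped default is 65536 bytes), not a setting taken from this thread.

```xml
<!-- Illustrative fragment for runtime/local/conf/nutch-site.xml -->
<configuration>
  <property>
    <name>http.content.limit</name>
    <!-- Assumed value: negative disables truncation of fetched content.
         The Nutch 1.x default is 65536 bytes. -->
    <value>-1</value>
  </property>
</configuration>
```

A large positive byte count may be preferable to -1 when the crawl could reach very large files, since it still caps fetched content.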

RE: Nutch not indexing full collection

2011-08-01 Thread Chip Calhoun
To: user@nutch.apache.org Cc: Chip Calhoun Subject: Re: Nutch not indexing full collection Nutch truncates content longer than configured and Solr truncates content exceeding max field length. Maybe check your limits. I'm still having trouble with this. In addition to the nutch-site.xml posted
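
On the Solr side, the field-length limit referred to above is typically set in solrconfig.xml. This is a hedged sketch that assumes a Solr 1.4/3.x-style configuration, where maxFieldLength counts tokens per field. The value shown is an illustrative assumption, and the element's placement may differ in other Solr versions.

```xml
<!-- Illustrative fragment for solrconfig.xml (assumes Solr 1.4/3.x layout) -->
<mainIndex>
  <!-- Assumed value: effectively unlimited tokens per field.
       The example configuration shipped with these versions uses 10000. -->
  <maxFieldLength>2147483647</maxFieldLength>
</mainIndex>
```

Documents indexed before the change keep their truncated fields until they are re-indexed.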

Re: Nutch not indexing full collection

2011-08-01 Thread Markus Jelsma
Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Monday, August 01, 2011 3:45 PM To: user@nutch.apache.org Cc: Chip Calhoun Subject: Re: Nutch not indexing full collection Nutch truncates content longer than configured and Solr truncates content exceeding max

RE: Nutch not indexing full collection

2011-07-28 Thread Chip Calhoun
And I'm still indexing with this command: bin/nutch crawl urls -dir crawl -depth 15 -topN 50 -Original Message- From: lewis john mcgibbney [mailto:lewis.mcgibb...@gmail.com] Sent: Wednesday, July 27, 2011 12:18 PM To: user@nutch.apache.org Subject: Re: Nutch not indexing full collection

Re: Nutch not indexing full collection

2011-07-27 Thread lewis john mcgibbney
: Nutch not indexing full collection Hi Chip, I would try running your scripts after setting the environment variable $NUTCH_HOME to nutch/runtime/local/NUTCH_HOME On Wed, Jul 20, 2011 at 4:01 PM, Chip Calhoun ccalh...@aip.org wrote: I've been working with $NUTCH_HOME/runtime/local/conf/nutch

RE: Nutch not indexing full collection

2011-07-25 Thread Chip Calhoun
, July 20, 2011 5:23 PM To: user@nutch.apache.org Subject: Re: Nutch not indexing full collection Hi Chip, I would try running your scripts after setting the environment variable $NUTCH_HOME to nutch/runtime/local/NUTCH_HOME On Wed, Jul 20, 2011 at 4:01 PM, Chip Calhoun ccalh...@aip.org wrote

Nutch not indexing full collection

2011-07-20 Thread Chip Calhoun
Hi, I'm using Nutch 1.3 to crawl a section of our website, and it doesn't seem to crawl the entire thing. I'm probably missing something simple, so I hope somebody can help me. My urls/nutch file contains a single URL: http://www.aip.org/history/ohilist/transcripts.html , which is an

Re: Nutch not indexing full collection

2011-07-20 Thread Julien Nioche
I'd have suspected db.max.outlinks.per.page, but you seem to have set it up correctly. Are you running Nutch in runtime/local? If so, you modified nutch-site.xml in runtime/local/conf, right? nutch readdb -stats will give you the total number of pages known, etc. Julien On 20 July 2011
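
For context, db.max.outlinks.per.page is also a nutch-site.xml property. The fragment below is a sketch only: the value -1 is an assumption for illustration (in Nutch 1.x a negative value means all outlinks on a page are processed, and the default is 100), and the thread does not show what value was actually configured.

```xml
<!-- Illustrative fragment for runtime/local/conf/nutch-site.xml -->
<configuration>
  <property>
    <name>db.max.outlinks.per.page</name>
    <!-- Assumed value: negative means no cap on outlinks per page.
         The Nutch 1.x default is 100. -->
    <value>-1</value>
  </property>
</configuration>
```

A seed page that links to more documents than the cap allows would leave the remainder undiscovered, which is why this setting is a natural first suspect for an incomplete crawl.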

RE: Nutch not indexing full collection

2011-07-20 Thread Chip Calhoun
Subject: Re: Nutch not indexing full collection I'd have suspected db.max.outlinks.per.page, but you seem to have set it up correctly. Are you running Nutch in runtime/local? If so, you modified nutch-site.xml in runtime/local/conf, right? nutch readdb -stats will give you the total number

Re: Nutch not indexing full collection

2011-07-20 Thread lewis john mcgibbney
, but I can no longer find any reference to it. Chip -Original Message- From: Julien Nioche [mailto:lists.digitalpeb...@gmail.com] Sent: Wednesday, July 20, 2011 10:06 AM To: user@nutch.apache.org Subject: Re: Nutch not indexing full collection I'd have suspected