PowerPoint Parsing Exception

2009-03-12 Thread Bullard, Luke
Hi, I'm using Nutch 0.9 to crawl part of my intranet, and am getting the following when attempting to parse ppt files: 2009-03-11 16:30:47,000 ERROR mspowerpoint.ContentReaderListener - extractClientTextBoxes java.lang.ArrayIndexOutOfBoundsException: -55133188 at org.apache.poi.util.Littl

[jira] Created: (NUTCH-718) urlfilter-subnets plugin

2009-03-12 Thread Dmitry Lihachev (JIRA)
urlfilter-subnets plugin Key: NUTCH-718 URL: https://issues.apache.org/jira/browse/NUTCH-718 Project: Nutch Issue Type: New Feature Reporter: Dmitry Lihachev Priority: Minor This plugin filt

Re: planning for nutch-1.0-rc1

2009-03-12 Thread Bartosz Gadzimski
Hello Dennis, We've been trying your new framework and indexer and everything looks better now. But we can't understand what the output of the last command (FieldIndexer) should be. We have: u...@kubuntu:~/nutch-1.0$ ls crawl/indexes/part-0/ index.done segments_1 segments.gen .inde

[jira] Updated: (NUTCH-718) urlfilter-subnets plugin

2009-03-12 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-718: Attachment: NUTCH-718_urlfilter_subnets.patch {code} cd nutch-trunk patch -p0 < NUTCH-718_urlfil

[jira] Created: (NUTCH-719) fetchQueues.totalSize incorrect in Fetcher2

2009-03-12 Thread Julien Nioche (JIRA)
fetchQueues.totalSize incorrect in Fetcher2 Key: NUTCH-719 URL: https://issues.apache.org/jira/browse/NUTCH-719 Project: Nutch Issue Type: Bug Components: fetcher Affects Versions: 1.

[jira] Created: (NUTCH-720) site: search operator with no query term

2009-03-12 Thread Frank McCown (JIRA)
site: search operator with no query term Key: NUTCH-720 URL: https://issues.apache.org/jira/browse/NUTCH-720 Project: Nutch Issue Type: Improvement Affects Versions: 1.1 Reporter: Fran

[Nutch Wiki] Update of "NutchTutorial" by FrankMcCown

2009-03-12 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The following page has been changed by FrankMcCown: http://wiki.apache.org/nutch/NutchTutorial The comment on the change is: Clarified that NutchBean only searches the "crawl" dir.