Re: Nutch, samba and urls...

2006-08-17 Thread Sami Siren
Hi, Could you please submit a JIRA issue and attach this (or perhaps the diff for the whole plugin, excluding the jcifs .jar because it is LGPL) to it. René Treffer wrote: Hi, I've just written a protocol-smb, it's really simple (code attached). It uses the jcifs lib and seems to work - but

Re: HTTP Accept Header seems to be missing

2006-08-17 Thread Sami Siren
Michael Wechner wrote: Hi It seems to me that Nutch does not send an HTTP Accept Header. Is that on purpose? I would have expected that Nutch tells the server which mime-types it accepts resp. is able to parse and index, but maybe I misunderstand something. This sounds like a good addition

Re: Patch Available status?

2006-08-17 Thread Sami Siren
Chris Mattmann wrote: Hi Guys, I've seen on the Hadoop mailing list recently that there was a new status added for issues in JIRA called Patch Available to let committers know that a patch is ready for review to commit. How about we add this to the Nutch jira instance as well? +1 I tried

Re: HTTP Accept Header seems to be missing

2006-08-17 Thread Michael Wechner
Sami Siren wrote: Michael Wechner wrote: Hi It seems to me that Nutch does not send an HTTP Accept Header. Is that on purpose? I would have expected that Nutch tells the server which mime-types it accepts resp. is able to parse and index, but maybe I misunderstand something. This

Re: Neko parsing fix inadvertently reverted?

2006-08-17 Thread Sami Siren
Benjamin Higgins wrote: I was taking a look at HtmlParser.java, and I think the fix to NUTCH-17 was accidentally removed. See: http://svn.apache.org/viewvc/lucene/nutch/tags/release-0.8/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java?view=log Specifically, in

[jira] Closed: (NUTCH-348) Generator is building fetch list using *lowest* scoring URLs

2006-08-17 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-348?page=all ] Andrzej Bialecki closed NUTCH-348. --- Resolution: Fixed Patch applied, both to branch-0.8 and trunk. Thanks! Generator is building fetch list using *lowest* scoring URLs

[jira] Created: (NUTCH-350) urls blocked db.fetch.retry.max * http.max.delays times during fetching are marked as STATUS_DB_GONE

2006-08-17 Thread Stefan Groschupf (JIRA)
urls blocked db.fetch.retry.max * http.max.delays times during fetching are marked as STATUS_DB_GONE -- Key: NUTCH-350 URL:

[jira] Updated: (NUTCH-350) urls blocked db.fetch.retry.max * http.max.delays times during fetching are marked as STATUS_DB_GONE

2006-08-17 Thread Stefan Groschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-350?page=all ] Stefan Groschupf updated NUTCH-350: --- Attachment: protocolRetryV5.patch This patch will dramatically increase the number of successfully fetched pages of an intranet crawl over time.

[jira] Created: (NUTCH-351) Protocol forward proxy

2006-08-17 Thread Sami Siren (JIRA)
Protocol forward proxy -- Key: NUTCH-351 URL: http://issues.apache.org/jira/browse/NUTCH-351 Project: Nutch Issue Type: New Feature Components: fetcher Affects Versions: 0.8, 0.8.1, 0.9.0

Re: 0.8 not loading plugins

2006-08-17 Thread Jonathan Addison
Hi Chris, Chris Stephens wrote: I think I finally have my plugin ported to 0.8, however I cannot get my plugin to load. My plugin.includes file in conf/nutch-site.xml has the following for its plugin.includes value:

Re: 0.8 not loading plugins

2006-08-17 Thread Chris Stephens
I have this line in src/plugin/build.xml under the deploy section: <ant dir="custom-meta" target="deploy"/> The plugin is compiling ok. I spent several days getting errors on compile and investigating how to port them to 0.8. Jonathan Addison wrote: Hi Chris, Chris Stephens wrote: I think I

RE: 0.8 not loading plugins

2006-08-17 Thread HUYLEBROECK Jeremy RD-ILAB-SSF
Did you check if your plugin.xml is read by putting the plugin package in debug mode? (put this in the log4j.properties) log4j.logger.org.apache.nutch.plugin=DEBUG -Original Message- From: Chris Stephens [mailto:[EMAIL PROTECTED] Sent: Thursday, August 17, 2006 2:30 PM To:

Re: 0.8 not loading plugins

2006-08-17 Thread Jonathan Addison
Chris Stephens wrote: I have this line in src/plugin/build.xml under the deploy section: <ant dir="custom-meta" target="deploy"/> The plugin is compiling ok. I spent several days getting errors on compile and investigating how to port them to 0.8. Ok, I've also added nutch-extensionpoints to the

[jira] Created: (NUTCH-352) Add jar command to bin/nutch to allow launching hadoop job jars

2006-08-17 Thread David Cathcart (JIRA)
Add jar command to bin/nutch to allow launching hadoop job jars --- Key: NUTCH-352 URL: http://issues.apache.org/jira/browse/NUTCH-352 Project: Nutch Issue Type: New Feature

[jira] Updated: (NUTCH-352) Add jar command to bin/nutch to allow launching hadoop job jars

2006-08-17 Thread David Cathcart (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-352?page=all ] David Cathcart updated NUTCH-352: - Description: Add the ability to run hadoop job jars via bin/nutch jar jobjar.jar. See attachment for patch. (was: Add the ability to run hadoop job jars via

Re: 0.8 not loading plugins

2006-08-17 Thread Chris Stephens
It's definitely not trying to load my plugin; I added that debug setting and didn't see anything regarding my plugin. One thing I noticed is that my plugin is not in the plugins directory. At what point do the plugins get copied there? Here is the output from my compile: compile: [echo]

Re: 0.8 not loading plugins

2006-08-17 Thread Chris Mattmann
Hi Chris, It seems from your email message that your plugin is located in $NUTCH_HOME/build/custom-meta? Is this where your plugin * code * is currently stored? If so, this is the wrong location and the most likely reason that your plugin isn't being loaded. Plugin code should live in

Re: 0.8 not loading plugins

2006-08-17 Thread Chris Stephens
My code currently resides in $NUTCH_HOME/src/plugin/custom-meta. The output I pasted was from ant, I'm not sure what it does with the build directory. I do know that custom-meta never shows up in $NUTCH_HOME/plugin. Chris Mattmann wrote: Hi Chris, It seems from your email message that

[jira] Commented: (NUTCH-322) Fetcher discards ProtocolStatus, doesn't store redirected pages

2006-08-17 Thread Stefan Groschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-322?page=comments#action_12428858 ] Stefan Groschupf commented on NUTCH-322: I think this is a serious problem. Page A does a server-side redirect to Page B. Page A is never written to the output.

[jira] Created: (NUTCH-353) pages that serverside forwards will be refetched every time

2006-08-17 Thread Stefan Groschupf (JIRA)
pages that serverside forwards will be refetched every time --- Key: NUTCH-353 URL: http://issues.apache.org/jira/browse/NUTCH-353 Project: Nutch Issue Type: Bug Affects Versions:

[jira] Updated: (NUTCH-353) pages that serverside forwards will be refetched every time

2006-08-17 Thread Stefan Groschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-353?page=all ] Stefan Groschupf updated NUTCH-353: --- Attachment: doNotRefecthForwarderPagesV1.patch Since we discussed that Nutch needs to be more polite, we should fix that ASAP. pages that serverside

[jira] Resolved: (NUTCH-322) Fetcher discards ProtocolStatus, doesn't store redirected pages

2006-08-17 Thread Stefan Groschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-322?page=all ] Stefan Groschupf resolved NUTCH-322. Resolution: Duplicate duplicate of NUTCH-353 Fetcher discards ProtocolStatus, doesn't store redirected pages

[jira] Commented: (NUTCH-347) Build: plugins' Jars not found

2006-08-17 Thread Stefan Groschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-347?page=comments#action_12428915 ] Stefan Groschupf commented on NUTCH-347: Please submit this patch! Thanks! Build: plugins' Jars not found --

[jira] Commented: (NUTCH-346) Improve readability of logs/hadoop.log

2006-08-17 Thread Stefan Groschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-346?page=comments#action_12428917 ] Stefan Groschupf commented on NUTCH-346: +1 I agree, can you please create a patch file and attach it to this bug? Thanks. Improve readability of

[jira] Commented: (NUTCH-345) Add support for Content-Encoding: deflated

2006-08-17 Thread Stefan Groschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-345?page=comments#action_12428918 ] Stefan Groschupf commented on NUTCH-345: Shouldn't the DeflateUtils also be part of the protocol-http plugin? Also since it is a larger contribution and