Hi,
Could you please submit a JIRA issue and attach this (or perhaps the
diff for the whole plugin, excluding the jcifs .jar because it is LGPL) to it?
René Treffer wrote:
Hi,
I've just written a protocol-smb plugin, it's really simple (code attached).
It uses the jcifs lib and seems to work - but
Michael Wechner wrote:
Hi
It seems to me that Nutch does not send an HTTP Accept header. Is that on
purpose?
I would have expected that Nutch tells the server which MIME types it
accepts, i.e. is able to parse and index,
but maybe I misunderstand something.
This sounds like a good addition
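For illustration, here is a minimal Java sketch of what sending an Accept header could look like. It is not Nutch code: the class `AcceptHeaderSketch`, its methods, the MIME types and the example URL are all assumptions made up for this example, and it uses plain `java.net.HttpURLConnection` rather than any Nutch protocol plugin.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Nutch.
final class AcceptHeaderSketch {

    // Joins MIME types into an Accept header value; a weight below 1.0
    // is emitted as a ";q=" parameter, a weight of 1.0 is left implicit.
    static String buildAcceptHeader(Map<String, Double> mimeTypes) {
        StringJoiner joiner = new StringJoiner(", ");
        for (Map.Entry<String, Double> entry : mimeTypes.entrySet()) {
            double q = entry.getValue();
            joiner.add(q >= 1.0 ? entry.getKey() : entry.getKey() + ";q=" + q);
        }
        return joiner.toString();
    }

    // Prepares (but does not send) a request that carries the header.
    static HttpURLConnection prepare(String url, String accept) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestProperty("Accept", accept);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed example values: the types a crawler claims it can parse.
        Map<String, Double> types = new LinkedHashMap<>();
        types.put("text/html", 1.0);
        types.put("application/pdf", 0.8);
        HttpURLConnection conn = prepare("http://example.com/", buildAcceptHeader(types));
        System.out.println(conn.getRequestProperty("Accept"));
    }
}
```

In a real crawler the list of types would presumably be derived from the parsers that are actually enabled, so that the header matches what can be indexed.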
Chris Mattmann wrote:
Hi Guys,
I've seen on the Hadoop mailing list recently that there was a new status
added for issues in JIRA called Patch Available to let committers know
that a patch is ready for review to commit. How about we add this to the
Nutch jira instance as well?
+1
I tried
Sami Siren wrote:
Michael Wechner wrote:
Hi
It seems to me that Nutch does not send an HTTP Accept header. Is that
on purpose?
I would have expected that Nutch tells the server which MIME types it
accepts, i.e. is able to parse and index,
but maybe I misunderstand something.
This
Benjamin Higgins wrote:
I was taking a look at HtmlParser.java, and I think the fix to NUTCH-17 was
accidentally removed. See:
http://svn.apache.org/viewvc/lucene/nutch/tags/release-0.8/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java?view=log
Specifically, in
[ http://issues.apache.org/jira/browse/NUTCH-348?page=all ]
Andrzej Bialecki closed NUTCH-348.
---
Resolution: Fixed
Patch applied, both to branch-0.8 and trunk. Thanks!
Generator is building fetch list using *lowest* scoring URLs
urls blocked db.fetch.retry.max * http.max.delays times during fetching are
marked as STATUS_DB_GONE
--
Key: NUTCH-350
URL:
[ http://issues.apache.org/jira/browse/NUTCH-350?page=all ]
Stefan Groschupf updated NUTCH-350:
---
Attachment: protocolRetryV5.patch
This patch will dramatically increase the number of successfully fetched pages
of an intranet crawl over time.
Protocol forward proxy
--
Key: NUTCH-351
URL: http://issues.apache.org/jira/browse/NUTCH-351
Project: Nutch
Issue Type: New Feature
Components: fetcher
Affects Versions: 0.8, 0.8.1, 0.9.0
Hi Chris,
Chris Stephens wrote:
I think I finally have my plugin ported to 0.8, however I cannot get
my plugin to load.
The plugin.includes property in my conf/nutch-site.xml has the following
value:
I have this line in src/plugin/build.xml under the deploy section:
<ant dir="custom-meta" target="deploy"/>
The plugin is compiling ok. I spent several days getting errors on
compile and investigating how to port them to 0.8.
Jonathan Addison wrote:
Hi Chris,
Chris Stephens wrote:
I think I
Did you check if your plugin.xml is read by putting the plugin package
in debug mode?
(put this in the log4j.properties)
log4j.logger.org.apache.nutch.plugin=DEBUG
-Original Message-
From: Chris Stephens [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 17, 2006 2:30 PM
To:
Chris Stephens wrote:
I have this line in src/plugin/build.xml under the deploy section:
<ant dir="custom-meta" target="deploy"/>
The plugin is compiling ok. I spent several days getting errors on
compile and investigating how to port them to 0.8.
Ok, I've also added nutch-extensionpoints to the
Add jar command to bin/nutch to allow launching hadoop job jars
---
Key: NUTCH-352
URL: http://issues.apache.org/jira/browse/NUTCH-352
Project: Nutch
Issue Type: New Feature
[ http://issues.apache.org/jira/browse/NUTCH-352?page=all ]
David Cathcart updated NUTCH-352:
-
Description: Add the ability to run hadoop job jars via bin/nutch jar
jobjar.jar. See attachment for patch. (was: Add the ability to run hadoop job
jars via
It's definitely not trying to load my plugin. I added that debug setting
and didn't see anything regarding my plugin. One thing I noticed is
that my plugin is not in the plugins directory. At what point do the
plugins get copied there? Here is the output from my compile:
compile:
[echo]
Hi Chris,
It seems from your email message that your plugin is located in
$NUTCH_HOME/build/custom-meta? Is this where your plugin *code* is
currently stored? If so, this is the wrong location and the most likely
reason that your plugin isn't being loaded.
Plugin code should live in
My code currently resides in $NUTCH_HOME/src/plugin/custom-meta . The
output I pasted was from ant, I'm not sure what it does with the build
directory. I do know that custom-meta never shows up in $NUTCH_HOME/plugin.
Chris Mattmann wrote:
Hi Chris,
It seems from your email message that
[
http://issues.apache.org/jira/browse/NUTCH-322?page=comments#action_12428858 ]
Stefan Groschupf commented on NUTCH-322:
I think this is a serious problem. Page A does a server-side redirect to
Page B, and Page A is never written to the output.
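To make the consequence concrete, here is a small, simplified Java model of the behaviour described above. It is only a sketch under assumptions: `RedirectRefetchSketch`, its status enum and its `inject`/`generate`/`fetch` methods are hypothetical stand-ins and do not correspond to Nutch's actual classes or data structures.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not Nutch code.
final class RedirectRefetchSketch {
    enum Status { UNFETCHED, FETCHED }

    // Stand-in for a crawl db: URL -> status.
    private final Map<String, Status> db = new LinkedHashMap<>();
    // Assumed server-side redirects: source URL -> target URL.
    private final Map<String, String> redirects;
    // When false, the redirecting page itself is never written to the output.
    private final boolean recordRedirectSource;

    RedirectRefetchSketch(Map<String, String> redirects, boolean recordRedirectSource) {
        this.redirects = redirects;
        this.recordRedirectSource = recordRedirectSource;
    }

    void inject(String url) {
        db.putIfAbsent(url, Status.UNFETCHED);
    }

    // Builds a fetch list from every URL that is still unfetched.
    List<String> generate() {
        List<String> fetchList = new ArrayList<>();
        for (Map.Entry<String, Status> entry : db.entrySet()) {
            if (entry.getValue() == Status.UNFETCHED) {
                fetchList.add(entry.getKey());
            }
        }
        return fetchList;
    }

    // Fetches each URL; for a redirect, the result is stored under the target.
    void fetch(List<String> fetchList) {
        for (String url : fetchList) {
            String target = redirects.get(url);
            if (target == null) {
                db.put(url, Status.FETCHED);
                continue;
            }
            db.put(target, Status.FETCHED);
            if (recordRedirectSource) {
                db.put(url, Status.FETCHED);
            }
        }
    }
}
```

With `recordRedirectSource` set to `false`, Page A stays unfetched after every round and lands on the next fetch list again; with `true`, it is recorded once and left alone.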
pages that serverside forwards will be refetched every time
---
Key: NUTCH-353
URL: http://issues.apache.org/jira/browse/NUTCH-353
Project: Nutch
Issue Type: Bug
Affects Versions:
[ http://issues.apache.org/jira/browse/NUTCH-353?page=all ]
Stefan Groschupf updated NUTCH-353:
---
Attachment: doNotRefecthForwarderPagesV1.patch
Since we discussed that Nutch needs to be more polite, we should fix that ASAP.
pages that serverside
[ http://issues.apache.org/jira/browse/NUTCH-322?page=all ]
Stefan Groschupf resolved NUTCH-322.
Resolution: Duplicate
duplicate of NUTCH-353
Fetcher discards ProtocolStatus, doesn't store redirected pages
[
http://issues.apache.org/jira/browse/NUTCH-347?page=comments#action_12428915 ]
Stefan Groschupf commented on NUTCH-347:
Please submit this patch!
Thanks!
Build: plugins' Jars not found
--
[
http://issues.apache.org/jira/browse/NUTCH-346?page=comments#action_12428917 ]
Stefan Groschupf commented on NUTCH-346:
+1
I agree. Can you please create a patch file and attach it to this bug?
Thanks
Improve readability of
[
http://issues.apache.org/jira/browse/NUTCH-345?page=comments#action_12428918 ]
Stefan Groschupf commented on NUTCH-345:
Shouldn't the DeflateUtils also be part of the protocol-http plugin?
Also since it is a larger contribution and