That doesn't seem like an unreasonable result count to me if you're
running locally. Assuming you're partitioning by host, all of those URLs are
on the same host, and you have a 3-second politeness delay, you should end
up with a crawl lasting roughly
21497 * 3 / 60 / 60 = 17.9 hours
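The back-of-the-envelope estimate above can be written as a tiny helper. This is a minimal sketch, assuming every URL sits on one host, URLs are fetched strictly one after another, and download time is ignored (so it is a lower bound). The function name `crawl_hours` is hypothetical and not part of Nutch.

```python
def crawl_hours(num_urls: int, delay_seconds: float) -> float:
    """Lower-bound crawl time in hours when every URL is on a single host.

    Assumes URLs are fetched sequentially with a fixed politeness delay
    between requests and ignores the time spent downloading each page.
    """
    total_seconds = num_urls * delay_seconds
    return total_seconds / 60 / 60  # seconds -> minutes -> hours


if __name__ == "__main__":
    hours = crawl_hours(21497, 3)
    print(f"{hours:.1f} hours")  # prints "17.9 hours"
```

Because all URLs land in a single per-host queue, adding more fetcher threads does not shorten this estimate; only a shorter delay or more distinct hosts would.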
There's a wiki page on
[ https://issues.apache.org/jira/browse/NUTCH-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947573#comment-14947573 ]
Hudson commented on NUTCH-2124:
---
SUCCESS: Integrated in Nutch-trunk #3287 (See
Sujen, can you provide an example on the existing Scoring
Similarity wiki page of what the gold standard file
should contain and how it should be formatted?
Chris Mattmann, Ph.D.
Adjunct Associate Professor, Computer Science Department
[ https://issues.apache.org/jira/browse/NUTCH-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947002#comment-14947002 ]
Michael Joyce commented on NUTCH-2129:
---
Fixed the unnecessary init that [~jnioche] caught. Thanks!
Hi guys,
I am trying to set up the Selenium plugin by following this link:
https://github.com/apache/nutch/tree/trunk/src/plugin/protocol-selenium
When I execute this command:
sudo /usr/bin/Xvfb :11 -screen 0 1024x768x24 &
the command line outputs a few lines of "Initializing built-in extension
xxx", and
Hi Mithun,
goldstandard.txt is the file against which the parsed text of an HTML
page coming from Nutch is checked. There is no particular format for
that file; it is just plain text.
For example, if you were to score pages which were more similar to a topic
relating to robotics, you would want
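To make "checked against the gold standard" concrete: one common way to compare two plain-text documents is cosine similarity between their term-frequency vectors. The sketch below assumes that approach purely for illustration; it is not the Scoring Similarity plugin's code, and the naive tokenizer, the function names `term_vector` and `cosine_similarity`, and the sample texts are all hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Lower-cased word counts; a deliberately naive tokenizer."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    # Counter returns 0 for missing terms, so absent words add nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical gold standard text; in practice it would come from goldstandard.txt.
    gold = term_vector("robot robotics autonomous robot sensors actuators")
    on_topic = term_vector("This robot uses sensors and actuators.")
    off_topic = term_vector("Recipes for baking sourdough bread at home.")
    print(cosine_similarity(gold, on_topic) > cosine_similarity(gold, off_topic))  # True
```

Since the gold standard is just plain text, the more topic words it shares with a page, the higher that page scores; a page with no overlapping words scores 0.0.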
[ https://issues.apache.org/jira/browse/NUTCH-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel resolved NUTCH-2124.
Resolution: Fixed
Assignee: Sebastian Nagel
Committed to trunk, r1707360. Thanks,