3590 (See
[https://builds.apache.org/job/Nutch-trunk/3590/])
NUTCH-1842: crawl.gen.delay value is read incorrectly from config - add
(snagel:
[https://github.com/apache/nutch/commit/a37bde1c03bd355c25edf6a240bac6079cb3cdc7])
* (edit) CHANGES.txt
> crawl.gen.delay has a wrong default value in nutch-default.xml or is being parsed incorrectly
3589 (See
[https://builds.apache.org/job/Nutch-trunk/3589/])
NUTCH-1842: crawl.gen.delay value is read incorrectly from (github:
[https://github.com/apache/nutch/commit/8b7298da1f04ade38f986b225134345456f07c32])
* (edit) src/java/org/apache/nutch/crawl/Generator.java
[
https://issues.apache.org/jira/browse/NUTCH-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-1842.
Resolution: Fixed
Merged [~yossi]'s PR. Thanks everyone!
#393: NUTCH-1842: crawl.gen.delay value is
read incorrectly from config
URL: https://github.com/apache/nutch/pull/393
This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:
As this is a foreign pull req
e to the documentation is the better
decision. It would be different if the description had been wrong only for a short time,
but by now it has been that way for 8 years. Are there any objections? Otherwise I would merge
the PR and also add a warning to the change log.
hose to change the code is that I would rather
"surprise" advanced users (who look in the code) than regular users. But I
can also see the counter-argument ("basic users have never read/changed this
property, so they won't notice if we change it").
this back to the agenda and thanks for the PR.
I looked into the source code history: the implementation has always used days for
{{crawl.gen.delay}} (since
[71c0cae|https://github.com/apache/nutch/commit/71c0cae1190c7176e5ff68bbf5236e22c1a1f723]),
the "wrong" description in nutch-defaul
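To make the unit mismatch concrete, here is a minimal Java sketch (not Nutch's Generator code) that reads the property and converts it under both interpretations. It assumes, per the comment above, that the implementation treats the value as a number of days and converts it to milliseconds; java.util.Properties stands in for Hadoop's Configuration, and the class and method names are hypothetical.

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of Nutch: compares two readings of crawl.gen.delay. */
final class GenDelayUnits {

  static final String GENERATOR_DELAY = "crawl.gen.delay";

  /** Reading assumed for the implementation: the value is a number of days. */
  static long delayMillisAsDays(Properties conf, long defaultDays) {
    String raw = conf.getProperty(GENERATOR_DELAY, Long.toString(defaultDays));
    return TimeUnit.DAYS.toMillis(Long.parseLong(raw.trim()));
  }

  /** Reading suggested by the old nutch-default.xml description: the value is in milliseconds. */
  static long delayMillisAsMillis(Properties conf, long defaultMillis) {
    String raw = conf.getProperty(GENERATOR_DELAY, Long.toString(defaultMillis));
    return Long.parseLong(raw.trim());
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    // Hypothetical user setting: 7 days written in milliseconds, following the documentation.
    conf.setProperty(GENERATOR_DELAY, "604800000");
    long asDays = delayMillisAsDays(conf, 7L);
    long asMillis = delayMillisAsMillis(conf, 604_800_000L);
    System.out.println("read as days:   " + TimeUnit.MILLISECONDS.toDays(asDays) + " days");
    System.out.println("read as millis: " + TimeUnit.MILLISECONDS.toDays(asMillis) + " days");
  }
}
```

With that hypothetical setting, the days reading yields a delay of 604800000 days, while the milliseconds reading yields the intended 7 days.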
[
https://issues.apache.org/jira/browse/NUTCH-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel reassigned NUTCH-1842:
--
Assignee: Sebastian Nagel
[
https://issues.apache.org/jira/browse/NUTCH-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1842:
---
Fix Version/s: 1.16
s a minor issue for 2.X, I think it's a major one for 1.X: if
a segment fails, the pages that were in it will never be fetched.
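The consequence described here can be sketched as follows, under the assumption that the generator skips an entry while its recorded generate time plus the delay still lies in the future. The class, method and epoch-based timestamps are hypothetical. If a value meant as 7 days in milliseconds (604800000) is read as days, the delay becomes roughly 1.66 million years, so an entry from a failed segment would effectively never be selected again.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical model of a re-selection check; not Nutch's Generator code. */
final class GenDelayCheck {

  /**
   * Returns true if an entry last generated at lastGenTime (epoch millis, or null if it
   * was never generated) may be selected again at now.
   */
  static boolean eligible(Long lastGenTime, long genDelayMillis, long now) {
    if (lastGenTime == null) {
      return true; // never generated, so there is nothing to wait for
    }
    // Otherwise wait until the delay has passed since the earlier generate run.
    return lastGenTime + genDelayMillis <= now;
  }

  public static void main(String[] args) {
    long generatedAt = 0L; // hypothetical: segment generated at the epoch, then its fetch failed
    long tenDaysLater = TimeUnit.DAYS.toMillis(10);

    long intended = TimeUnit.DAYS.toMillis(7);           // 7 days, as the user meant it
    long misread = TimeUnit.DAYS.toMillis(604_800_000L); // "604800000" read as days

    System.out.println("intended delay, eligible after 10 days: "
        + eligible(generatedAt, intended, tenDaysLater));
    System.out.println("misread delay, eligible after 10 days: "
        + eligible(generatedAt, misread, tenDaysLater));
  }
}
```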
Pull request #393: NUTCH-1842: crawl.gen.delay value
is read incorrectly from config
URL: https://github.com/apache/nutch/pull/393
The documentation in nutch-default.xml says this value is in milliseconds,
but the code assumes it is in
e is not properly set. It only takes effect when combined with
property generate.update.crawldb.
* (2.x) crawl.gen.delay is not used and should be removed from
nutch-default.xml and GeneratorJob.java
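The note above that the delay only takes effect when combined with generate.update.crawldb can be pictured with a small in-memory model. This is a hedged sketch, not the Generator's implementation: it assumes the generate time is written back to the CrawlDb only when that property is enabled, so without it there is no recorded time for the delay to be measured from. The class, the map standing in for the CrawlDb and the example URLs are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

/** Hypothetical in-memory model of repeated generate runs; not Nutch's Generator. */
final class GenerateRuns {

  private final Map<String, Long> generateTimes = new HashMap<>(); // url -> recorded generate time
  private final boolean updateCrawlDb; // stands in for generate.update.crawldb
  private final long genDelayMillis;   // stands in for crawl.gen.delay, already in millis

  GenerateRuns(boolean updateCrawlDb, long genDelayMillis) {
    this.updateCrawlDb = updateCrawlDb;
    this.genDelayMillis = genDelayMillis;
  }

  /** Selects the URLs that may be generated at time now. */
  List<String> generate(List<String> urls, long now) {
    List<String> selected = urls.stream()
        .filter(u -> {
          Long t = generateTimes.get(u);
          return t == null || t + genDelayMillis <= now; // no recorded time: always eligible
        })
        .collect(Collectors.toList());
    if (updateCrawlDb) {
      // Only in this mode is the generate time recorded, so only here can the delay apply.
      for (String u : selected) {
        generateTimes.put(u, now);
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    List<String> urls = List.of("http://example.org/a", "http://example.org/b");
    long delay = TimeUnit.DAYS.toMillis(7);
    for (boolean update : new boolean[] {false, true}) {
      GenerateRuns g = new GenerateRuns(update, delay);
      g.generate(urls, 0L);
      // Second run one hour later, before the first segment was fetched and updated.
      int again = g.generate(urls, TimeUnit.HOURS.toMillis(1)).size();
      System.out.println("generate.update.crawldb=" + update + ": re-selected " + again);
    }
  }
}
```

Run as is, the model re-selects both URLs an hour later when the update is off and none when it is on.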
ence to {{GENERATOR_DELAY}} is (from grep):
{quote}
java/org/apache/nutch/crawl/GeneratorJob.java: public static final String
GENERATOR_DELAY = "crawl.gen.delay";
{quote}
same for {{crawl.gen.delay}}.
I conclude that it is just a leftover from older times. But I am just a user
kaveh minooie created NUTCH-1842:
Summary: crawl.gen.delay has a wrong default value in
nutch-default.xml or is being parsed incorrectly
Key: NUTCH-1842
URL: https://issues.apache.org/jira/browse/NUTCH-1842
Yes, it is used in Nutch 1.x, but never in Nutch 2.x, because in
Nutch 2.x it will never generate a selected URL.
The correct unit of crawl.gen.delay is milliseconds; you can check the
Nutch 1.x nutch-default.xml, where the property description reads:
crawl.gen.delay
60480
Is 'crawl.gen.delay' still being used anywhere? I can't find
anything in the source code except here:
package org.apache.nutch.crawl;

public class GeneratorJob extends NutchTool implements Tool {
  public static final String GENERATOR_TOP_N = "generate.topN";