[
https://issues.apache.org/jira/browse/NUTCH-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2229:
-
Patch Info: Patch Available
Description:
CrawlDatum allows Jexl expressions on its metadata
Markus Jelsma created NUTCH-2229:
Summary: Allow Jexl expressions on CrawlDatum's fixed attributes
Key: NUTCH-2229
URL: https://issues.apache.org/jira/browse/NUTCH-2229
Project: Nutch
Issue
[
https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2227.
--
Resolution: Fixed
Committed to trunk in revision 1731849.
> RegexParseFil
[
https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2227:
-
Attachment: NUTCH-2227.patch
Updated patch. conf/regex-parsefilter.txt was missing in the patch
[
https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158808#comment-15158808
]
Markus Jelsma edited comment on NUTCH-2227 at 2/23/16 12:45 PM:
Updated
[
https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2227:
-
Attachment: NUTCH-2227.patch
Updated patch. It now includes package-info.java. Will commit
[
https://issues.apache.org/jira/browse/NUTCH-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2216:
-
Attachment: NUTCH-2216.patch
Updated patch for trunk. And included second and third comments
[
https://issues.apache.org/jira/browse/NUTCH-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2221.
--
Resolution: Fixed
Assignee: Markus Jelsma
> Introduce db.ignore.internal.li
[
https://issues.apache.org/jira/browse/NUTCH-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158691#comment-15158691
]
Markus Jelsma commented on NUTCH-2221:
--
Committed to trunk in revision 1731836.
> Introd
[
https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158684#comment-15158684
]
Markus Jelsma edited comment on NUTCH-2144 at 2/23/16 10:39 AM
[
https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158684#comment-15158684
]
Markus Jelsma commented on NUTCH-2144:
--
ParseOutputFormat.filterNormalize() signature has changed
[
https://issues.apache.org/jira/browse/NUTCH-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2221:
-
Attachment: NUTCH-2221.patch
Updated patch for current trunk revision. Will commit shortly
[
https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2220:
-
Description:
We need an option db.ignore.internal.links that operates in FetcherThread, just
[
https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2220:
-
Description:
We need an option db.ignore.internal.links that operates in FetcherThread, just
[
https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158651#comment-15158651
]
Markus Jelsma edited comment on NUTCH-2220 at 2/23/16 10:04 AM:
Yes, I
[
https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158651#comment-15158651
]
Markus Jelsma commented on NUTCH-2220:
--
Yes, I would opt for an incompatibility note at the top
[
https://issues.apache.org/jira/browse/NUTCH-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2228:
-
Summary: Plugin index-replace unit test broken on Java 8 (was:
index-replace unit test fails
[
https://issues.apache.org/jira/browse/NUTCH-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158633#comment-15158633
]
Markus Jelsma commented on NUTCH-2228:
--
Ah, I see! Your patch addresses the problem nicely. I'll
[
https://issues.apache.org/jira/browse/NUTCH-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma reassigned NUTCH-2228:
Assignee: Markus Jelsma
> index-replace unit test fa
Markus Jelsma created NUTCH-2228:
Summary: index-replace unit test fails
Key: NUTCH-2228
URL: https://issues.apache.org/jira/browse/NUTCH-2228
Project: Nutch
Issue Type: Bug
[
https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on NUTCH-2227 stopped by Markus Jelsma.
> RegexParseFilter
>
>
> Key
[
https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2227:
-
Attachment: NUTCH-2227.patch
Updated patch, added negative test. Which works. Will commit
[
https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2227:
-
Attachment: NUTCH-2227.patch
Updated patch, build.xml was missing
> RegexParseFil
[
https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2227:
-
Attachment: NUTCH-2227.patch
Patch for trunk! Tests pass.
> RegexParseFil
[
https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on NUTCH-2227 started by Markus Jelsma.
> RegexParseFilter
>
>
> Key
[
https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2227:
-
Description:
A parse filter that takes a regex and a field name. If regex matches via
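The description above is cut off before it says what happens on a match, so the following is only a minimal sketch of the general idea, not the plugin's actual behavior or API. It assumes each rule pairs a field name with a regex and that a match is recorded as a boolean under that field name; `apply_regex_rules` and the sample rules are hypothetical.

```python
import re


def apply_regex_rules(text, rules):
    """Return {field_name: bool} for each (field_name, pattern) rule.

    Hypothetical helper, not Nutch's API: a field is True when its
    pattern is found anywhere in the given text (assumed behavior).
    """
    return {
        field: re.search(pattern, text) is not None
        for field, pattern in rules
    }


# Hypothetical rules: field name first, regular expression second.
rules = [
    ("is_shop", r"add to cart"),
    ("has_price", r"\$\d+"),
]

flags = apply_regex_rules("Please add to cart for $15", rules)
```

Here `flags` maps each field name to whether its pattern was found, which a later indexing step could copy into the document.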
Markus Jelsma created NUTCH-2227:
Summary: RegexParseFilter
Key: NUTCH-2227
URL: https://issues.apache.org/jira/browse/NUTCH-2227
Project: Nutch
Issue Type: New Feature
Components
[
https://issues.apache.org/jira/browse/NUTCH-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2219:
-
Fix Version/s: 1.12
> Criteria order to be configurable in Deduplication
[
https://issues.apache.org/jira/browse/NUTCH-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2219:
-
Affects Version/s: 1.11
> Criteria order to be configurable in Deduplication
[
https://issues.apache.org/jira/browse/NUTCH-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2219.
--
Resolution: Fixed
Committed to trunk in revision 1731651. Thanks Ron van der Vegt
> Crite
[
https://issues.apache.org/jira/browse/NUTCH-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157027#comment-15157027
]
Markus Jelsma commented on NUTCH-2226:
--
Hello - how is this related? Are you using trunk? We run
[
https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156711#comment-15156711
]
Markus Jelsma commented on NUTCH-2220:
--
Any comments on this change, e.g. separate db and linkdb
Can someone please put up a small howto somewhere? I need to know how to:
* check out trunk
* check out a specific tag
* do a svn up
* create a patch, e.g. svn diff
* perform a commit
Thanks,
Markus
-Original message-
> From:Mattmann, Chris A (3980)
>
[
https://issues.apache.org/jira/browse/NUTCH-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2219:
-
Description:
Current implementation:
"This command takes a path to a crawldb as para
[
https://issues.apache.org/jira/browse/NUTCH-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2219:
-
Summary: Criteria order to be configurable in DeduplicationJob (was: Dedup
script, allow users
[
https://issues.apache.org/jira/browse/NUTCH-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2219:
-
Attachment: NUTCH-2219.patch
Thanks, looks fine!
Slightly updated patch:
* changed usage output
[
https://issues.apache.org/jira/browse/NUTCH-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma reassigned NUTCH-2219:
Assignee: Markus Jelsma
> Dedup script, allow users to change the order in which m
[
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152221#comment-15152221
]
Markus Jelsma commented on NUTCH-2191:
--
1. Although that could work, it does not truly resolve
[
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152184#comment-15152184
]
Markus Jelsma edited comment on NUTCH-2191 at 2/18/16 11:34 AM:
1. Ah yes
[
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152184#comment-15152184
]
Markus Jelsma commented on NUTCH-2191:
--
1. Ah yes, we still need to fix this crazy plugin dependency
[
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152141#comment-15152141
]
Markus Jelsma commented on NUTCH-2191:
--
Hello Kshijtij - well no, certainly not at this time
[
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152140#comment-15152140
]
Markus Jelsma commented on NUTCH-2191:
--
Hi - it works indeed. But new problems appear, as usual!
1
[
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151154#comment-15151154
]
Markus Jelsma commented on NUTCH-2191:
--
Hi Karanjeet - looks like the only changes you made
[
https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2223.
--
Resolution: Fixed
Committed to trunk in revision 1730808.
> Upgrade xercesImpl to 2.1
[
https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150264#comment-15150264
]
Markus Jelsma commented on NUTCH-2223:
--
Thanks Tien Nguyen Manh!
> Upgrade xercesImpl to 2.1
[
https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150248#comment-15150248
]
Markus Jelsma commented on NUTCH-2223:
--
Incredible, I tried the tika-breaker.html file in the linked
[
https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma reassigned NUTCH-2223:
Assignee: Markus Jelsma
> Upgrade xercesImpl to 2.11.0 to fix hang on issue in t
[
https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2223:
-
Priority: Major (was: Minor)
> Upgrade xercesImpl to 2.11.0 to fix hang on issue in t
[
https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2223:
-
Description:
Stacktrace for the hang seems to be:
{code
[
https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2223:
-
Fix Version/s: 1.12
> Upgrade xercesImpl to 2.11.0 to fix hang on issue in tika mimet
[
https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2223:
-
Description:
{code}Stacktrace for the hang seems
[
https://issues.apache.org/jira/browse/NUTCH-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2224:
-
Component/s: fetcher
> Average bytes/second calculated incorrectly in fetc
[
https://issues.apache.org/jira/browse/NUTCH-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2224:
-
Affects Version/s: 1.11
> Average bytes/second calculated incorrectly in fetc
[
https://issues.apache.org/jira/browse/NUTCH-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2224:
-
Fix Version/s: 1.12
> Average bytes/second calculated incorrectly in fetc
[
https://issues.apache.org/jira/browse/NUTCH-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2224.
--
Resolution: Fixed
Committed to trunk in revision 1730803. Thanks Tien Nguyen Manh!
> Aver
[
https://issues.apache.org/jira/browse/NUTCH-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2224:
-
Summary: Average bytes/second calculated incorrectly in fetcher (was:
Wrong metric compute
[
https://issues.apache.org/jira/browse/NUTCH-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma reassigned NUTCH-2224:
Assignee: Markus Jelsma
> Wrong metric compute in Fetcher status rep
[
https://issues.apache.org/jira/browse/NUTCH-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2225.
--
Resolution: Fixed
Committed to trunk in revision 1730802. Thanks Tien Nguyen Manh!
> Par
[
https://issues.apache.org/jira/browse/NUTCH-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2225:
-
Summary: Parsed time calculated incorrectly (was: Parsed time not include
time to parse
[
https://issues.apache.org/jira/browse/NUTCH-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma reassigned NUTCH-2225:
Assignee: Markus Jelsma
> Parsed time not include time to pa
[
https://issues.apache.org/jira/browse/NUTCH-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2225:
-
Affects Version/s: 1.11
> Parsed time not include time to pa
[
https://issues.apache.org/jira/browse/NUTCH-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2225:
-
Fix Version/s: 1.12
> Parsed time not include time to pa
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-961.
-
Resolution: Fixed
Committed to trunk in revision 1730694. Thanks everyone for contributions
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Attachment: NUTCH-961.patch
Updated patch. ExtractorRepository was missing.
> Expose Tik
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Fix Version/s: 1.12
> Expose Tika's boilerpipe supp
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Affects Version/s: 1.11
> Expose Tika's boilerpipe supp
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148642#comment-15148642
]
Markus Jelsma commented on NUTCH-961:
-
Tests pass as expected and Boilerpipe as well. Will commit
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Description:
Tika 0.8 comes with the Boilerpipe content handler which can be used to extract
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Description:
Tika 0.8 comes with the Boilerpipe content handler which can be used to extract
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Attachment: NUTCH-961.patch
Patch for trunk.
> Expose Tika's boilerpipe supp
[
https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-1233.
--
Resolution: Fixed
Committed to trunk in revision 1730687.
> Rely on Tika for outl
[
https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1233:
-
Affects Version/s: 1.11
> Rely on Tika for outlink extract
[
https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1233:
-
Fix Version/s: 1.12
> Rely on Tika for outlink extract
[
https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1233:
-
Component/s: parser
> Rely on Tika for outlink extract
[
https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148590#comment-15148590
]
Markus Jelsma commented on NUTCH-1233:
--
Awesome! Everything works as expected since the Tika 1.12
[
https://issues.apache.org/jira/browse/NUTCH-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2210.
--
Resolution: Fixed
Committed to trunk in revision 1730686.
> Upgrade to Tika 1
[
https://issues.apache.org/jira/browse/NUTCH-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148572#comment-15148572
]
Markus Jelsma commented on NUTCH-2210:
--
Test passes, will commit shortly.
> Upgrade to Tika 1
[
https://issues.apache.org/jira/browse/NUTCH-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2210:
-
Attachment: NUTCH-2210.patch
Patch for trunk.
> Upgrade to Tika 1
[
https://issues.apache.org/jira/browse/NUTCH-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148489#comment-15148489
]
Markus Jelsma commented on NUTCH-2197:
--
Hello Arun - no, this is not applied to 2.3.1. The plugins
[
https://issues.apache.org/jira/browse/NUTCH-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15147735#comment-15147735
]
Markus Jelsma commented on NUTCH-2210:
--
Apache Tika 1.12 is available. Will upgrade as soon
[
https://issues.apache.org/jira/browse/NUTCH-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2221:
-
Attachment: NUTCH-2216-NUTCH-2220-NUTCH-2221.patch
Patch for trunk. This includes all three
[
https://issues.apache.org/jira/browse/NUTCH-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2216:
-
Attachment: NUTCH-2216.patch
Patch for trunk, introducing db.ignore.treat.redirects.as.links
[
https://issues.apache.org/jira/browse/NUTCH-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2216:
-
Summary: db.ignore.*.links to optionally follow internal redirects (was:
ignore.internal.links
[
https://issues.apache.org/jira/browse/NUTCH-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2221:
-
Attachment: NUTCH-2221.patch
Patch for trunk. This includes the modified config of NUTCH-2220
[
https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2220:
-
Patch Info: Patch Available
> Rename db.* options used only by the linkdb to lin
[
https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2220:
-
Attachment: NUTCH-2220.patch
Patch for trunk
> Rename db.* options used only by the lin
[
https://issues.apache.org/jira/browse/NUTCH-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2221:
-
Description:
FetcherThread has support for db.ignore.external.links. In config you can find
[
https://issues.apache.org/jira/browse/NUTCH-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2221:
-
Summary: Introduce db.ignore.internal.links to FetcherThread (was:
Introduce
Markus Jelsma created NUTCH-2221:
Summary: Introduce db.ignore.external.links to FetcherThread
Key: NUTCH-2221
URL: https://issues.apache.org/jira/browse/NUTCH-2221
Project: Nutch
Issue Type
Markus Jelsma created NUTCH-2220:
Summary: Rename db.* options used only by the linkdb to linkdb.*
Key: NUTCH-2220
URL: https://issues.apache.org/jira/browse/NUTCH-2220
Project: Nutch
Issue
[
https://issues.apache.org/jira/browse/NUTCH-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2189.
--
Resolution: Fixed
> Domain filter must deactivate if no rules are pres
[
https://issues.apache.org/jira/browse/NUTCH-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2189:
-
Fix Version/s: 1.12
> Domain filter must deactivate if no rules are pres
[
https://issues.apache.org/jira/browse/NUTCH-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2189:
-
Affects Version/s: 1.11
> Domain filter must deactivate if no rules are pres
[
https://issues.apache.org/jira/browse/NUTCH-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma reopened NUTCH-2189:
--
Fix version missing
> Domain filter must deactivate if no rules are pres
[
https://issues.apache.org/jira/browse/NUTCH-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma closed NUTCH-2189.
> Domain filter must deactivate if no rules are pres
Markus Jelsma created NUTCH-2216:
Summary: ignore.internal.links to optionally follow internal
redirects
Key: NUTCH-2216
URL: https://issues.apache.org/jira/browse/NUTCH-2216
Project: Nutch
[
https://issues.apache.org/jira/browse/NUTCH-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15144463#comment-15144463
]
Markus Jelsma commented on NUTCH-2216:
--
Apparently db.ignore.internal.links is not implemented
[
https://issues.apache.org/jira/browse/NUTCH-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15144497#comment-15144497
]
Markus Jelsma commented on NUTCH-2216:
--
Additionally, it probably should not be implemented because
[
https://issues.apache.org/jira/browse/NUTCH-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15144518#comment-15144518
]
Markus Jelsma commented on NUTCH-2216:
--
An option is to change the default
[
https://issues.apache.org/jira/browse/NUTCH-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2215:
-
Attachment: NUTCH-2215.patch
Tiny error in nutch-default description.
> Generator to restr