[jira] [Updated] (NUTCH-2582) Set pool size of XML SAX parsers used for MIME detection in Tika 1.19

2019-11-12 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2582: --- Fix Version/s: 1.17 > Set pool size of XML SAX parsers used for MIME detection in Tika 1.19 >

[jira] [Resolved] (NUTCH-2603) Bring back legacy pre-Tika parsers and use them as back up parsers

2019-11-12 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2603. Resolution: Won't Fix > Bring back legacy pre-Tika parsers and use them as back up parsers

[jira] [Updated] (NUTCH-2599) charset detection issue with parse-tika

2019-11-12 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2599: --- Fix Version/s: 1.17 > charset detection issue with parse-tika > -

[jira] [Updated] (NUTCH-2634) Some links marked as "nofollow" are followed anyway.

2019-11-12 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2634: --- Fix Version/s: 1.17 > Some links marked as "nofollow" are followed anyway. >

[jira] [Updated] (NUTCH-2608) Reduce size of Nutch job file and package

2019-11-12 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2608: --- Fix Version/s: 1.17 > Reduce size of Nutch job file and package > ---

[jira] [Updated] (NUTCH-2662) index-jexl-filter plugin throws a RuntimeException if its enabled but not configured

2019-11-12 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2662: --- Fix Version/s: 1.17 > index-jexl-filter plugin throws a RuntimeException if its enabled but n

[jira] [Updated] (NUTCH-2720) ROBOTS metatag ignored when capitalized

2019-11-12 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2720: --- Fix Version/s: 1.17 > ROBOTS metatag ignored when capitalized > -

[jira] [Updated] (NUTCH-2750) Improve CrawlDbReader & LinkDbReader reader handling

2019-11-12 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2750: --- Fix Version/s: 1.17 > Improve CrawlDbReader & LinkDbReader reader handling >

[jira] [Resolved] (NUTCH-2750) Improve CrawlDbReader & LinkDbReader reader handling

2019-11-12 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2750. Resolution: Fixed Merged into 1.x/master. Thanks, [~jurian]! > Improve CrawlDbReader & Lin

[jira] [Updated] (NUTCH-2750) Improve CrawlDbReader & LinkDbReader reader handling

2019-11-12 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2750: --- Summary: Improve CrawlDbReader & LinkDbReader reader handling (was: improve CrawlDbReader &

[jira] [Created] (NUTCH-2753) Add -listen option to command-line help of CrawlDbReader and LinkDbReader

2019-11-09 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2753: -- Summary: Add -listen option to command-line help of CrawlDbReader and LinkDbReader Key: NUTCH-2753 URL: https://issues.apache.org/jira/browse/NUTCH-2753 Project:

[jira] [Commented] (NUTCH-2748) Fetch status gone (redirect exceeded) not to overwrite existing items in CrawlDb

2019-11-08 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970612#comment-16970612 ] Sebastian Nagel commented on NUTCH-2748: Opened [PR#485|https://github.com/apache

[jira] [Commented] (NUTCH-2748) Fetch status gone (redirect exceeded) not to overwrite existing items in CrawlDb

2019-11-08 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970155#comment-16970155 ] Sebastian Nagel commented on NUTCH-2748: Hi [~markus17], already working on a pat

[jira] [Commented] (NUTCH-2748) Fetch status gone (redirect exceeded) not to overwrite existing items in CrawlDb

2019-11-08 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970107#comment-16970107 ] Sebastian Nagel commented on NUTCH-2748: Attached sample CrawlDb and segment to r

[jira] [Updated] (NUTCH-2748) Fetch status gone (redirect exceeded) not to overwrite existing items in CrawlDb

2019-11-08 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2748: --- Attachment: test-NUTCH-2748.zip > Fetch status gone (redirect exceeded) not to overwrite exis

[jira] [Updated] (NUTCH-1337) WebGraph to follow redirects

2019-11-07 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1337: --- Fix Version/s: 1.17 > WebGraph to follow redirects

[jira] [Updated] (NUTCH-1337) WebGraph to follow redirects

2019-11-07 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1337: --- Component/s: webgraph scoring > WebGraph to follow redirects > -

[jira] [Resolved] (NUTCH-1559) parse-metatags duplicates extracted metatags

2019-11-07 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1559. Resolution: Fixed Fixed in master/1.x - thanks, everybody! > parse-metatags duplicates ext

[jira] [Updated] (NUTCH-1559) parse-metatags duplicates extracted metatags

2019-11-07 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1559: --- Component/s: plugin > parse-metatags duplicates extracted metatags >

[jira] [Updated] (NUTCH-2746) Basic URL normalizer to normalize Unicode domain names

2019-11-07 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2746: --- Component/s: plugin urlnormalizer > Basic URL normalizer to normalize Unicod

[jira] [Resolved] (NUTCH-2747) Replace remaining o.a.commons.logging by org.slf4j

2019-11-07 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2747. Resolution: Fixed Merged into master/1.x. Thanks, [~balaShashanka]! > Replace remaining

[jira] [Commented] (NUTCH-2751) nutch clean does not work with secured solr cloud

2019-11-04 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966610#comment-16966610 ] Sebastian Nagel commented on NUTCH-2751: Thanks for the notice! The upgrade of th

[jira] [Updated] (NUTCH-2752) indexer-solr: Upgrade to latest Solr version

2019-11-04 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2752: --- Labels: easytask help-wanted (was: ) > indexer-solr: Upgrade to latest Solr version > --

[jira] [Created] (NUTCH-2752) indexer-solr: Upgrade to latest Solr version

2019-11-04 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2752: -- Summary: indexer-solr: Upgrade to latest Solr version Key: NUTCH-2752 URL: https://issues.apache.org/jira/browse/NUTCH-2752 Project: Nutch Issue Type: Im

[jira] [Commented] (NUTCH-2751) nutch clean does not work with secured solr cloud

2019-11-04 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966547#comment-16966547 ] Sebastian Nagel commented on NUTCH-2751: Hi [~dhammling], no clue what could caus

[jira] [Updated] (NUTCH-2751) nutch clean does not work with secured solr cloud

2019-11-04 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2751: --- Fix Version/s: 1.17 > nutch clean does not work with secured solr cloud > ---

[jira] [Commented] (NUTCH-2671) Upgrade ant ivy library

2019-10-31 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963954#comment-16963954 ] Sebastian Nagel commented on NUTCH-2671: [Ivy 2.5.0|https://ant.apache.org/ivy/hi

[jira] [Commented] (NUTCH-2733) protocol-okhttp: add support for Brotli compression (Content-Encoding)

2019-10-31 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963945#comment-16963945 ] Sebastian Nagel commented on NUTCH-2733: Steps: - upgrade to [latest okhttp 4.x|h

[jira] [Updated] (NUTCH-2733) protocol-okhttp: add support for Brotli compression (Content-Encoding)

2019-10-31 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2733: --- Labels: easytask help-wanted needs-test (was: ) > protocol-okhttp: add support for Brotli co

[jira] [Resolved] (NUTCH-2735) Update the indexer-solr documentation about the schema.xml usage

2019-10-31 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2735. Resolution: Fixed Thanks, [~roannel] ! > Update the indexer-solr documentation about the s

[jira] [Created] (NUTCH-2749) Fetcher and scoring-opic: transfer score to redirects

2019-10-18 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2749: -- Summary: Fetcher and scoring-opic: transfer score to redirects Key: NUTCH-2749 URL: https://issues.apache.org/jira/browse/NUTCH-2749 Project: Nutch Issue

[jira] [Updated] (NUTCH-2748) Fetch status gone (redirect exceeded) not to overwrite existing items in CrawlDb

2019-10-18 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2748: --- Description: If fetcher is following redirects and the max. number of redirects in a redirec

[jira] [Commented] (NUTCH-2735) Update the indexer-solr documentation about the schema.xml usage

2019-10-18 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954648#comment-16954648 ] Sebastian Nagel commented on NUTCH-2735: The 1.x tutorial has been updated (also r

[jira] [Created] (NUTCH-2748) Fetch status gone (redirect exceeded) not to overwrite existing items in CrawlDb

2019-10-18 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2748: -- Summary: Fetch status gone (redirect exceeded) not to overwrite existing items in CrawlDb Key: NUTCH-2748 URL: https://issues.apache.org/jira/browse/NUTCH-2748 Pr

[jira] [Commented] (NUTCH-2747) Replace remaining o.a.commons.logging by org.slf4j

2019-10-18 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954343#comment-16954343 ] Sebastian Nagel commented on NUTCH-2747: Sure, of course, help is welcome - sorry

[jira] [Updated] (NUTCH-2747) Replace remaining o.a.commons.logging by org.slf4j

2019-10-18 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2747: --- Labels: easytask help-wanted (was: ) > Replace remaining o.a.commons.logging by org.slf4j >

[jira] [Created] (NUTCH-2747) Replace remaining o.a.commons.logging by org.slf4j

2019-10-17 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2747: -- Summary: Replace remaining o.a.commons.logging by org.slf4j Key: NUTCH-2747 URL: https://issues.apache.org/jira/browse/NUTCH-2747 Project: Nutch Issue Ty

[jira] [Commented] (NUTCH-1559) parse-metatags duplicates extracted metatags

2019-10-17 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953793#comment-16953793 ] Sebastian Nagel commented on NUTCH-1559: Proof (using [testMetatags.html|https:/

[jira] [Commented] (NUTCH-1559) parse-metatags duplicates extracted metatags

2019-10-17 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953786#comment-16953786 ] Sebastian Nagel commented on NUTCH-1559: After a closer look: the issue is no

[jira] [Assigned] (NUTCH-1559) parse-metatags duplicates extracted metatags

2019-10-17 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-1559: -- Assignee: Sebastian Nagel > parse-metatags duplicates extracted metatags > ---

[jira] [Updated] (NUTCH-1559) parse-metatags duplicates extracted metatags

2019-10-17 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1559: --- Summary: parse-metatags duplicates extracted metatags (was: parse-metatags duplicates extrac

[jira] [Commented] (NUTCH-2746) Basic URL normalizer to normalize Unicode domain names

2019-10-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951956#comment-16951956 ] Sebastian Nagel commented on NUTCH-2746: PR open. Of course, there are methods pr

[jira] [Created] (NUTCH-2746) Basic URL normalizer to normalize Unicode domain names

2019-10-15 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2746: -- Summary: Basic URL normalizer to normalize Unicode domain names Key: NUTCH-2746 URL: https://issues.apache.org/jira/browse/NUTCH-2746 Project: Nutch Issu

[jira] [Resolved] (NUTCH-2511) SitemapProcessor limited by http.content.limit

2019-10-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2511. Resolution: Fixed Thanks, [~yossi]! > SitemapProcessor limited by http.content.limit > ---

[jira] [Created] (NUTCH-2745) Solr schema.xml not shipped in binary release

2019-10-15 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2745: -- Summary: Solr schema.xml not shipped in binary release Key: NUTCH-2745 URL: https://issues.apache.org/jira/browse/NUTCH-2745 Project: Nutch Issue Type: B

[jira] [Updated] (NUTCH-2133) Transfer Selenium Documentation to Wiki

2019-10-14 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2133: --- Summary: Transfer Selenium Documentation to Wiki (was: Transfer Selenium Documentation to WI

[jira] [Updated] (NUTCH-2133) Transfer Selenium Documentation to WIki

2019-10-14 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2133: --- Fix Version/s: (was: 2.5) > Transfer Selenium Documentation to WIki > ---

[jira] [Reopened] (NUTCH-2133) Transfer Selenium Documentation to WIki

2019-10-14 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reopened NUTCH-2133: > Transfer Selenium Documentation to WIki

[jira] [Updated] (NUTCH-2290) Update licenses of bundled libraries

2019-10-11 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2290: --- Fix Version/s: (was: 2.5) > Update licenses of bundled libraries > --

[jira] [Updated] (NUTCH-1086) Rewrite protocol-httpclient

2019-10-11 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1086: --- Fix Version/s: (was: 2.5) > Rewrite protocol-httpclient

[jira] [Updated] (NUTCH-2671) Upgrade ant ivy library

2019-10-11 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2671: --- Fix Version/s: (was: 2.5) > Upgrade ant ivy library

[jira] [Updated] (NUTCH-2669) Reliable solution for javax.ws packaging.type

2019-10-11 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2669: --- Fix Version/s: (was: 2.5) > Reliable solution for javax.ws packaging.type > -

[jira] [Created] (NUTCH-2744) CrawlDbReader: improved reporting of syntactic errors in Jexl expression

2019-10-11 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2744: -- Summary: CrawlDbReader: improved reporting of syntactic errors in Jexl expression Key: NUTCH-2744 URL: https://issues.apache.org/jira/browse/NUTCH-2744 Project: N

[jira] [Closed] (NUTCH-1522) Upgrade to Tika 1.3

2019-10-11 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel closed NUTCH-1522. > Upgrade to Tika 1.3

[jira] [Closed] (NUTCH-1126) JUnit test for urlfilter-prefix

2019-10-11 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel closed NUTCH-1126. > JUnit test for urlfilter-prefix

[jira] [Closed] (NUTCH-1578) Upgrade to Hadoop 1.2.0

2019-10-11 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel closed NUTCH-1578. > Upgrade to Hadoop 1.2.0

[jira] [Closed] (NUTCH-1475) Index-More Plugin -- A better fall back value for date field

2019-10-11 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel closed NUTCH-1475. > Index-More Plugin -- A better fall back value for date field

[jira] [Closed] (NUTCH-1591) Incorrect conversion of ByteBuffer to String

2019-10-11 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel closed NUTCH-1591. > Incorrect conversion of ByteBuffer to String

[jira] [Closed] (NUTCH-2360) HTTP Basic Authentication in SolrIndexerPlugin is gone

2019-10-11 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel closed NUTCH-2360. > HTTP Basic Authentication in SolrIndexerPlugin is gone

[jira] [Commented] (NUTCH-2743) Add list of Nutch properties (nutch-default.xml) to documentation

2019-10-11 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949552#comment-16949552 ] Sebastian Nagel commented on NUTCH-2743: One benefit: this would make properties

[jira] [Created] (NUTCH-2743) Add list of Nutch properties (nutch-default.xml) to documentation

2019-10-11 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2743: -- Summary: Add list of Nutch properties (nutch-default.xml) to documentation Key: NUTCH-2743 URL: https://issues.apache.org/jira/browse/NUTCH-2743 Project: Nutch

[jira] [Resolved] (NUTCH-2279) LinkRank fails when using Hadoop MR output compression

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2279. Resolution: Fixed Thanks, [~naegelejd]! > LinkRank fails when using Hadoop MR output compr

[jira] [Commented] (NUTCH-2740) Generator: generate.max.count overflow not logged

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941888#comment-16941888 ] Sebastian Nagel commented on NUTCH-2740: Fixed in [4d68c08|https://github.com/ap

[jira] [Resolved] (NUTCH-2740) Generator: generate.max.count overflow not logged

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2740. Resolution: Fixed > Generator: generate.max.count overflow not logged > ---

[jira] [Resolved] (NUTCH-2738) Generator: document property generate.restrict.status

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2738. Resolution: Fixed > Generator: document property generate.restrict.status > ---

[jira] [Resolved] (NUTCH-2737) Generator: count and log reason of rejections during selection

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2737. Resolution: Implemented > Generator: count and log reason of rejections during selection >

[jira] [Updated] (NUTCH-2525) Metadata indexer cannot handle uppercase parse metadata

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2525: --- Fix Version/s: (was: 1.16) 1.17 > Metadata indexer cannot handle upper

[jira] [Updated] (NUTCH-2309) Scoring-Similarity Plugin raises NullPointerException when error occurs in fetching URL

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2309: --- Fix Version/s: (was: 1.16) 1.17 > Scoring-Similarity Plugin raises Nul

[jira] [Updated] (NUTCH-2353) Create seed file with metadata using the REST API

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2353: --- Fix Version/s: (was: 1.16) 1.17 > Create seed file with metadata using

[jira] [Updated] (NUTCH-2419) Domain blacklist URL filter does not respect command-line override for file

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2419: --- Fix Version/s: (was: 1.16) 1.17 > Domain blacklist URL filter does not

[jira] [Updated] (NUTCH-2506) host is not available for filtering on the JEXL indexing plugin

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2506: --- Fix Version/s: (was: 1.16) 1.17 > host is not available for filtering

[jira] [Updated] (NUTCH-2511) SitemapProcessor limited by http.content.limit

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2511: --- Fix Version/s: (was: 1.16) 1.17 > SitemapProcessor limited by http.con

[jira] [Updated] (NUTCH-2278) Handle alpha-2 language codes consistently

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2278: --- Fix Version/s: (was: 1.16) 1.17 > Handle alpha-2 language codes consis

[jira] [Updated] (NUTCH-1403) Add default ScoringFilter for manipulating metadata

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1403: --- Fix Version/s: (was: 1.16) 1.17 > Add default ScoringFilter for manipu

[jira] [Updated] (NUTCH-2248) CSS parser plugin

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2248: --- Fix Version/s: (was: 1.16) 1.17 > CSS parser plugin >

[jira] [Updated] (NUTCH-2735) Update the indexer-solr documentation about the schema.xml usage

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2735: --- Fix Version/s: (was: 1.16) > Update the indexer-solr documentation about the schema.xml u

[jira] [Commented] (NUTCH-2735) Update the indexer-solr documentation about the schema.xml usage

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941867#comment-16941867 ] Sebastian Nagel commented on NUTCH-2735: Ok, moving to 1.17. We need a clean list

[jira] [Updated] (NUTCH-1559) parse-metatags duplicates extracted metatags in combination with parse-tika

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1559: --- Fix Version/s: (was: 1.16) 1.17 > parse-metatags duplicates extracted

[jira] [Updated] (NUTCH-1749) Optionally exclude title from content field

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1749: --- Fix Version/s: (was: 1.16) 1.17 > Optionally exclude title from conten

[jira] [Updated] (NUTCH-1380) Fetcher reducer not to configure filter/normalizers

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1380: --- Fix Version/s: 1.17 > Fetcher reducer not to configure filter/normalizers > -

[jira] [Commented] (NUTCH-1380) Fetcher reducer not to configure filter/normalizers

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941863#comment-16941863 ] Sebastian Nagel commented on NUTCH-1380: This should be fixed by NUTCH-2375 which

[jira] [Resolved] (NUTCH-1342) Read time out protocol-http

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1342. Resolution: Not A Problem Can be fixed via configuration. Thanks, everybody! > Read time o

[jira] [Updated] (NUTCH-1186) FreeGenerator always normalizes

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1186: --- Fix Version/s: 1.17 > FreeGenerator always normalizes

[jira] [Commented] (NUTCH-1186) FreeGenerator always normalizes

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941824#comment-16941824 ] Sebastian Nagel commented on NUTCH-1186: Disabling normalization can be done by s

[jira] [Updated] (NUTCH-1194) CrawlDB lock should be released earlier

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1194: --- Fix Version/s: 1.17 > CrawlDB lock should be released earlier > -

[jira] [Resolved] (NUTCH-1176) Fix all javadoc warnings from nightly builds

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1176. Resolution: Abandoned Outdated. > Fix all javadoc warnings from nightly builds > -

[jira] [Resolved] (NUTCH-1076) Solrindex has no documents following bin/nutch solrindex when using protocol-file

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1076. Resolution: Duplicate > Solrindex has no documents following bin/nutch solrindex when using

[jira] [Resolved] (NUTCH-1035) Tune Solr config for Nutch users

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1035. Resolution: Abandoned Definitely outdated, the Solr schema.xml has been reworked multiple t

[jira] [Resolved] (NUTCH-1805) Remove unnecessary transitive dependencies from Hadoop core

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1805. Resolution: Resolved We rely now only on a fixed set of Hadoop sub-dependencies ("hadoop-co

[jira] [Resolved] (NUTCH-1220) Upgrade Solr deps

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1220. Resolution: Resolved Obsoleted by multiple Solr upgrades, NUTCH-2600 is the latest one. >

[jira] [Updated] (NUTCH-1917) index.parse.md, index.content.md and index.db.md should support wildcard

2019-10-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1917: --- Fix Version/s: (was: 1.16) 1.17 > index.parse.md, index.content.md and

[jira] [Created] (NUTCH-2741) Remove ivy/ivy-2.2.0.jar

2019-10-01 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2741: -- Summary: Remove ivy/ivy-2.2.0.jar Key: NUTCH-2741 URL: https://issues.apache.org/jira/browse/NUTCH-2741 Project: Nutch Issue Type: Bug Componen

[jira] [Updated] (NUTCH-2279) LinkRank fails when using Hadoop MR output compression

2019-09-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2279: --- Labels: patch-available (was: ) > LinkRank fails when using Hadoop MR output compression > -

[jira] [Updated] (NUTCH-2279) LinkRank fails when using Hadoop MR output compression

2019-09-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2279: --- Component/s: webgraph > LinkRank fails when using Hadoop MR output compression >

[jira] [Resolved] (NUTCH-2387) Nutch should not index document with "noindex" meta

2019-09-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2387. Resolution: Cannot Reproduce Hi [~eyeris], I've verified that the linked HTML page gets del

[jira] [Updated] (NUTCH-2738) Generator: document property generate.restrict.status

2019-09-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2738: --- Fix Version/s: (was: 1.17) 1.16 > Generator: document property generat

[jira] [Updated] (NUTCH-2636) protocol-okhttp: http.proxy.exclusion.list does not work if http.proxy.username

2019-09-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2636: --- Fix Version/s: (was: 1.16) 1.17 > protocol-okhttp: http.proxy.exclusio

[jira] [Updated] (NUTCH-2737) Generator: count and log reason of rejections during selection

2019-09-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2737: --- Fix Version/s: (was: 1.17) 1.16 > Generator: count and log reason of r

[jira] [Created] (NUTCH-2740) Generator: generate.max.count overflow not logged

2019-09-30 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2740: -- Summary: Generator: generate.max.count overflow not logged Key: NUTCH-2740 URL: https://issues.apache.org/jira/browse/NUTCH-2740 Project: Nutch Issue Typ

[jira] [Updated] (NUTCH-2304) Fix Elasticsearch Rest Indexing Plugin's Dependencies

2019-09-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2304: --- Fix Version/s: 1.17 > Fix Elasticsearch Rest Indexing Plugin's Dependencies > ---
