[jira] [Resolved] (NUTCH-3073) Address Java compiler warnings

2024-10-06 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-3073. Resolution: Fixed > Address Java compiler warni

[jira] [Created] (NUTCH-3073) Address Java compiler warnings

2024-10-04 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-3073: -- Summary: Address Java compiler warnings Key: NUTCH-3073 URL: https://issues.apache.org/jira/browse/NUTCH-3073 Project: Nutch Issue Type: Improvement

[jira] [Created] (NUTCH-3072) Fetcher to stop QueueFeeder if aborting with "hung threads"

2024-10-04 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-3072: -- Summary: Fetcher to stop QueueFeeder if aborting with "hung threads" Key: NUTCH-3072 URL: https://issues.apache.org/jira/browse/NUTCH-3072 Proj

[jira] [Assigned] (NUTCH-3068) Documentation on Nutch Homepage

2024-10-02 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-3068: -- Assignee: Sebastian Nagel > Documentation on Nutch Homep

[jira] [Resolved] (NUTCH-3070) Documentation has outdated links

2024-10-02 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-3070. Resolution: Fixed Thanks for reporting, [~hiranchaudhuri]! > Documentation has outda

[jira] [Assigned] (NUTCH-3070) Documentation has outdated links

2024-10-02 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-3070: -- Assignee: Sebastian Nagel > Documentation has outdated li

[jira] [Updated] (NUTCH-3070) Documentation has outdated links

2024-10-02 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-3070: --- Component/s: wiki > Documentation has outdated li

[jira] [Updated] (NUTCH-3069) Update protocol-smb reference

2024-10-02 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-3069: --- Component/s: wiki > Update protocol-smb refere

[jira] [Updated] (NUTCH-3071) Tutorial for Intranet Document Search outdated

2024-10-02 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-3071: --- Component/s: wiki > Tutorial for Intranet Document Search outda

[jira] [Updated] (NUTCH-3056) Injector to support resolving seed URLs

2024-10-02 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-3056: --- Component/s: injector > Injector to support resolving seed U

[jira] [Commented] (NUTCH-3071) Tutorial for Intranet Document Search outdated

2024-10-02 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886400#comment-17886400 ] Sebastian Nagel commented on NUTCH-3071: Hi [~hiranchaudhuri], thanks

[jira] [Commented] (NUTCH-2856) Implement a protocol-smb plugin based on hierynomus/smbj

2024-10-02 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886393#comment-17886393 ] Sebastian Nagel commented on NUTCH-2856: Hi [~hiranchaudhuri], yes and of co

[jira] [Resolved] (NUTCH-2812) Methods returning array may expose internal representation

2024-09-17 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2812. Resolution: Fixed > Methods returning array may expose internal representat

[jira] [Resolved] (NUTCH-1942) Remove TopLevelDomain

2024-09-17 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1942. Resolution: Done > Remove TopLevelDomain > -- > >

[jira] [Resolved] (NUTCH-1806) Delegate processing of URL domains to crawler commons

2024-09-17 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1806. Resolution: Implemented Thanks, everybody! > Delegate processing of URL domains

[jira] [Resolved] (NUTCH-3058) Fetcher: counter for hung threads

2024-09-16 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-3058. Resolution: Implemented > Fetcher: counter for hung thre

[jira] [Commented] (NUTCH-3059) Generator: selector job does not count reduce output records

2024-09-14 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881792#comment-17881792 ] Sebastian Nagel commented on NUTCH-3059: Note: the above test was run in ps

[jira] [Commented] (NUTCH-3059) Generator: selector job does not count reduce output records

2024-09-14 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881791#comment-17881791 ] Sebastian Nagel commented on NUTCH-3059: Ok, found the reason: it's b

[jira] [Resolved] (NUTCH-3061) URL filters to log name of the rule file rules are read from

2024-09-13 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-3061. Resolution: Implemented > URL filters to log name of the rule file rules are read f

[jira] [Resolved] (NUTCH-3062) protocol-okhttp: optionally record HTTP and SSL/TLS versions

2024-09-13 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-3062. Resolution: Implemented > protocol-okhttp: optionally record HTTP and SSL/TLS versi

[jira] [Resolved] (NUTCH-3065) Format changelog as Markdown

2024-09-13 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-3065. Resolution: Implemented > Format changelog as Markd

[jira] [Resolved] (NUTCH-3066) Protocol plugin unit tests fail randomly

2024-09-13 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-3066. Resolution: Fixed > Protocol plugin unit tests fail rando

[jira] [Commented] (NUTCH-1806) Delegate processing of URL domains to crawler commons

2024-09-11 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880958#comment-17880958 ] Sebastian Nagel commented on NUTCH-1806: > it seems odd to return a

[jira] [Created] (NUTCH-3067) Improve performance of FetchItemQueues if error state is preserved

2024-09-07 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-3067: -- Summary: Improve performance of FetchItemQueues if error state is preserved Key: NUTCH-3067 URL: https://issues.apache.org/jira/browse/NUTCH-3067 Project: Nutch

[jira] [Commented] (NUTCH-1806) Delegate processing of URL domains to crawler commons

2024-09-07 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880036#comment-17880036 ] Sebastian Nagel commented on NUTCH-1806: Any comments on this? It's an

[jira] [Resolved] (NUTCH-3063) Support for "addBinaryContent" from REST API

2024-09-06 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-3063. Resolution: Implemented Committed in [ac03cf1|https://github.com/apache/nutch/commit

[jira] [Commented] (NUTCH-3063) Support for "addBinaryContent" from REST API

2024-09-06 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879964#comment-17879964 ] Sebastian Nagel commented on NUTCH-3063: +1 looks good. And definitely m

[jira] [Commented] (NUTCH-3065) Format changelog as Markdown

2024-09-05 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879666#comment-17879666 ] Sebastian Nagel commented on NUTCH-3065: PR in progress: the [reforma

[jira] [Assigned] (NUTCH-3065) Format changelog as Markdown

2024-09-05 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-3065: -- Assignee: Sebastian Nagel > Format changelog as Markd

[jira] [Created] (NUTCH-3065) Format changelog as Markdown

2024-09-05 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-3065: -- Summary: Format changelog as Markdown Key: NUTCH-3065 URL: https://issues.apache.org/jira/browse/NUTCH-3065 Project: Nutch Issue Type: Improvement

[jira] [Updated] (NUTCH-3060) Javadoc link broken on website

2024-08-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-3060: --- Description: The link to the 1.20 Javadocs on [https://nutch.apache.org/documentation

[jira] [Commented] (NUTCH-3060) Javadoc link broken on website

2024-08-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870291#comment-17870291 ] Sebastian Nagel commented on NUTCH-3060: The missing Javadocs are now place

[jira] [Created] (NUTCH-3062) protocol-okhttp: optionally record HTTP and SSL/TLS versions

2024-07-09 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-3062: -- Summary: protocol-okhttp: optionally record HTTP and SSL/TLS versions Key: NUTCH-3062 URL: https://issues.apache.org/jira/browse/NUTCH-3062 Project: Nutch

[jira] [Created] (NUTCH-3061) URL filters to log name of the rule file rules are read from

2024-07-09 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-3061: -- Summary: URL filters to log name of the rule file rules are read from Key: NUTCH-3061 URL: https://issues.apache.org/jira/browse/NUTCH-3061 Project: Nutch

[jira] [Created] (NUTCH-3060) Javadoc link broken on website

2024-06-28 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-3060: -- Summary: Javadoc link broken on website Key: NUTCH-3060 URL: https://issues.apache.org/jira/browse/NUTCH-3060 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-3060) Javadoc link broken on website

2024-06-28 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-3060: --- Fix Version/s: 1.21 (was: 1.20) > Javadoc link broken on webs

[jira] [Created] (NUTCH-3059) Generator: selector job does not count reduce output records

2024-06-05 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-3059: -- Summary: Generator: selector job does not count reduce output records Key: NUTCH-3059 URL: https://issues.apache.org/jira/browse/NUTCH-3059 Project: Nutch

[jira] [Created] (NUTCH-3058) Fetcher: counter for hung threads

2024-06-05 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-3058: -- Summary: Fetcher: counter for hung threads Key: NUTCH-3058 URL: https://issues.apache.org/jira/browse/NUTCH-3058 Project: Nutch Issue Type: Improvement

[jira] [Resolved] (NUTCH-3055) README: fix Github "hub" commands

2024-05-28 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-3055. Resolution: Fixed > README: fix Github "hub"

[jira] [Resolved] (NUTCH-3044) Generator: NPE when extracting the host part of a URL fails

2024-05-28 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-3044. Resolution: Fixed > Generator: NPE when extracting the host part of a URL fa

[jira] [Resolved] (NUTCH-3043) Generator: count URLs rejected by URL filters

2024-05-14 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-3043. Resolution: Implemented > Generator: count URLs rejected by URL filt

[jira] [Resolved] (NUTCH-3039) Failure to handle ftp:// URLs

2024-05-14 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-3039. Resolution: Fixed > Failure to handle ftp:// U

[jira] [Created] (NUTCH-3055) README: fix Github "hub" commands

2024-04-30 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-3055: -- Summary: README: fix Github "hub" commands Key: NUTCH-3055 URL: https://issues.apache.org/jira/browse/NUTCH-3055 Project: Nutch Issue

[jira] [Commented] (NUTCH-3028) WARCExported to support filtering by JEXL

2024-04-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842291#comment-17842291 ] Sebastian Nagel commented on NUTCH-3028: +1 lgtm. One question: if there i

[jira] [Commented] (NUTCH-3045) Upgrade from Java 11 to 17

2024-04-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842284#comment-17842284 ] Sebastian Nagel commented on NUTCH-3045: See also NUTCH-2987. Until HADOOP-1

Re: [DISCUSS] Consolidating Nutch Continuous Integration

2024-04-28 Thread Sebastian Nagel
Hi Lewis, > The Jenkins job used to be run nightly but > no longer is. It pulls nightly from git: https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/scmPollLog/ but a build is only run if there are new commits. The latest one: https://lists.apache.org/thread/ywtlmdmckhd21c6y9c77z01q17h42

[jira] [Created] (NUTCH-3044) Generator: NPE when extracting the host part of a URL fails

2024-04-25 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-3044: -- Summary: Generator: NPE when extracting the host part of a URL fails Key: NUTCH-3044 URL: https://issues.apache.org/jira/browse/NUTCH-3044 Project: Nutch

[jira] [Created] (NUTCH-3043) Generator: count URLs rejected by URL filters

2024-04-25 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-3043: -- Summary: Generator: count URLs rejected by URL filters Key: NUTCH-3043 URL: https://issues.apache.org/jira/browse/NUTCH-3043 Project: Nutch Issue Type

[jira] [Created] (NUTCH-3040) Upgrade to Hadoop 3.4.0

2024-04-11 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-3040: -- Summary: Upgrade to Hadoop 3.4.0 Key: NUTCH-3040 URL: https://issues.apache.org/jira/browse/NUTCH-3040 Project: Nutch Issue Type: Improvement

Re: [VOTE] Apache Nutch 1.20 Release

2024-04-11 Thread Sebastian Nagel
, see https://github.com/sebastian-nagel/nutch-test-single-node-cluster/ One note about the CHANGES.md: it's now a mixture of HTML and plain text. It does not use the potential of markdown, e.g. sections / headlines for the releases to make the change log navigable via a table of contents. Th

[jira] [Assigned] (NUTCH-3039) Failure to handle ftp:// URLs

2024-04-11 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-3039: -- Assignee: Sebastian Nagel > Failure to handle ftp:// U

[jira] [Created] (NUTCH-3039) Failure to handle ftp:// URLs

2024-04-11 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-3039: -- Summary: Failure to handle ftp:// URLs Key: NUTCH-3039 URL: https://issues.apache.org/jira/browse/NUTCH-3039 Project: Nutch Issue Type: Bug

[jira] [Resolved] (NUTCH-2937) parse-tika: review dependency exclusions and avoid dependency conflicts in distributed mode

2024-04-06 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2937. Resolution: Fixed Fixed NUTCH-2959 by using the shaded Tika package. Thanks, [~tallison

[jira] [Assigned] (NUTCH-2937) parse-tika: review dependency exclusions and avoid dependency conflicts in distributed mode

2024-04-06 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2937: -- Assignee: Tim Allison > parse-tika: review dependency exclusions and avoid depende

[jira] [Updated] (NUTCH-2937) parse-tika: review dependency exclusions and avoid dependency conflicts in distributed mode

2024-04-06 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2937: --- Fix Version/s: 1.20 (was: 1.21) > parse-tika: review depende

[jira] [Resolved] (NUTCH-3005) Upgrade selenium as needed

2024-04-06 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-3005. Resolution: Implemented Done by [~lewismc] as part of NUTCH-3036, commit [1563396|https

[jira] [Resolved] (NUTCH-3016) Upgrade Apache Ivy to 2.5.2

2024-04-06 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-3016. Resolution: Duplicate > Upgrade Apache Ivy to 2.

[jira] [Updated] (NUTCH-3016) Upgrade Apache Ivy to 2.5.2

2024-04-06 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-3016: --- Fix Version/s: 1.20 (was: 1.21) > Upgrade Apache Ivy to 2.

[jira] [Updated] (NUTCH-3005) Upgrade selenium as needed

2024-04-06 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-3005: --- Affects Version/s: 1.19 > Upgrade selenium as nee

[jira] [Updated] (NUTCH-3005) Upgrade selenium as needed

2024-04-06 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-3005: --- Fix Version/s: 1.20 > Upgrade selenium as nee

[jira] [Updated] (NUTCH-3028) WARCExported to support filtering by JEXL

2024-04-06 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-3028: --- Affects Version/s: 1.19 > WARCExported to support filtering by J

[jira] [Updated] (NUTCH-3028) WARCExported to support filtering by JEXL

2024-04-06 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-3028: --- Fix Version/s: 1.21 > WARCExported to support filtering by J

[jira] [Resolved] (NUTCH-2960) indexer-elastic: remove plugin from binary package to address licensing issues

2024-03-14 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2960. Resolution: Won't Fix The license issue is addressed by NUTCH-3008. > indexer

[jira] [Closed] (NUTCH-2960) indexer-elastic: remove plugin from binary package to address licensing issues

2024-03-14 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel closed NUTCH-2960. -- > indexer-elastic: remove plugin from binary package to address licensing iss

[jira] [Updated] (NUTCH-2960) indexer-elastic: remove plugin from binary package to address licensing issues

2024-03-14 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2960: --- Fix Version/s: (was: 1.20) > indexer-elastic: remove plugin from binary package

[jira] [Resolved] (NUTCH-3008) indexer-elastic: downgrade to ES 7.10.2 to address licensing issues

2024-03-14 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-3008. Resolution: Fixed > indexer-elastic: downgrade to ES 7.10.2 to address licensing iss

[jira] [Resolved] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-14 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-3029. Resolution: Implemented > Host specific max. and min. intervals in adaptive schedu

[jira] [Closed] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-14 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel closed NUTCH-3029. -- > Host specific max. and min. intervals in adaptive schedu

[jira] [Reopened] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-14 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reopened NUTCH-3029: Assignee: Sebastian Nagel (was: Markus Jelsma) Reopen to update "Fix version(s)"

[jira] [Updated] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-14 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-3029: --- Fix Version/s: 1.20 > Host specific max. and min. intervals in adaptive schedu

[jira] [Created] (NUTCH-3035) Update license and notice file for release of 1.20

2024-03-13 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-3035: -- Summary: Update license and notice file for release of 1.20 Key: NUTCH-3035 URL: https://issues.apache.org/jira/browse/NUTCH-3035 Project: Nutch Issue

Re: [DISCUSS] Release Nutch 1.20

2024-03-09 Thread Sebastian Nagel
Hi Lewis, yes, of course! Some points we should do before the release: - address the ES licensing issue, the easiest way is to downgrade, see NUTCH-3008 If done update the license-related files. - there are three short PRs open I'll try to have a look at these points the next days. Best,

[jira] [Resolved] (NUTCH-3025) urlfilter-fast to filter based on the length of the URL

2023-11-08 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-3025. Resolution: Implemented > urlfilter-fast to filter based on the length of the

[jira] [Updated] (NUTCH-3025) urlfilter-fast to filter based on the length of the URL

2023-11-08 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-3025: --- Component/s: plugin urlfilter > urlfilter-fast to filter based on

[jira] [Commented] (NUTCH-3017) Allow fast-urlfilter to load from HDFS/S3 and support gzipped input

2023-11-08 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784030#comment-17784030 ] Sebastian Nagel commented on NUTCH-3017: Thanks, [~jnioche] > All

[jira] [Resolved] (NUTCH-3017) Allow fast-urlfilter to load from HDFS/S3 and support gzipped input

2023-11-08 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-3017. Resolution: Implemented > Allow fast-urlfilter to load from HDFS/S3 and support gzip

[jira] [Updated] (NUTCH-3017) Allow fast-urlfilter to load from HDFS/S3 and support gzipped input

2023-10-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-3017: --- Component/s: plugin urlfilter > Allow fast-urlfilter to load from HDFS

[jira] [Updated] (NUTCH-3017) Allow fast-urlfilter to load from HDFS/S3 and support gzipped input

2023-10-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-3017: --- Fix Version/s: 1.20 > Allow fast-urlfilter to load from HDFS/S3 and support gzipped in

Re: Nutch codebase formatting

2023-10-29 Thread Sebastian Nagel
Hi Lewis, >> whether we need a Nutch custom code style at all… why don’t we just use >> some other existing style and then enforce it? Enforcing: yes! However, I would try hard to keep the changes on a reasonable minimum. For example, if we change the indentation, almost every code line is aff

[jira] [Resolved] (NUTCH-3012) SegmentReader when dumping with option -recode: NPE on unparsed documents

2023-10-21 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-3012. Resolution: Fixed > SegmentReader when dumping with option -recode: NPE on unpar

[jira] [Resolved] (NUTCH-3011) HttpRobotRulesParser: handle HTTP 429 Too Many Requests same as server errors (HTTP 5xx)

2023-10-21 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-3011. Resolution: Implemented > HttpRobotRulesParser: handle HTTP 429 Too Many Requests same

[jira] [Resolved] (NUTCH-2990) HttpRobotRulesParser to follow 5 redirects as specified by RFC 9309

2023-10-21 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2990. Resolution: Implemented Thanks, everybody! > HttpRobotRulesParser to follow 5 redire

[jira] [Assigned] (NUTCH-3009) Upgrade to Hadoop 3.3.6

2023-10-21 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-3009: -- Assignee: Sebastian Nagel > Upgrade to Hadoop 3.

[jira] [Resolved] (NUTCH-3009) Upgrade to Hadoop 3.3.6

2023-10-21 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-3009. Resolution: Implemented > Upgrade to Hadoop 3.

[jira] [Resolved] (NUTCH-3006) Downgrade Tika dependency to 2.2.1 (core and parse-tika)

2023-10-21 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-3006. Fix Version/s: (was: 1.20) Resolution: Abandoned > Downgrade Tika dependency

[jira] [Assigned] (NUTCH-3002) Protocol-okhttp HttpResponse: HTTP header metadata lookup should be case-insensitive

2023-10-21 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-3002: -- Assignee: Sebastian Nagel > Protocol-okhttp HttpResponse: HTTP header metadata loo

[jira] [Resolved] (NUTCH-3002) Protocol-okhttp HttpResponse: HTTP header metadata lookup should be case-insensitive

2023-10-21 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-3002. Resolution: Fixed > Protocol-okhttp HttpResponse: HTTP header metadata lookup should

[jira] [Commented] (NUTCH-3014) Standardize NutchJob job names

2023-10-21 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778103#comment-17778103 ] Sebastian Nagel commented on NUTCH-3014: If there is a single data

[jira] [Updated] (NUTCH-3012) SegmentReader when dumping with option -recode: NPE on unparsed documents

2023-10-09 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-3012: --- Description: SegmentReader when called with the flag {{-recode}} fails with a NPE when

[jira] [Updated] (NUTCH-3012) SegmentReader when dumping with option -recode: NPE on unparsed documents

2023-10-09 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-3012: --- Summary: SegmentReader when dumping with option -recode: NPE on unparsed documents (was

[jira] [Created] (NUTCH-3012) SegmentReader when dumping with option -recode: NPE on documents without charset defined

2023-10-08 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-3012: -- Summary: SegmentReader when dumping with option -recode: NPE on documents without charset defined Key: NUTCH-3012 URL: https://issues.apache.org/jira/browse/NUTCH-3012

[jira] [Commented] (NUTCH-2959) Upgrade to Apache Tika 2.9.0

2023-10-03 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771445#comment-17771445 ] Sebastian Nagel commented on NUTCH-2959: Hi [~tallison], it's your

[jira] [Resolved] (NUTCH-1130) JUnit test for Any23 RDF plugin

2023-10-03 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1130. Resolution: Won't Do Closing - the any23 project has retired and the any23 plugi

[jira] [Closed] (NUTCH-1130) JUnit test for Any23 RDF plugin

2023-10-03 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel closed NUTCH-1130. -- > JUnit test for Any23 RDF plugin > --- > >

[jira] [Resolved] (NUTCH-2938) Use Any23's RepositoryWriter to write structured data to Rdf4j repository

2023-10-03 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2938. Resolution: Won't Do Closing - the any23 project has retired and the any23 plugi

[jira] [Closed] (NUTCH-2938) Use Any23's RepositoryWriter to write structured data to Rdf4j repository

2023-10-03 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel closed NUTCH-2938. -- > Use Any23's RepositoryWriter to write structured data to Rdf4j re

[jira] [Updated] (NUTCH-2938) Use Any23's RepositoryWriter to write structured data to Rdf4j repository

2023-10-03 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2938: --- Fix Version/s: (was: 1.20) > Use Any23's RepositoryWriter to write structured

[jira] [Resolved] (NUTCH-2853) bin/nutch: remove deprecated commands solrindex, solrdedup, solrclean

2023-10-03 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2853. Resolution: Fixed > bin/nutch: remove deprecated commands solrindex, solrdedup, solrcl

[jira] [Resolved] (NUTCH-2897) Do not supress deprecated API warnings

2023-10-03 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2897. Resolution: Fixed > Do not supress deprecated API warni

[jira] [Resolved] (NUTCH-3010) Injector: count unique number of injected URLs

2023-10-02 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-3010. Resolution: Fixed > Injector: count unique number of injected U
