[
https://issues.apache.org/jira/browse/NUTCH-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-3073.
Resolution: Fixed
> Address Java compiler warni
Sebastian Nagel created NUTCH-3073:
--
Summary: Address Java compiler warnings
Key: NUTCH-3073
URL: https://issues.apache.org/jira/browse/NUTCH-3073
Project: Nutch
Issue Type: Improvement
Sebastian Nagel created NUTCH-3072:
--
Summary: Fetcher to stop QueueFeeder if aborting with "hung
threads"
Key: NUTCH-3072
URL: https://issues.apache.org/jira/browse/NUTCH-3072
Proj
[
https://issues.apache.org/jira/browse/NUTCH-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel reassigned NUTCH-3068:
--
Assignee: Sebastian Nagel
> Documentation on Nutch Homep
[
https://issues.apache.org/jira/browse/NUTCH-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-3070.
Resolution: Fixed
Thanks for reporting, [~hiranchaudhuri]!
> Documentation has outda
[
https://issues.apache.org/jira/browse/NUTCH-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel reassigned NUTCH-3070:
--
Assignee: Sebastian Nagel
> Documentation has outdated li
[
https://issues.apache.org/jira/browse/NUTCH-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-3070:
---
Component/s: wiki
> Documentation has outdated li
[
https://issues.apache.org/jira/browse/NUTCH-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-3069:
---
Component/s: wiki
> Update protocol-smb refere
[
https://issues.apache.org/jira/browse/NUTCH-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-3071:
---
Component/s: wiki
> Tutorial for Intranet Document Search outda
[
https://issues.apache.org/jira/browse/NUTCH-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-3056:
---
Component/s: injector
> Injector to support resolving seed U
[
https://issues.apache.org/jira/browse/NUTCH-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886400#comment-17886400
]
Sebastian Nagel commented on NUTCH-3071:
Hi [~hiranchaudhuri], thanks
[
https://issues.apache.org/jira/browse/NUTCH-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886393#comment-17886393
]
Sebastian Nagel commented on NUTCH-2856:
Hi [~hiranchaudhuri], yes and of co
[
https://issues.apache.org/jira/browse/NUTCH-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-2812.
Resolution: Fixed
> Methods returning array may expose internal representat
[
https://issues.apache.org/jira/browse/NUTCH-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-1942.
Resolution: Done
> Remove TopLevelDomain
[
https://issues.apache.org/jira/browse/NUTCH-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-1806.
Resolution: Implemented
Thanks, everybody!
> Delegate processing of URL domains
[
https://issues.apache.org/jira/browse/NUTCH-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-3058.
Resolution: Implemented
> Fetcher: counter for hung thre
[
https://issues.apache.org/jira/browse/NUTCH-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881792#comment-17881792
]
Sebastian Nagel commented on NUTCH-3059:
Note: the above test was run in ps
[
https://issues.apache.org/jira/browse/NUTCH-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881791#comment-17881791
]
Sebastian Nagel commented on NUTCH-3059:
Ok, found the reason: it's b
[
https://issues.apache.org/jira/browse/NUTCH-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-3061.
Resolution: Implemented
> URL filters to log name of the rule file rules are read f
[
https://issues.apache.org/jira/browse/NUTCH-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-3062.
Resolution: Implemented
> protocol-okhttp: optionally record HTTP and SSL/TLS versi
[
https://issues.apache.org/jira/browse/NUTCH-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-3065.
Resolution: Implemented
> Format changelog as Markd
[
https://issues.apache.org/jira/browse/NUTCH-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-3066.
Resolution: Fixed
> Protocol plugin unit tests fail rando
[
https://issues.apache.org/jira/browse/NUTCH-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880958#comment-17880958
]
Sebastian Nagel commented on NUTCH-1806:
> it seems odd to return a
Sebastian Nagel created NUTCH-3067:
--
Summary: Improve performance of FetchItemQueues if error state is
preserved
Key: NUTCH-3067
URL: https://issues.apache.org/jira/browse/NUTCH-3067
Project: Nutch
[
https://issues.apache.org/jira/browse/NUTCH-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880036#comment-17880036
]
Sebastian Nagel commented on NUTCH-1806:
Any comments on this? It's an
[
https://issues.apache.org/jira/browse/NUTCH-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-3063.
Resolution: Implemented
Committed in
[ac03cf1|https://github.com/apache/nutch/commit
[
https://issues.apache.org/jira/browse/NUTCH-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879964#comment-17879964
]
Sebastian Nagel commented on NUTCH-3063:
+1 looks good. And definitely m
[
https://issues.apache.org/jira/browse/NUTCH-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879666#comment-17879666
]
Sebastian Nagel commented on NUTCH-3065:
PR in progress: the [reforma
[
https://issues.apache.org/jira/browse/NUTCH-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel reassigned NUTCH-3065:
--
Assignee: Sebastian Nagel
> Format changelog as Markd
Sebastian Nagel created NUTCH-3065:
--
Summary: Format changelog as Markdown
Key: NUTCH-3065
URL: https://issues.apache.org/jira/browse/NUTCH-3065
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-3060:
---
Description: The link to the 1.20 Javadocs on
[https://nutch.apache.org/documentation
[
https://issues.apache.org/jira/browse/NUTCH-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870291#comment-17870291
]
Sebastian Nagel commented on NUTCH-3060:
The missing Javadocs are now place
Sebastian Nagel created NUTCH-3062:
--
Summary: protocol-okhttp: optionally record HTTP and SSL/TLS
versions
Key: NUTCH-3062
URL: https://issues.apache.org/jira/browse/NUTCH-3062
Project: Nutch
Sebastian Nagel created NUTCH-3061:
--
Summary: URL filters to log name of the rule file rules are read
from
Key: NUTCH-3061
URL: https://issues.apache.org/jira/browse/NUTCH-3061
Project: Nutch
Sebastian Nagel created NUTCH-3060:
--
Summary: Javadoc link broken on website
Key: NUTCH-3060
URL: https://issues.apache.org/jira/browse/NUTCH-3060
Project: Nutch
Issue Type: Bug
[
https://issues.apache.org/jira/browse/NUTCH-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-3060:
---
Fix Version/s: 1.21
(was: 1.20)
> Javadoc link broken on webs
Sebastian Nagel created NUTCH-3059:
--
Summary: Generator: selector job does not count reduce output
records
Key: NUTCH-3059
URL: https://issues.apache.org/jira/browse/NUTCH-3059
Project: Nutch
Sebastian Nagel created NUTCH-3058:
--
Summary: Fetcher: counter for hung threads
Key: NUTCH-3058
URL: https://issues.apache.org/jira/browse/NUTCH-3058
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-3055.
Resolution: Fixed
> README: fix Github "hub"
[
https://issues.apache.org/jira/browse/NUTCH-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-3044.
Resolution: Fixed
> Generator: NPE when extracting the host part of a URL fa
[
https://issues.apache.org/jira/browse/NUTCH-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-3043.
Resolution: Implemented
> Generator: count URLs rejected by URL filt
[
https://issues.apache.org/jira/browse/NUTCH-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-3039.
Resolution: Fixed
> Failure to handle ftp:// U
Sebastian Nagel created NUTCH-3055:
--
Summary: README: fix Github "hub" commands
Key: NUTCH-3055
URL: https://issues.apache.org/jira/browse/NUTCH-3055
Project: Nutch
Issue
[
https://issues.apache.org/jira/browse/NUTCH-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842291#comment-17842291
]
Sebastian Nagel commented on NUTCH-3028:
+1 lgtm.
One question: if there i
[
https://issues.apache.org/jira/browse/NUTCH-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842284#comment-17842284
]
Sebastian Nagel commented on NUTCH-3045:
See also NUTCH-2987. Until HADOOP-1
Hi Lewis,
> The Jenkins job used to be run nightly but
> no longer is.
It polls git nightly:
https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/scmPollLog/
but a build is only run if there are new commits. The latest one:
https://lists.apache.org/thread/ywtlmdmckhd21c6y9c77z01q17h42
Sebastian Nagel created NUTCH-3044:
--
Summary: Generator: NPE when extracting the host part of a URL
fails
Key: NUTCH-3044
URL: https://issues.apache.org/jira/browse/NUTCH-3044
Project: Nutch
Sebastian Nagel created NUTCH-3043:
--
Summary: Generator: count URLs rejected by URL filters
Key: NUTCH-3043
URL: https://issues.apache.org/jira/browse/NUTCH-3043
Project: Nutch
Issue Type
Sebastian Nagel created NUTCH-3040:
--
Summary: Upgrade to Hadoop 3.4.0
Key: NUTCH-3040
URL: https://issues.apache.org/jira/browse/NUTCH-3040
Project: Nutch
Issue Type: Improvement
, see
https://github.com/sebastian-nagel/nutch-test-single-node-cluster/
One note about CHANGES.md: it is currently a mixture of HTML and plain text.
It does not use the potential of Markdown, e.g. sections / headlines for
the releases that would make the changelog navigable via a table of contents.
Th
[
https://issues.apache.org/jira/browse/NUTCH-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel reassigned NUTCH-3039:
--
Assignee: Sebastian Nagel
> Failure to handle ftp:// U
Sebastian Nagel created NUTCH-3039:
--
Summary: Failure to handle ftp:// URLs
Key: NUTCH-3039
URL: https://issues.apache.org/jira/browse/NUTCH-3039
Project: Nutch
Issue Type: Bug
[
https://issues.apache.org/jira/browse/NUTCH-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-2937.
Resolution: Fixed
Fixed NUTCH-2959 by using the shaded Tika package. Thanks, [~tallison
[
https://issues.apache.org/jira/browse/NUTCH-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel reassigned NUTCH-2937:
--
Assignee: Tim Allison
> parse-tika: review dependency exclusions and avoid depende
[
https://issues.apache.org/jira/browse/NUTCH-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-2937:
---
Fix Version/s: 1.20
(was: 1.21)
> parse-tika: review depende
[
https://issues.apache.org/jira/browse/NUTCH-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-3005.
Resolution: Implemented
Done by [~lewismc] as part of NUTCH-3036, commit
[1563396|https
[
https://issues.apache.org/jira/browse/NUTCH-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-3016.
Resolution: Duplicate
> Upgrade Apache Ivy to 2.
[
https://issues.apache.org/jira/browse/NUTCH-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-3016:
---
Fix Version/s: 1.20
(was: 1.21)
> Upgrade Apache Ivy to 2.
[
https://issues.apache.org/jira/browse/NUTCH-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-3005:
---
Affects Version/s: 1.19
> Upgrade selenium as nee
[
https://issues.apache.org/jira/browse/NUTCH-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-3005:
---
Fix Version/s: 1.20
> Upgrade selenium as nee
[
https://issues.apache.org/jira/browse/NUTCH-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-3028:
---
Affects Version/s: 1.19
> WARCExported to support filtering by J
[
https://issues.apache.org/jira/browse/NUTCH-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-3028:
---
Fix Version/s: 1.21
> WARCExported to support filtering by J
[
https://issues.apache.org/jira/browse/NUTCH-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-2960.
Resolution: Won't Fix
The license issue is addressed by NUTCH-3008.
> indexer
[
https://issues.apache.org/jira/browse/NUTCH-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel closed NUTCH-2960.
--
> indexer-elastic: remove plugin from binary package to address licensing iss
[
https://issues.apache.org/jira/browse/NUTCH-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-2960:
---
Fix Version/s: (was: 1.20)
> indexer-elastic: remove plugin from binary package
[
https://issues.apache.org/jira/browse/NUTCH-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-3008.
Resolution: Fixed
> indexer-elastic: downgrade to ES 7.10.2 to address licensing iss
[
https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-3029.
Resolution: Implemented
> Host specific max. and min. intervals in adaptive schedu
[
https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel closed NUTCH-3029.
--
> Host specific max. and min. intervals in adaptive schedu
[
https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel reopened NUTCH-3029:
Assignee: Sebastian Nagel (was: Markus Jelsma)
Reopen to update "Fix version(s)"
[
https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-3029:
---
Fix Version/s: 1.20
> Host specific max. and min. intervals in adaptive schedu
Sebastian Nagel created NUTCH-3035:
--
Summary: Update license and notice file for release of 1.20
Key: NUTCH-3035
URL: https://issues.apache.org/jira/browse/NUTCH-3035
Project: Nutch
Issue
Hi Lewis,
yes, of course!
Some things we should do before the release:
- address the ES licensing issue;
the easiest way is to downgrade, see NUTCH-3008.
Once done, update the license-related files.
- there are three short PRs open
I'll try to look into these points over the next few days.
Best,
[
https://issues.apache.org/jira/browse/NUTCH-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-3025.
Resolution: Implemented
> urlfilter-fast to filter based on the length of the
[
https://issues.apache.org/jira/browse/NUTCH-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-3025:
---
Component/s: plugin
urlfilter
> urlfilter-fast to filter based on
[
https://issues.apache.org/jira/browse/NUTCH-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784030#comment-17784030
]
Sebastian Nagel commented on NUTCH-3017:
Thanks, [~jnioche]
> All
[
https://issues.apache.org/jira/browse/NUTCH-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-3017.
Resolution: Implemented
> Allow fast-urlfilter to load from HDFS/S3 and support gzip
[
https://issues.apache.org/jira/browse/NUTCH-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-3017:
---
Component/s: plugin
urlfilter
> Allow fast-urlfilter to load from HDFS
[
https://issues.apache.org/jira/browse/NUTCH-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-3017:
---
Fix Version/s: 1.20
> Allow fast-urlfilter to load from HDFS/S3 and support gzipped in
Hi Lewis,
>> whether we need a Nutch custom code style at all… why don’t we just use
>> some other existing style and then enforce it?
Enforcing: yes!
However, I would try hard to keep the changes to a reasonable minimum. For
example, if we change the indentation, almost every code line is aff
[
https://issues.apache.org/jira/browse/NUTCH-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-3012.
Resolution: Fixed
> SegmentReader when dumping with option -recode: NPE on unpar
[
https://issues.apache.org/jira/browse/NUTCH-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-3011.
Resolution: Implemented
> HttpRobotRulesParser: handle HTTP 429 Too Many Requests same
[
https://issues.apache.org/jira/browse/NUTCH-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-2990.
Resolution: Implemented
Thanks, everybody!
> HttpRobotRulesParser to follow 5 redire
[
https://issues.apache.org/jira/browse/NUTCH-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel reassigned NUTCH-3009:
--
Assignee: Sebastian Nagel
> Upgrade to Hadoop 3.
[
https://issues.apache.org/jira/browse/NUTCH-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-3009.
Resolution: Implemented
> Upgrade to Hadoop 3.
[
https://issues.apache.org/jira/browse/NUTCH-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-3006.
Fix Version/s: (was: 1.20)
Resolution: Abandoned
> Downgrade Tika dependency
[
https://issues.apache.org/jira/browse/NUTCH-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel reassigned NUTCH-3002:
--
Assignee: Sebastian Nagel
> Protocol-okhttp HttpResponse: HTTP header metadata loo
[
https://issues.apache.org/jira/browse/NUTCH-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-3002.
Resolution: Fixed
> Protocol-okhttp HttpResponse: HTTP header metadata lookup should
[
https://issues.apache.org/jira/browse/NUTCH-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778103#comment-17778103
]
Sebastian Nagel commented on NUTCH-3014:
If there is a single data
[
https://issues.apache.org/jira/browse/NUTCH-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-3012:
---
Description:
SegmentReader when called with the flag {{-recode}} fails with an NPE when
[
https://issues.apache.org/jira/browse/NUTCH-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-3012:
---
Summary: SegmentReader when dumping with option -recode: NPE on unparsed
documents (was
Sebastian Nagel created NUTCH-3012:
--
Summary: SegmentReader when dumping with option -recode: NPE on
documents without charset defined
Key: NUTCH-3012
URL: https://issues.apache.org/jira/browse/NUTCH-3012
[
https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771445#comment-17771445
]
Sebastian Nagel commented on NUTCH-2959:
Hi [~tallison], it's your
[
https://issues.apache.org/jira/browse/NUTCH-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-1130.
Resolution: Won't Do
Closing - the any23 project has retired and the any23 plugi
[
https://issues.apache.org/jira/browse/NUTCH-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel closed NUTCH-1130.
--
> JUnit test for Any23 RDF plugin
[
https://issues.apache.org/jira/browse/NUTCH-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-2938.
Resolution: Won't Do
Closing - the any23 project has retired and the any23 plugi
[
https://issues.apache.org/jira/browse/NUTCH-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel closed NUTCH-2938.
--
> Use Any23's RepositoryWriter to write structured data to Rdf4j re
[
https://issues.apache.org/jira/browse/NUTCH-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-2938:
---
Fix Version/s: (was: 1.20)
> Use Any23's RepositoryWriter to write structured
[
https://issues.apache.org/jira/browse/NUTCH-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-2853.
Resolution: Fixed
> bin/nutch: remove deprecated commands solrindex, solrdedup, solrcl
[
https://issues.apache.org/jira/browse/NUTCH-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-2897.
Resolution: Fixed
> Do not supress deprecated API warni
[
https://issues.apache.org/jira/browse/NUTCH-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-3010.
Resolution: Fixed
> Injector: count unique number of injected U