lewismc merged PR #813:
URL: https://github.com/apache/nutch/pull/813
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: dev-unsubscr...@nutch.apache.org
lewismc commented on PR #2:
URL: https://github.com/apache/nutch-site/pull/2#issuecomment-2112989006
Yes thank you @sebbASF
sebastian-nagel commented on PR #814:
URL: https://github.com/apache/nutch/pull/814#issuecomment-2110558876
Thanks, @lewismc! The metrics wiki page was updated.
sebastian-nagel merged PR #814:
URL: https://github.com/apache/nutch/pull/814
sebastian-nagel merged PR #812:
URL: https://github.com/apache/nutch/pull/812
sebastian-nagel commented on PR #2:
URL: https://github.com/apache/nutch-site/pull/2#issuecomment-2105982524
Thanks, @sebbASF!
sebastian-nagel merged PR #2:
URL: https://github.com/apache/nutch-site/pull/2
sebbASF opened a new pull request, #2:
URL: https://github.com/apache/nutch-site/pull/2
Nutch is currently not listed under the web-framework category on
projects.apache.org
lewismc merged PR #817:
URL: https://github.com/apache/nutch/pull/817
sebastian-nagel opened a new pull request, #816:
URL: https://github.com/apache/nutch/pull/816
and NUTCH-1942 Remove TopLevelDomain
- use methods from crawler-commons' EffectiveTldFinder in URLUtil replacing
classes and methods from the "org.apache.nutch.util.domain" package
lewismc commented on PR #815:
URL: https://github.com/apache/nutch/pull/815#issuecomment-2081564107
Excellent @sebastian-nagel +1
lewismc commented on PR #814:
URL: https://github.com/apache/nutch/pull/814#issuecomment-2081563229
Excellent @sebastian-nagel
sebastian-nagel commented on PR #815:
URL: https://github.com/apache/nutch/pull/815#issuecomment-2080743831
... also fixed the Javadoc error.
sebastian-nagel commented on PR #814:
URL: https://github.com/apache/nutch/pull/814#issuecomment-2080634329
Hi @lewismc:
- "use parameterized logging": done
- "augment the [metrics
documentation](https://cwiki.apache.org/confluence/display/NUTCH/Metrics) once
this is merged.": will
sebastian-nagel commented on PR #815:
URL: https://github.com/apache/nutch/pull/815#issuecomment-2080603546
> we could provide a TestGenerator#testNullHostInReducer test case
Good idea! Done, see 4729786.
lewismc commented on code in PR #814:
URL: https://github.com/apache/nutch/pull/814#discussion_r1579883313
##
src/java/org/apache/nutch/crawl/Generator.java:
##
@@ -253,10 +256,7 @@ public void map(Text key, CrawlDatum value, Context
context)
try {
sort =
lewismc commented on PR #813:
URL: https://github.com/apache/nutch/pull/813#issuecomment-2067543713
The logging now looks as follows
```INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0]
Found 1 URLExemptionFilter implementations:
sebastian-nagel opened a new pull request, #812:
URL: https://github.com/apache/nutch/pull/812
Pass ftp:// URLs to the standard JVM URLStreamHandler
lewismc merged PR #811:
URL: https://github.com/apache/nutch/pull/811
lewismc opened a new pull request, #811:
URL: https://github.com/apache/nutch/pull/811
PR for https://issues.apache.org/jira/browse/NUTCH-3038
lewismc merged PR #810:
URL: https://github.com/apache/nutch/pull/810
CatChullain commented on PR #810:
URL: https://github.com/apache/nutch/pull/810#issuecomment-2028497765
Thanks again, @lewismc.
I did add those INFO messages, but I found an extra call to setIndexedConf
from setConf that the filter() method handles more cleanly, so I removed that,
lewismc commented on PR #810:
URL: https://github.com/apache/nutch/pull/810#issuecomment-2028343327
Hi @CatChullain I associated this Jira ticket to the 1.20 release and made
you assignee
We will get it merged soon and roll the release.
lewismc merged PR #808:
URL: https://github.com/apache/nutch/pull/808
lewismc merged PR #807:
URL: https://github.com/apache/nutch/pull/807
lewismc merged PR #809:
URL: https://github.com/apache/nutch/pull/809
lewismc commented on PR #810:
URL: https://github.com/apache/nutch/pull/810#issuecomment-2028304406
@CatChullain thanks for your patience whilst we work this one
> … I wonder where might be good spots for INFO level messages
The reason I suggested that the log level be
CatChullain commented on PR #810:
URL: https://github.com/apache/nutch/pull/810#issuecomment-2028122038
Thanks, Lewis! I moved all four to DEBUG, but I wonder where might be good
spots for INFO level messages. I'm thinking of the operator or tech who doesn't
dig into code and has an issue
lewismc commented on code in PR #810:
URL: https://github.com/apache/nutch/pull/810#discussion_r1544806230
##
src/plugin/index-arbitrary/src/java/org/apache/nutch/indexer/arbitrary/ArbitraryIndexingFilter.java:
##
@@ -0,0 +1,284 @@
+/*
+ * Licensed to the Apache Software
CatChullain commented on PR #810:
URL: https://github.com/apache/nutch/pull/810#issuecomment-2021774505
Thanks, Lewis! I got some of it done today. I'll consolidate the LOG
statements a bit more tomorrow.
lewismc commented on code in PR #810:
URL: https://github.com/apache/nutch/pull/810#discussion_r1539452666
##
src/plugin/index-arbitrary/src/java/org/apache/nutch/indexer/arbitrary/ArbitraryIndexingFilter.java:
##
@@ -0,0 +1,266 @@
+package org.apache.nutch.indexer.arbitrary;
+
lewismc commented on code in PR #810:
URL: https://github.com/apache/nutch/pull/810#discussion_r1539390873
##
src/plugin/index-arbitrary/ivy.xml:
##
@@ -0,0 +1,41 @@
+
+
+
+
Review Comment:
Please remove whitespace.
##
CatChullain opened a new pull request, #810:
URL: https://github.com/apache/nutch/pull/810
This is the initial code for an arbitrary indexing filter, NUTCH-3032.
It could be helpful to let end users manipulate information at indexing time
with their own code without the need for
sebastian-nagel commented on PR #808:
URL: https://github.com/apache/nutch/pull/808#issuecomment-2000233258
Hi Lewis, it's done in three steps:
1. run `ant report-licenses` (Rat task) for core and all plugins
2. process all reports: list all combinations of , try to
extract the
tballison closed pull request #799: NUTCH-3026 -- add statusOnly as an indexing
option
URL: https://github.com/apache/nutch/pull/799
lewismc commented on PR #712:
URL: https://github.com/apache/nutch/pull/712#issuecomment-1998875276
Closing this PR out. StatsD is widely used but open source Java SDKs/agents
are few and far between.
When I get around to properly instrumenting Nutch I will probably suggest
that we use
lewismc closed pull request #712: WIP StatsD metrics example
URL: https://github.com/apache/nutch/pull/712
lewismc closed pull request #807: NUTCH-3036 Upgrade
org.seleniumhq.selenium:selenium-java dependency i…
URL: https://github.com/apache/nutch/pull/807
lewismc commented on PR #807:
URL: https://github.com/apache/nutch/pull/807#issuecomment-1998718730
There are some tangential proposed changes (such as improvements to logging)
to this PR but they concern the relevant Class files.
lewismc commented on PR #808:
URL: https://github.com/apache/nutch/pull/808#issuecomment-1998717443
Hi @sebastian-nagel did you perform this task manually?
lewismc closed pull request #808: NUTCH-3035 Update license and notice file for
release of 1.20
URL: https://github.com/apache/nutch/pull/808
lewismc commented on PR #807:
URL: https://github.com/apache/nutch/pull/807#issuecomment-1998714969
[Further guidance on browser compatibility/supported
platforms](https://firefox-source-docs.mozilla.org/testing/geckodriver/Support.html)
Along the way I discovered that **_full
lewismc commented on PR #807:
URL: https://github.com/apache/nutch/pull/807#issuecomment-1998711992
PR ready for review. Tested on
* MacBook Pro
* Apple M1 Pro
* Sonoma 14.4
* Firefox 115.X (compatible with current version of Selenium)
sebastian-nagel opened a new pull request, #808:
URL: https://github.com/apache/nutch/pull/808
Update the license and notice files of dependencies included as binary jar
files in the binary release.
sebastian-nagel merged PR #806:
URL: https://github.com/apache/nutch/pull/806
lewismc commented on PR #806:
URL: https://github.com/apache/nutch/pull/806#issuecomment-1995922015
Tested with ES 7.10.2 6 node cluster. +1 LGTM.
sebastian-nagel opened a new pull request, #806:
URL: https://github.com/apache/nutch/pull/806
This PR downgrades the ES client to version 7.10.2, which is licensed under
the Apache License 2.0 - it's a quick fix to stay compatible with ASF policies.
Not yet tested: indexing into ES
To be
lewismc commented on PR #803:
URL: https://github.com/apache/nutch/pull/803#issuecomment-1994562354
Thanks @sebastian-nagel
lewismc merged PR #803:
URL: https://github.com/apache/nutch/pull/803
lewismc commented on PR #805:
URL: https://github.com/apache/nutch/pull/805#issuecomment-1993567784
Thanks @derhecht
lewismc merged PR #805:
URL: https://github.com/apache/nutch/pull/805
lewismc commented on PR #803:
URL: https://github.com/apache/nutch/pull/803#issuecomment-1993545146
After lots of trial and error I think I cracked this one. Ultimately there
were several places where the optional `(-[classifier])` element has to be
added to the `ivy:retrieve pattern`.
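A minimal sketch of why the optional `(-[classifier])` element matters, assuming Ivy's usual pattern convention that a parenthesised part is dropped when the token inside it has no value. The class `RetrievePatternSketch`, its methods, and the sample artifact values are hypothetical and only illustrate the substitution; this is not Ivy's or Nutch's actual code.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not Ivy's or Nutch's actual code. */
final class RetrievePatternSketch {

  // An optional part is any "( ... )" group without nested parentheses.
  private static final Pattern OPTIONAL = Pattern.compile("\\(([^()]*)\\)");
  // A token is a name in square brackets, e.g. [artifact].
  private static final Pattern TOKEN = Pattern.compile("\\[([A-Za-z]+)\\]");

  /** Substitutes tokens in text; returns null if any token has no value. */
  private static String fill(String text, Map<String, String> tokens) {
    Matcher m = TOKEN.matcher(text);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String value = tokens.get(m.group(1));
      if (value == null || value.isEmpty()) {
        return null;
      }
      m.appendReplacement(out, Matcher.quoteReplacement(value));
    }
    m.appendTail(out);
    return out.toString();
  }

  /** Expands a pattern, dropping optional parts whose token is unset. */
  static String expand(String pattern, Map<String, String> tokens) {
    Matcher m = OPTIONAL.matcher(pattern);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String filled = fill(m.group(1), tokens);
      m.appendReplacement(out,
          Matcher.quoteReplacement(filled == null ? "" : filled));
    }
    m.appendTail(out);
    String result = fill(out.toString(), tokens);
    if (result == null) {
      throw new IllegalArgumentException("missing required token in " + pattern);
    }
    return result;
  }

  public static void main(String[] args) {
    // Sample values are made up for the illustration.
    Map<String, String> plain =
        Map.of("artifact", "foo", "revision", "1.0", "ext", "jar");
    Map<String, String> sources =
        Map.of("artifact", "foo", "revision", "1.0", "ext", "jar",
            "classifier", "sources");

    String without = "[artifact]-[revision].[ext]";
    String with = "[artifact]-[revision](-[classifier]).[ext]";

    System.out.println(expand(without, plain));   // foo-1.0.jar
    System.out.println(expand(without, sources)); // foo-1.0.jar (same name)
    System.out.println(expand(with, plain));      // foo-1.0.jar
    System.out.println(expand(with, sources));    // foo-1.0-sources.jar
  }
}
```

Under these assumptions, a pattern without the optional element maps the plain and the classified artifact to the same file name, while the pattern with `(-[classifier])` keeps them apart and leaves unclassified artifacts unchanged.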
lewismc closed pull request #803: NUTCH-3033 Upgrade Ivy to v2.5.2
URL: https://github.com/apache/nutch/pull/803
derhecht commented on PR #801:
URL: https://github.com/apache/nutch/pull/801#issuecomment-1991968018
see #805
derhecht opened a new pull request, #805:
URL: https://github.com/apache/nutch/pull/805
Alpine uses the ash shell by default, which leaves the JAVA_HOME
environment variable unset.
Sorry, there is no issue reported at the moment on issues.apache.org; nevertheless,
it is one I'm facing to
lewismc commented on PR #803:
URL: https://github.com/apache/nutch/pull/803#issuecomment-1989446036
Hmmm, I upgraded to 2.5.1 and the CI runs just fine. Looks like there is
some regression/additional configuration required with 2.5.2. I’m asking the
question over on ivy-user@ mailing list.
lewismc commented on PR #799:
URL: https://github.com/apache/nutch/pull/799#issuecomment-1989404991
Hmmm. It appears that there are problems with the `protocol-http` unit tests…
```
[echo] Testing plugin: protocol-http
[junit] Running
lewismc closed pull request #799: NUTCH-3026 -- add statusOnly as an indexing
option
URL: https://github.com/apache/nutch/pull/799
lewismc commented on PR #801:
URL: https://github.com/apache/nutch/pull/801#issuecomment-1989379558
@derhecht apologies I merged this mistakenly.
Can you please submit the PR against master branch?
Thank you
lewismc commented on PR #799:
URL: https://github.com/apache/nutch/pull/799#issuecomment-1989380993
Reopening to have CI run again.
lewismc merged PR #804:
URL: https://github.com/apache/nutch/pull/804
lewismc opened a new pull request, #804:
URL: https://github.com/apache/nutch/pull/804
Reverts apache/nutch#801
lewismc merged PR #801:
URL: https://github.com/apache/nutch/pull/801
lewismc commented on PR #803:
URL: https://github.com/apache/nutch/pull/803#issuecomment-1989365064
OK so it looks like the [newer Ivy version is being used just
fine](https://github.com/apache/nutch/actions/runs/8239165168/job/22531780061?pr=803#step:4:78).
The build did however fail with
lewismc opened a new pull request, #803:
URL: https://github.com/apache/nutch/pull/803
PR for https://issues.apache.org/jira/browse/NUTCH-3033
I was having trouble locally resolving the Ivy version to 2.5.2… I can’t yet
figure out why 2.5.1 was being used. I’ll check out the CI log and
sebastian-nagel commented on PR #800:
URL: https://github.com/apache/nutch/pull/800#issuecomment-1987171023
Thanks, @derhecht! Good catch!
sebastian-nagel merged PR #800:
URL: https://github.com/apache/nutch/pull/800
sebastian-nagel closed pull request #802: fix for NUTCH-3027 contributed by
skehrli
URL: https://github.com/apache/nutch/pull/802
sebastian-nagel commented on PR #802:
URL: https://github.com/apache/nutch/pull/802#issuecomment-1987165751
Patch applied to master in d95e1a7, see comments on Jira in NUTCH-3027.
Thanks again @skehrli !
skehrli opened a new pull request, #802:
URL: https://github.com/apache/nutch/pull/802
Thanks for your contribution to [Apache Nutch](https://nutch.apache.org/)!
Your help is appreciated!
Before opening the pull request, please verify that
* there is an open issue on the [Nutch
lewismc commented on PR #294:
URL: https://github.com/apache/nutch/pull/294#issuecomment-1877232892
Hi @grege117 I’ll try to have a crack at this _soon_. Thanks for the heads
up. If you feel like forking the branch and having a go at the fix, then please
do. I will try to shepherd in your
grege117 commented on PR #294:
URL: https://github.com/apache/nutch/pull/294#issuecomment-1868000580
Sorry to chime in a few years late, but I'm not sure this plugin is
configured correctly.
If I modify my conf/index-writers.xml and remove everything except for
", you will get the
derhecht opened a new pull request, #800:
URL: https://github.com/apache/nutch/pull/800
Show --dedup-group instead of -dedup-group, which led to
misleading output
lewismc merged PR #795:
URL: https://github.com/apache/nutch/pull/795
tballison opened a new pull request, #799:
URL: https://github.com/apache/nutch/pull/799
GabeHaegele opened a new pull request, #798:
URL: https://github.com/apache/nutch/pull/798
sebastian-nagel commented on PR #796:
URL: https://github.com/apache/nutch/pull/796#issuecomment-1802531264
Thanks, @jnioche!
sebastian-nagel merged PR #796:
URL: https://github.com/apache/nutch/pull/796
jnioche commented on PR #796:
URL: https://github.com/apache/nutch/pull/796#issuecomment-1801938355
@sebastian-nagel merged the changes from master and made a few improvements
sebastian-nagel commented on PR #793:
URL: https://github.com/apache/nutch/pull/793#issuecomment-1801814549
Thanks, @jnioche!
Merged into master, adding the lines to make use of Hadoop-provided
compression codecs.
Successfully tested in local and pseudo-distributed mode with
sebastian-nagel closed pull request #793: [NUTCH-3017] Allow fast-urlfilter to
load from HDFS/S3
URL: https://github.com/apache/nutch/pull/793
jnioche commented on PR #796:
URL: https://github.com/apache/nutch/pull/796#issuecomment-1798221743
Writing a test for this thing is an absolute pain. The way the filters are
used for real is that their method setConf is called and the rules are loaded
using _getConfResourceAsReader_, i.e.
jnioche commented on code in PR #796:
URL: https://github.com/apache/nutch/pull/796#discussion_r1384621727
##
src/plugin/urlfilter-fast/src/java/org/apache/nutch/urlfilter/fast/FastURLFilter.java:
##
@@ -97,9 +97,17 @@ public class FastURLFilter implements URLFilter {
sebastian-nagel commented on code in PR #796:
URL: https://github.com/apache/nutch/pull/796#discussion_r1384536930
##
src/plugin/urlfilter-fast/src/java/org/apache/nutch/urlfilter/fast/FastURLFilter.java:
##
@@ -97,9 +97,17 @@ public class FastURLFilter implements URLFilter {
tballison merged PR #794:
URL: https://github.com/apache/nutch/pull/794
tballison merged PR #797:
URL: https://github.com/apache/nutch/pull/797
tballison commented on PR #797:
URL: https://github.com/apache/nutch/pull/797#issuecomment-1795161171
```2023-11-06T15:02:47.9408964Z [junit] Tests run: 14, Failures: 2,
Errors: 0, Skipped: 4, Time elapsed: 4.342 sec
2023-11-06T15:02:48.2192793Z [junit] Test
tballison commented on PR #797:
URL: https://github.com/apache/nutch/pull/797#issuecomment-1794934171
Need to keep as draft until the 2.9.1.0 shim actually lands in maven central.
tballison opened a new pull request, #797:
URL: https://github.com/apache/nutch/pull/797
lewismc opened a new pull request, #795:
URL: https://github.com/apache/nutch/pull/795
Addresses https://issues.apache.org/jira/browse/NUTCH-3024
lewismc merged PR #789:
URL: https://github.com/apache/nutch/pull/789
lewismc commented on code in PR #789:
URL: https://github.com/apache/nutch/pull/789#discussion_r138646
##
src/java/org/apache/nutch/crawl/CrawlDbReader.java:
##
@@ -812,7 +811,7 @@ public CrawlDatum get(String crawlDb, String url,
Configuration config)
@Override
lewismc commented on PR #794:
URL: https://github.com/apache/nutch/pull/794#issuecomment-1789810071
We have no tests for `ParseSegment` right now. I think it would be excellent
if this PR could include a test for `ParseSegment.isTruncated`.
tballison opened a new pull request, #794:
URL: https://github.com/apache/nutch/pull/794
sebastian-nagel commented on code in PR #793:
URL: https://github.com/apache/nutch/pull/793#discussion_r1377375552
##
src/plugin/urlfilter-fast/src/java/org/apache/nutch/urlfilter/fast/FastURLFilter.java:
##
@@ -181,9 +186,23 @@ public String filter(String url) {
public
jnioche commented on PR #792:
URL: https://github.com/apache/nutch/pull/792#issuecomment-1785804884
Obviously, pulled more changes than I meant to
jnioche closed pull request #792: Allow fast-urlfilter to load from HDFS/S3 and
support gzipped input [NUTCH-3017]
URL: https://github.com/apache/nutch/pull/792
jnioche opened a new pull request, #792:
URL: https://github.com/apache/nutch/pull/792
See description in https://issues.apache.org/jira/browse/NUTCH-3017
sebastian-nagel commented on code in PR #789:
URL: https://github.com/apache/nutch/pull/789#discussion_r1375421979
##
src/java/org/apache/nutch/crawl/CrawlDbReader.java:
##
@@ -812,7 +811,7 @@ public CrawlDatum get(String crawlDb, String url,
Configuration config)