[ https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660525#comment-16660525 ]
Markus Jelsma commented on NUTCH-2665:
--
Updated patch defining the property in ivysettings.xml
[ https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660455#comment-16660455 ]
Markus Jelsma commented on NUTCH-2665:
--
Patch for 2.x!
> Upgrade to Apache Tika 1.1
[ https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2665:
-
Attachment: NUTCH-2665.patch
> Upgrade to Apache Tika 1.1
Markus Jelsma created NUTCH-2665:
Summary: Upgrade to Apache Tika 1.19.1
Key: NUTCH-2665
URL: https://issues.apache.org/jira/browse/NUTCH-2665
Project: Nutch
Issue Type: Task
[ https://issues.apache.org/jira/browse/NUTCH-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658018#comment-16658018 ]
Markus Jelsma commented on NUTCH-2651:
--
[~wastl-nagel] i can feel the sorrow. I was just about
[ https://issues.apache.org/jira/browse/NUTCH-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655133#comment-16655133 ]
Markus Jelsma commented on NUTCH-2651:
--
+1
also thanks for finding the javax-ws fix, i could
[ https://issues.apache.org/jira/browse/NUTCH-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650282#comment-16650282 ]
Markus Jelsma commented on NUTCH-2625:
--
Seems reasonable, +1
> ProtocolFactory.getProtocol(url)
[ https://issues.apache.org/jira/browse/NUTCH-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646261#comment-16646261 ]
Markus Jelsma commented on NUTCH-2186:
--
[~asm123] please open a new ticket
> -addBinaryContent f
[ https://issues.apache.org/jira/browse/NUTCH-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643927#comment-16643927 ]
Markus Jelsma commented on NUTCH-2192:
--
Nice! I completely forgot these ancient issues. Thanks
[ https://issues.apache.org/jira/browse/NUTCH-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642971#comment-16642971 ]
Markus Jelsma commented on NUTCH-2648:
--
I misread the patch regarding the other plugins. So +1
[ https://issues.apache.org/jira/browse/NUTCH-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642463#comment-16642463 ]
Markus Jelsma commented on NUTCH-2648:
--
+1!
Although i would suggest to mention it works only
[ https://issues.apache.org/jira/browse/NUTCH-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-2647.
--
Resolution: Fixed
Committed
To https://gitbox.apache.org/repos/asf/nutch.git
9d59538c
[ https://issues.apache.org/jira/browse/NUTCH-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631599#comment-16631599 ]
Markus Jelsma commented on NUTCH-2647:
--
To confirm, protocol-httpclient also by default ignores self
[ https://issues.apache.org/jira/browse/NUTCH-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631038#comment-16631038 ]
Markus Jelsma commented on NUTCH-2647:
--
Hello Sebastian,
My own implementation of X509TrustManager
[ https://issues.apache.org/jira/browse/NUTCH-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631040#comment-16631040 ]
Markus Jelsma commented on NUTCH-2623:
--
+1!
Thanks Sebastian!
> Fetcher to guarantee de
[ https://issues.apache.org/jira/browse/NUTCH-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2647:
-
Summary: Skip TLS certificate checks in protocol-http (was: Support for dummy X509 trust
[ https://issues.apache.org/jira/browse/NUTCH-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2647:
-
Summary: Skip TLS certificate checks in protocol-http plugin (was: Skip TLS certificate checks
[ https://issues.apache.org/jira/browse/NUTCH-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620448#comment-16620448 ]
Markus Jelsma commented on NUTCH-2647:
--
patch for 1.15 source
> Support for dummy X509 tr
[ https://issues.apache.org/jira/browse/NUTCH-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2647:
-
Attachment: NUTCH-2647.patch
> Support for dummy X509 trust mana
Markus Jelsma created NUTCH-2647:
Summary: Support for dummy X509 trust manager
Key: NUTCH-2647
URL: https://issues.apache.org/jira/browse/NUTCH-2647
Project: Nutch
Issue Type: Improvement
[ https://issues.apache.org/jira/browse/NUTCH-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613322#comment-16613322 ]
Markus Jelsma commented on NUTCH-2623:
--
+1, however, i would not have expected a byHostProtocol
Markus Jelsma created NUTCH-2630:
Summary: Fetcher to log skipped records by robots.txt
Key: NUTCH-2630
URL: https://issues.apache.org/jira/browse/NUTCH-2630
Project: Nutch
Issue Type
However, the test crawl ran (and still runs) fine in the background, with no errors. But just
now, watching the fetcher, I noticed the crawl delay is not always respected.
The only configuration change I have made is setting the http.agent.* directives needed to run.
2018-08-01 11:47:41,256 INFO fetcher.FetcherThread -
All tests pass, the crawler runs fine so far, +1 for 1.15!
Regards,
Markus
-Original message-
> From:Sebastian Nagel
> Sent: Thursday 26th July 2018 17:05
> To: u...@nutch.apache.org
> Cc: dev@nutch.apache.org
> Subject: [VOTE] Release Apache Nutch 1.15 RC#1
>
> Hi Folks,
>
> A first
[ https://issues.apache.org/jira/browse/NUTCH-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554232#comment-16554232 ]
Markus Jelsma edited comment on NUTCH-2612 at 7/24/18 1:24 PM:
---
Updated
[ https://issues.apache.org/jira/browse/NUTCH-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554232#comment-16554232 ]
Markus Jelsma commented on NUTCH-2612:
--
Updated patch:
* logging when a hostname is processed
[ https://issues.apache.org/jira/browse/NUTCH-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16532712#comment-16532712 ]
Markus Jelsma commented on NUTCH-2612:
--
New patch!
> Support for sitemap processing by hostn
[ https://issues.apache.org/jira/browse/NUTCH-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2612:
-
Attachment: NUTCH-2612.patch
> Support for sitemap processing by hostn
[ https://issues.apache.org/jira/browse/NUTCH-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16532699#comment-16532699 ]
Markus Jelsma commented on NUTCH-2614:
--
Yes!
> NPE in CrawlDbRea
[ https://issues.apache.org/jira/browse/NUTCH-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16532691#comment-16532691 ]
Markus Jelsma commented on NUTCH-2612:
--
Yes of course! Will upload new patch!
> Support for site
[ https://issues.apache.org/jira/browse/NUTCH-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16532446#comment-16532446 ]
Markus Jelsma edited comment on NUTCH-2614 at 7/4/18 9:26 AM:
--
-Really
[ https://issues.apache.org/jira/browse/NUTCH-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16532446#comment-16532446 ]
Markus Jelsma commented on NUTCH-2614:
--
Really? In that case my patch for NUTCH-2612 is probably
Markus Jelsma created NUTCH-2614:
Summary: NPE in CrawlDbReader
Key: NUTCH-2614
URL: https://issues.apache.org/jira/browse/NUTCH-2614
Project: Nutch
Issue Type: Bug
Components
[ https://issues.apache.org/jira/browse/NUTCH-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2612:
-
Attachment: NUTCH-2612.patch
> Support for sitemap processing by hostn
[ https://issues.apache.org/jira/browse/NUTCH-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531253#comment-16531253 ]
Markus Jelsma commented on NUTCH-2612:
--
Patch for master!
> Support for sitemap process
Markus Jelsma created NUTCH-2612:
Summary: Support for sitemap processing by hostname
Key: NUTCH-2612
URL: https://issues.apache.org/jira/browse/NUTCH-2612
Project: Nutch
Issue Type
[ https://issues.apache.org/jira/browse/NUTCH-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517997#comment-16517997 ]
Markus Jelsma commented on NUTCH-2606:
--
Ah, this is interesting. Nutch indeed believes it is a Word
[ https://issues.apache.org/jira/browse/NUTCH-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2597:
-
Description:
I get an NPE on updatehostdb. I start with a clean crawlDB & hostDB. A
Ah, wrong thread. But it seems some things are not entirely ready for the 1.15
release just yet.
Markus
-Original message-
> From:Markus Jelsma
> Sent: Wednesday 13th June 2018 12:44
> To: dev@nutch.apache.org
> Subject: RE: Nutch 1.14 issues
>
> Hi,
>
> I've got some tests failing
Hi,
I've got some tests failing here on a vanilla master checkout.
[junit] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.314 sec
[junit] Test org.apache.nutch.net.TestURLNormalizers FAILED
Jurian had protocol-http's test failing just now, but running ant test on my
[ https://issues.apache.org/jira/browse/NUTCH-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503049#comment-16503049 ]
Markus Jelsma commented on NUTCH-2416:
--
Thanks!
> Fetcher to log thread
[ https://issues.apache.org/jira/browse/NUTCH-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma closed NUTCH-2416.
> Fetcher to log thread ID
>
>
> Key
Markus Jelsma created NUTCH-2585:
Summary: NPE in TrieStringMatcher
Key: NUTCH-2585
URL: https://issues.apache.org/jira/browse/NUTCH-2585
Project: Nutch
Issue Type: Bug
Affects Versions
[ https://issues.apache.org/jira/browse/NUTCH-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16454287#comment-16454287 ]
Markus Jelsma commented on NUTCH-2573:
--
Sounds like a good idea!
> Suspend crawling if robots.
[ https://issues.apache.org/jira/browse/NUTCH-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453826#comment-16453826 ]
Markus Jelsma commented on NUTCH-1228:
--
Wow, this is ancient! Thanks!
> Change mapred.task.time
[ https://issues.apache.org/jira/browse/NUTCH-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449722#comment-16449722 ]
Markus Jelsma commented on NUTCH-2572:
--
+1
> HostDb: updatehostdb does not set val
[ https://issues.apache.org/jira/browse/NUTCH-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419027#comment-16419027 ]
Markus Jelsma commented on NUTCH-2547:
--
Hello Sebastian, option two sounds fine.
> urlnormali
[ https://issues.apache.org/jira/browse/NUTCH-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407981#comment-16407981 ]
Markus Jelsma edited comment on NUTCH-2541 at 3/21/18 3:17 PM
[ https://issues.apache.org/jira/browse/NUTCH-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407981#comment-16407981 ]
Markus Jelsma commented on NUTCH-2541:
--
This is probably not a 1.14 problem, we fixed it some
[ https://issues.apache.org/jira/browse/NUTCH-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-2411.
--
Resolution: Fixed
Committed for 1.15
bd70d2fe..9a77f437 master -> master
> Index-me
[ https://issues.apache.org/jira/browse/NUTCH-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391141#comment-16391141 ]
Markus Jelsma commented on NUTCH-2411:
--
Forgot the last time i threatened to commit, will try again
[ https://issues.apache.org/jira/browse/NUTCH-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391139#comment-16391139 ]
Markus Jelsma commented on NUTCH-2525:
--
Any comments on this one? Julien did the initial work
[ https://issues.apache.org/jira/browse/NUTCH-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2525:
-
Attachment: NUTCH-2525.patch
> Metadata indexer cannot handle uppercase parse metad
Markus Jelsma created NUTCH-2525:
Summary: Metadata indexer cannot handle uppercase parse metadata
Key: NUTCH-2525
URL: https://issues.apache.org/jira/browse/NUTCH-2525
Project: Nutch
Issue
silly due to it being late
>
> On Wed, Feb 21, 2018 at 1:37 AM, BlackIce <blackice...@gmail.com
> <mailto:blackice...@gmail.com>> wrote:
> I commented out the date and now after a whole lot of warnings it says Build
> Successful
>
> Im gonna take it for a short
Hello,
Well, this is interesting! Have you tried Java 8 instead? I don't think Java 9
should cause these kinds of problems, but I haven't tried it yet and would like
to know anyway.
Regarding commenting out the date, try it anyway!
Regards,
Markus
-Original message-
> From:BlackIce
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347762#comment-16347762 ]
Markus Jelsma commented on NUTCH-2466:
--
Another note, curious to see browser developers allow over
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347762#comment-16347762 ]
Markus Jelsma edited comment on NUTCH-2466 at 1/31/18 11:14 PM:
Another
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347749#comment-16347749 ]
Markus Jelsma commented on NUTCH-2466:
--
Glad to hear this will work for you!
> Sitemap proces
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347735#comment-16347735 ]
Markus Jelsma commented on NUTCH-2466:
--
Hello Moreno,
Well, we obviously could allow a -1 setting
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-2466.
--
Resolution: Fixed
> Sitemap processor to follow redire
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346862#comment-16346862 ]
Markus Jelsma commented on NUTCH-2466:
--
Thanks!
remote: Sending notification emails to: ['"
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346768#comment-16346768 ]
Markus Jelsma commented on NUTCH-2466:
--
New patch!
> Sitemap processor to follow redire
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2466:
-
Attachment: NUTCH-2466.patch
> Sitemap processor to follow redire
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346730#comment-16346730 ]
Markus Jelsma commented on NUTCH-2466:
--
Will commit shortly unless there are objections.
> Sitemap proces
[ https://issues.apache.org/jira/browse/NUTCH-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338290#comment-16338290 ]
Markus Jelsma commented on NUTCH-2369:
--
How is this different from the current WebGraph package which
[ https://issues.apache.org/jira/browse/NUTCH-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335984#comment-16335984 ]
Markus Jelsma commented on NUTCH-2503:
--
Hmm, in the past you could run ant -f src/plugin/urlfilter
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335949#comment-16335949 ]
Markus Jelsma commented on NUTCH-2466:
--
First patch adding maxRedir configurable and filterNormalize
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2466:
-
Attachment: NUTCH-2466.patch
> Sitemap processor to follow redire
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16328892#comment-16328892 ]
Markus Jelsma commented on NUTCH-2466:
--
Ah, crap yeah. Won't get back to this today. Hopefully later
[ https://issues.apache.org/jira/browse/NUTCH-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326999#comment-16326999 ]
Markus Jelsma commented on NUTCH-2496:
--
Yes it makes a lot of sense to disable it everywhere except
[ https://issues.apache.org/jira/browse/NUTCH-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326999#comment-16326999 ]
Markus Jelsma edited comment on NUTCH-2496 at 1/16/18 10:52 AM:
Yes
[ https://issues.apache.org/jira/browse/NUTCH-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325053#comment-16325053 ]
Markus Jelsma commented on NUTCH-2496:
--
If you use the same filters/normalizers everywhere in Nutch
[ https://issues.apache.org/jira/browse/NUTCH-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16303688#comment-16303688 ]
Markus Jelsma commented on NUTCH-2487:
--
It seems Nutch and your plugin are using a different version
the office.
>
> I'm also against mentioning the open issues in the release notes, it's normal
> to have open/unresolved issues before a release and we should focus only on
> mentioning what was added/fixed, for the remaining issues we already have
> Jira (which is public).
>
I do not agree with mentioning those issues as unresolved in the release notes.
They are known open issues, just like many others.
There is no reason to mention these specific issues and not all the
other open issues.
Otherwise +1.
Thanks Sebastian!
[ https://issues.apache.org/jira/browse/NUTCH-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2485:
-
Attachment: NUTCH-2485.patch
Patch!
> ParserFactory swallows except
Markus Jelsma created NUTCH-2485:
Summary: ParserFactory swallows exception
Key: NUTCH-2485
URL: https://issues.apache.org/jira/browse/NUTCH-2485
Project: Nutch
Issue Type: Bug
Affects
[ https://issues.apache.org/jira/browse/NUTCH-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294150#comment-16294150 ]
Markus Jelsma commented on NUTCH-2478:
--
Thanks!
> // is not a valid base
[ https://issues.apache.org/jira/browse/NUTCH-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-2320.
--
Resolution: Duplicate
> URLFilterChecker to run as TCP Telnet serv
[ https://issues.apache.org/jira/browse/NUTCH-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma closed NUTCH-2338.
> URLNormalizerChecker to run as TCP Telnet serv
[ https://issues.apache.org/jira/browse/NUTCH-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma closed NUTCH-2478.
> // is not a valid base URL
> --
>
> Key
[ https://issues.apache.org/jira/browse/NUTCH-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-2338.
--
Resolution: Duplicate
> URLNormalizerChecker to run as TCP Telnet serv
[ https://issues.apache.org/jira/browse/NUTCH-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294148#comment-16294148 ]
Markus Jelsma commented on NUTCH-2338:
--
Yes!
> URLNormalizerChecker to run as TCP Telnet serv
[ https://issues.apache.org/jira/browse/NUTCH-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292469#comment-16292469 ]
Markus Jelsma commented on NUTCH-2439:
--
Weird, i only got :
Dec 15, 2017 1:45:42 PM
[ https://issues.apache.org/jira/browse/NUTCH-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292421#comment-16292421 ]
Markus Jelsma commented on NUTCH-2439:
--
Note, since 1.17, all but one of the warnings are gone
[ https://issues.apache.org/jira/browse/NUTCH-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292419#comment-16292419 ]
Markus Jelsma commented on NUTCH-2478:
--
I prefer your patch, it also carries a test
[ https://issues.apache.org/jira/browse/NUTCH-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290955#comment-16290955 ]
Markus Jelsma commented on NUTCH-2354:
--
Yes, i think we should include this.
> Upgrade Had
[ https://issues.apache.org/jira/browse/NUTCH-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290957#comment-16290957 ]
Markus Jelsma commented on NUTCH-2474:
--
+1
> CrawlDbReader -stats fails with ClassCastExcept
[ https://issues.apache.org/jira/browse/NUTCH-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288300#comment-16288300 ]
Markus Jelsma commented on NUTCH-2478:
--
To clarify a bad sentence, i resolve the missing protocol
[ https://issues.apache.org/jira/browse/NUTCH-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288289#comment-16288289 ]
Markus Jelsma commented on NUTCH-2478:
--
Yes, this needs a change in the parser plugins. I sought
[ https://issues.apache.org/jira/browse/NUTCH-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2478:
-
Description:
This test fails:
{code}
@Test
public void testBadResolver() throws Exception
Markus Jelsma created NUTCH-2478:
Summary: // is not a valid base URL
Key: NUTCH-2478
URL: https://issues.apache.org/jira/browse/NUTCH-2478
Project: Nutch
Issue Type: Bug
Affects
Happy to hear. There are major improvements in Tika 1.17; it deals much better
with some of the more extravagant pages you find on the web.
-Original message-
> From:Sebastian Nagel
> Sent: Tuesday 12th December 2017 13:36
> To: dev@nutch.apache.org
>
Yes, please do :)
-Original message-
> From:BlackIce
> Sent: Friday 8th December 2017 23:57
> To: dev@nutch.apache.org
> Subject: Re: [DISCUSS] Release 1.14?
>
> OK, Ill test the RC
>
> On Dec 8, 2017 11:54 PM, "Sebastian Nagel"
[ https://issues.apache.org/jira/browse/NUTCH-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280285#comment-16280285 ]
Markus Jelsma commented on NUTCH-2472:
--
Yes probably. There are many sitemaps out there that link
[ https://issues.apache.org/jira/browse/NUTCH-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280179#comment-16280179 ]
Markus Jelsma commented on NUTCH-2472:
--
Oh crap, you are right. It happened to us yesterday, but now
[ https://issues.apache.org/jira/browse/NUTCH-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma closed NUTCH-2472.
Resolution: Not A Problem
> Sitemap processor does not honour db.ignore.external.li
Markus Jelsma created NUTCH-2472:
Summary: Sitemap processor does not honour db.ignore.external.links
Key: NUTCH-2472
URL: https://issues.apache.org/jira/browse/NUTCH-2472
Project: Nutch
[ https://issues.apache.org/jira/browse/NUTCH-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277448#comment-16277448 ]
Markus Jelsma commented on NUTCH-2470:
--
Ah, you are using t-digest, a very nice library indeed.
+1