[
https://issues.apache.org/jira/browse/NUTCH-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018046#comment-17018046
]
Markus Jelsma commented on NUTCH-2761:
--
thanks!
> ivy jar fails to downl
[
https://issues.apache.org/jira/browse/NUTCH-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018035#comment-17018035
]
Markus Jelsma commented on NUTCH-2733:
--
Sounds good! +1
> protocol-okhttp: add support for Bro
[
https://issues.apache.org/jira/browse/NUTCH-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970109#comment-16970109
]
Markus Jelsma commented on NUTCH-2748:
--
Nice catch!
I think I would prefer the first option
Hello Sebastian,
All tests pass nicely and I can easily run a crawl.
+1
Thanks,
Markus
By the way, what does this mean:
2019-10-03 12:48:49,696 INFO crawl.Generator - Generator: number of items rejected during selection:
2019-10-03 12:48:49,698 INFO crawl.Generator - Generator: 1
[
https://issues.apache.org/jira/browse/NUTCH-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925665#comment-16925665
]
Markus Jelsma commented on NUTCH-2612:
--
The error is all mine!
The corrected version is committed
[
https://issues.apache.org/jira/browse/NUTCH-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2612.
--
Resolution: Fixed
> Support for sitemap processing by hostn
[
https://issues.apache.org/jira/browse/NUTCH-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925644#comment-16925644
]
Markus Jelsma commented on NUTCH-2612:
--
Hello [~wastl-nagel], are you sure?
I just cleaned my
[
https://issues.apache.org/jira/browse/NUTCH-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924187#comment-16924187
]
Markus Jelsma commented on NUTCH-2612:
--
Any thoughts left? I'd like to get this one in.
> Supp
[
https://issues.apache.org/jira/browse/NUTCH-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919618#comment-16919618
]
Markus Jelsma commented on NUTCH-2669:
--
Great!
Thanks Sebastian!
> Reliable solution for javax
[
https://issues.apache.org/jira/browse/NUTCH-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911699#comment-16911699
]
Markus Jelsma commented on NUTCH-2730:
--
Hello [~wastl-nagel]!
Yes Crawler-Commons is definitely
[
https://issues.apache.org/jira/browse/NUTCH-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2730:
-
Attachment: NUTCH-2730.patch
> SitemapProcessor to treat sitemap URLs as Set instead of L
Markus Jelsma created NUTCH-2730:
Summary: SitemapProcessor to treat sitemap URLs as Set instead of
List
Key: NUTCH-2730
URL: https://issues.apache.org/jira/browse/NUTCH-2730
Project: Nutch
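The NUTCH-2730 summary proposes holding sitemap URLs in a Set rather than a List. A minimal sketch of the effect, assuming duplicates arise when the same sitemap is discovered more than once (for example from robots.txt and from a seed list); `SitemapUrlSet` and `collect` are hypothetical names, not the actual SitemapProcessor code. A `LinkedHashSet` drops repeats while keeping discovery order:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper illustrating List -> Set; not Nutch's SitemapProcessor.
public class SitemapUrlSet {

  // Keeps the first occurrence of each sitemap URL, in discovery order.
  static Set<String> collect(List<String> discovered) {
    Set<String> unique = new LinkedHashSet<>();
    for (String url : discovered) {
      unique.add(url); // no-op if the URL is already present
    }
    return unique;
  }

  public static void main(String[] args) {
    List<String> found = List.of(
        "https://example.org/sitemap.xml",      // e.g. listed in robots.txt
        "https://example.org/sitemap.xml",      // same sitemap seen again
        "https://example.org/sitemap-news.xml");
    // prints [https://example.org/sitemap.xml, https://example.org/sitemap-news.xml]
    System.out.println(collect(found));
  }
}
```

With a List, the duplicate entry would cause the same sitemap to be fetched and parsed twice.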
[
https://issues.apache.org/jira/browse/NUTCH-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900978#comment-16900978
]
Markus Jelsma commented on NUTCH-2727:
--
Good point!
Indeed, we just run Nutch on 3.2.0. We do
[
https://issues.apache.org/jira/browse/NUTCH-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900978#comment-16900978
]
Markus Jelsma edited comment on NUTCH-2727 at 8/6/19 12:29 PM:
---
Good point
[
https://issues.apache.org/jira/browse/NUTCH-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900908#comment-16900908
]
Markus Jelsma edited comment on NUTCH-2727 at 8/6/19 11:19 AM:
---
Hello
[
https://issues.apache.org/jira/browse/NUTCH-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900908#comment-16900908
]
Markus Jelsma commented on NUTCH-2727:
--
Hello @snagel
We have been running it, and other programs
[
https://issues.apache.org/jira/browse/NUTCH-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma closed NUTCH-2725.
> Plugin lib-http to support per-host configurable cook
[
https://issues.apache.org/jira/browse/NUTCH-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2725.
--
Resolution: Fixed
> Plugin lib-http to support per-host configurable cook
[
https://issues.apache.org/jira/browse/NUTCH-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895139#comment-16895139
]
Markus Jelsma commented on NUTCH-2725:
--
Committed a67c9bee..54f73bf7 master -> master
Tha
[
https://issues.apache.org/jira/browse/NUTCH-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2725:
-
Attachment: NUTCH-2725.patch
> Plugin lib-http to support per-host configurable cook
[
https://issues.apache.org/jira/browse/NUTCH-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892873#comment-16892873
]
Markus Jelsma commented on NUTCH-2725:
--
Addressed all three points. Thanks Sebastian!
> Plugin
[
https://issues.apache.org/jira/browse/NUTCH-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2725:
-
Attachment: NUTCH-2725.patch
> Plugin lib-http to support per-host configurable cook
[
https://issues.apache.org/jira/browse/NUTCH-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2725:
-
Patch Info: Patch Available
> Plugin lib-http to support per-host configurable cook
Markus Jelsma created NUTCH-2725:
Summary: Plugin lib-http to support per-host configurable cookies
Key: NUTCH-2725
URL: https://issues.apache.org/jira/browse/NUTCH-2725
Project: Nutch
Issue
[
https://issues.apache.org/jira/browse/NUTCH-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma closed NUTCH-2724.
> Metadata indexer not to emit empty val
[
https://issues.apache.org/jira/browse/NUTCH-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2724.
--
Resolution: Fixed
Committed 96924648..a67c9bee master -> master
Thanks!
> Metadata i
[
https://issues.apache.org/jira/browse/NUTCH-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2724:
-
Attachment: NUTCH-2724.patch
> Metadata indexer not to emit empty val
[
https://issues.apache.org/jira/browse/NUTCH-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883698#comment-16883698
]
Markus Jelsma commented on NUTCH-2724:
--
Of course, thanks! I should use isEmpty() more often, length
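A minimal sketch of the kind of check discussed for NUTCH-2724, assuming metadata values arrive as a `String[]`; `NonEmptyValues.nonEmpty` is a hypothetical helper, not the metadata indexer's real code. It skips `null` and empty strings using `isEmpty()` rather than `length() == 0`:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not the actual index-metadata plugin code.
public class NonEmptyValues {

  // Returns only the values worth indexing: null and "" are skipped.
  // isEmpty() states the intent more directly than length() == 0.
  static List<String> nonEmpty(String[] values) {
    List<String> kept = new ArrayList<>();
    if (values == null) {
      return kept;
    }
    for (String v : values) {
      if (v != null && !v.isEmpty()) {
        kept.add(v);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    System.out.println(nonEmpty(new String[] {"en", "", null, "nl"})); // [en, nl]
  }
}
```

Whether whitespace-only values should also be dropped is a separate choice; Java 11+ offers `isBlank()` for that.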
[
https://issues.apache.org/jira/browse/NUTCH-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma closed NUTCH-2723.
> Indexer Solr not to decode URLs before delet
[
https://issues.apache.org/jira/browse/NUTCH-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2723.
--
Resolution: Fixed
Thanks Sebastian!
Committed fc6a2742..96924648 master -> master
> I
[
https://issues.apache.org/jira/browse/NUTCH-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881287#comment-16881287
]
Markus Jelsma commented on NUTCH-2722:
--
I removed my Ivy cache, applied the patch, and it all went fine.
I
[
https://issues.apache.org/jira/browse/NUTCH-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2724:
-
Attachment: NUTCH-2724.patch
> Metadata indexer not to emit empty val
Markus Jelsma created NUTCH-2724:
Summary: Metadata indexer not to emit empty values
Key: NUTCH-2724
URL: https://issues.apache.org/jira/browse/NUTCH-2724
Project: Nutch
Issue Type: Bug
[
https://issues.apache.org/jira/browse/NUTCH-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2723:
-
Attachment: NUTCH-2723.patch
> Indexer Solr not to decode URLs before delet
[
https://issues.apache.org/jira/browse/NUTCH-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2710:
-
Attachment: NUTCH-2710.patch
> Normalize before internal and external che
Markus Jelsma created NUTCH-2723:
Summary: Indexer Solr not to decode URLs before deletion
Key: NUTCH-2723
URL: https://issues.apache.org/jira/browse/NUTCH-2723
Project: Nutch
Issue Type
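To illustrate the NUTCH-2723 problem, a minimal sketch assuming the indexed document ID is the raw, still-encoded URL: decoding it before issuing a delete produces a different string, so the delete would not match. `DeleteKeyDemo` is hypothetical; `URLDecoder.decode(String, Charset)` requires Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo; not the indexer-solr code.
public class DeleteKeyDemo {

  // Buggy variant: decoding rewrites %20 and '+' as spaces, so the result no
  // longer equals the (assumed) raw URL the document was indexed under.
  static String decodedKey(String url) {
    return URLDecoder.decode(url, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String indexedId = "https://example.org/a%20b?q=x+y"; // assumed ID at index time
    String deleteId = decodedKey(indexedId);
    System.out.println(deleteId);                   // https://example.org/a b?q=x y
    System.out.println(indexedId.equals(deleteId)); // false: the delete would miss
  }
}
```

Per the issue summary, the remedy is to send the URL to Solr unchanged when deleting.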
[
https://issues.apache.org/jira/browse/NUTCH-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833675#comment-16833675
]
Markus Jelsma commented on NUTCH-2585:
--
That seems fine enough! +1
> NPE in TrieStringMatc
[
https://issues.apache.org/jira/browse/NUTCH-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-1625.
--
Resolution: Won't Fix
> IndexerMapReduce skips FETCH_NOTMODIF
[
https://issues.apache.org/jira/browse/NUTCH-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830966#comment-16830966
]
Markus Jelsma commented on NUTCH-1625:
--
Closing this issue. For some reason, this patch doesn't
[
https://issues.apache.org/jira/browse/NUTCH-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2585:
-
Description:
Stumbled on this one just now:
{code}
2018-05-25 14:29:31,844 INFO [FetcherThread
[
https://issues.apache.org/jira/browse/NUTCH-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2585:
-
Description:
Stumbled on this one just now:
{code}
2018-05-25 14:29:31,844 INFO [FetcherThread
[
https://issues.apache.org/jira/browse/NUTCH-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma deleted NUTCH-2714:
-
> WHAT KINDS OF FUNCTION CAN FREELANCE ENGINEERS
[
https://issues.apache.org/jira/browse/NUTCH-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma closed NUTCH-2714.
> WHAT KINDS OF FUNCTION CAN FREELANCE ENGINEERS
[
https://issues.apache.org/jira/browse/NUTCH-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma deleted NUTCH-2712:
-
> The digital landscape is ever changing. It exists in a constant state of
> evo
[
https://issues.apache.org/jira/browse/NUTCH-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma closed NUTCH-2712.
Resolution: Invalid
> The digital landscape is ever changing. It exists in a constant st
[
https://issues.apache.org/jira/browse/NUTCH-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma deleted NUTCH-2711:
-
> The digital landscape is ever changing. It exists in a constant state of
> evo
[
https://issues.apache.org/jira/browse/NUTCH-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma closed NUTCH-2711.
Resolution: Invalid
Closing spam!
> The digital landscape is ever changing. It exi
[
https://issues.apache.org/jira/browse/NUTCH-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2711:
-
Description:
Field Engineer offers an online field technician marketplace that helps
companies
Markus Jelsma created NUTCH-2710:
Summary: Normalize before internal and external checks
Key: NUTCH-2710
URL: https://issues.apache.org/jira/browse/NUTCH-2710
Project: Nutch
Issue Type
[
https://issues.apache.org/jira/browse/NUTCH-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816191#comment-16816191
]
Markus Jelsma commented on NUTCH-2704:
--
+1
> Upgrade crawler-commons dependency to
[
https://issues.apache.org/jira/browse/NUTCH-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815305#comment-16815305
]
Markus Jelsma commented on NUTCH-2703:
--
remote: To git@github:apache/nutch.git
remote:bf75e96
[
https://issues.apache.org/jira/browse/NUTCH-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2703.
--
Resolution: Fixed
Assignee: Markus Jelsma
> parse-tika: Boilerpipe should not
[
https://issues.apache.org/jira/browse/NUTCH-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815302#comment-16815302
]
Markus Jelsma commented on NUTCH-2703:
--
Thanks for not missing both MIME types, text/html
[
https://issues.apache.org/jira/browse/NUTCH-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2703:
-
Priority: Minor (was: Critical)
> parse-tika: Boilerpipe should not run for non-(X)HTML pa
[
https://issues.apache.org/jira/browse/NUTCH-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799038#comment-16799038
]
Markus Jelsma commented on NUTCH-2703:
--
This patch applies to the current Github master source
[
https://issues.apache.org/jira/browse/NUTCH-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799037#comment-16799037
]
Markus Jelsma commented on NUTCH-2701:
--
+1
> Fetcher: log dates and times also in human-reada
[
https://issues.apache.org/jira/browse/NUTCH-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797051#comment-16797051
]
Markus Jelsma commented on NUTCH-2703:
--
patch for master
> parse-tika: Boilerpipe should not
[
https://issues.apache.org/jira/browse/NUTCH-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2703:
-
Attachment: NUTCH-2703.patch
> parse-tika: Boilerpipe should not run for non-(X)HTML pa
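A minimal sketch of the gate described in NUTCH-2703, assuming the two accepted MIME types are `text/html` and `application/xhtml+xml` (the comment above is cut off after `text/html`); `BoilerpipeGate.isHtmlLike` is a hypothetical helper, not the parse-tika code:

```java
import java.util.Locale;

// Hypothetical helper; the accepted MIME types are an assumption.
public class BoilerpipeGate {

  // True only for HTML and XHTML content types. Parameters such as
  // "; charset=UTF-8" are stripped and the comparison ignores case.
  static boolean isHtmlLike(String contentType) {
    if (contentType == null) {
      return false;
    }
    String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
    return mime.equals("text/html") || mime.equals("application/xhtml+xml");
  }

  public static void main(String[] args) {
    System.out.println(isHtmlLike("text/html; charset=UTF-8")); // true
    System.out.println(isHtmlLike("application/xhtml+xml"));    // true
    System.out.println(isHtmlLike("application/pdf"));          // false
  }
}
```

A parser would call such a check before handing content to Boilerpipe and otherwise keep the plain Tika text.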
[
https://issues.apache.org/jira/browse/NUTCH-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16775316#comment-16775316
]
Markus Jelsma commented on NUTCH-2692:
--
Aargh, it did! I had a fight with Git and this happened. I'll
[
https://issues.apache.org/jira/browse/NUTCH-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2692.
--
Resolution: Fixed
78af89f2..0085ee74 master -> master
Thanks Sebastian!
> Subcoll
[
https://issues.apache.org/jira/browse/NUTCH-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2692:
-
Attachment: NUTCH-2692.patch
> Subcollection to support case-insensitive white and black li
[
https://issues.apache.org/jira/browse/NUTCH-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16775278#comment-16775278
]
Markus Jelsma commented on NUTCH-2692:
--
That was missing indeed! Thanks Sebastian!
> Subcollect
[
https://issues.apache.org/jira/browse/NUTCH-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16775115#comment-16775115
]
Markus Jelsma commented on NUTCH-2694:
--
Hmm, yes. Why didn't I notice that? Anyway, updated patch
[
https://issues.apache.org/jira/browse/NUTCH-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16775119#comment-16775119
]
Markus Jelsma commented on NUTCH-2694:
--
Committed da8f3f52..33922feb master -> master
Tha
[
https://issues.apache.org/jira/browse/NUTCH-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774986#comment-16774986
]
Markus Jelsma commented on NUTCH-2692:
--
I will commit this one shortly unless objections
[
https://issues.apache.org/jira/browse/NUTCH-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774982#comment-16774982
]
Markus Jelsma commented on NUTCH-2694:
--
I see, I never made a patch, or I lost it.
Anyway, attached
[
https://issues.apache.org/jira/browse/NUTCH-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2694:
-
Attachment: NUTCH-2694.patch
> HostDB to aggregate by long instead of inte
Markus Jelsma created NUTCH-2694:
Summary: HostDB to aggregate by long instead of integer
Key: NUTCH-2694
URL: https://issues.apache.org/jira/browse/NUTCH-2694
Project: Nutch
Issue Type: Bug
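Why NUTCH-2694 matters, as a minimal sketch: an `int` accumulator silently wraps around past `Integer.MAX_VALUE`, while a `long` does not. Which HostDB counters were affected is not stated in the excerpt; `AggregateOverflow` and its input are purely illustrative.

```java
// Illustrative only; not HostDB code.
public class AggregateOverflow {

  // Summing into an int wraps once the total exceeds
  // Integer.MAX_VALUE (2,147,483,647).
  static int sumAsInt(int[] counts) {
    int total = 0;
    for (int c : counts) {
      total += c;
    }
    return total;
  }

  // Each int is widened to long before adding, so no wrap-around occurs
  // for any realistic number of inputs.
  static long sumAsLong(int[] counts) {
    long total = 0L;
    for (int c : counts) {
      total += c;
    }
    return total;
  }

  public static void main(String[] args) {
    int[] perSegment = {Integer.MAX_VALUE, 1};
    System.out.println(sumAsInt(perSegment));  // -2147483648
    System.out.println(sumAsLong(perSegment)); // 2147483648
  }
}
```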
[
https://issues.apache.org/jira/browse/NUTCH-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2692:
-
Attachment: NUTCH-2692.patch
> Subcollection to support case-insensitive white and black li
Markus Jelsma created NUTCH-2692:
Summary: Subcollection to support case-insensitive white and black
lists
Key: NUTCH-2692
URL: https://issues.apache.org/jira/browse/NUTCH-2692
Project: Nutch
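A minimal sketch of case-insensitive list matching for NUTCH-2692, assuming entries are matched as substrings of the URL (the real Subcollection matching rule is not shown here); `CaseInsensitiveList.matchesAny` is hypothetical:

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper; substring matching is an assumption.
public class CaseInsensitiveList {

  // True if any pattern occurs in the URL, ignoring case. Locale.ROOT keeps
  // the result independent of the JVM default locale (e.g. Turkish i).
  static boolean matchesAny(String url, List<String> patterns) {
    String u = url.toLowerCase(Locale.ROOT);
    for (String p : patterns) {
      if (u.contains(p.toLowerCase(Locale.ROOT))) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<String> whitelist = List.of("example.org/News/");
    System.out.println(matchesAny("https://EXAMPLE.org/news/today", whitelist)); // true
    System.out.println(matchesAny("https://example.org/sports/", whitelist));   // false
  }
}
```

For large lists, lower-casing the patterns once at load time would avoid repeating that work on every URL.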
[
https://issues.apache.org/jira/browse/NUTCH-2689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748783#comment-16748783
]
Markus Jelsma commented on NUTCH-2689:
--
Nice catch! It is always nice to see low-hanging fruit like
[
https://issues.apache.org/jira/browse/NUTCH-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746241#comment-16746241
]
Markus Jelsma commented on NUTCH-2678:
--
Great!
remote: To git@github:apache/nutch.git
remote
[
https://issues.apache.org/jira/browse/NUTCH-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2678.
--
Resolution: Fixed
> Allow for per-host configurable protocol plu
[
https://issues.apache.org/jira/browse/NUTCH-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746144#comment-16746144
]
Markus Jelsma commented on NUTCH-2687:
--
Thanks!
> Regex for reading title from Content-Disposit
[
https://issues.apache.org/jira/browse/NUTCH-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746143#comment-16746143
]
Markus Jelsma commented on NUTCH-2678:
--
Alright, so
https://patch-diff.githubusercontent.com/raw
Markus Jelsma created NUTCH-2687:
Summary: Regex for reading title from Content-Disposition is wrong
Key: NUTCH-2687
URL: https://issues.apache.org/jira/browse/NUTCH-2687
Project: Nutch
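For NUTCH-2687, a minimal sketch of extracting the `filename` parameter from a `Content-Disposition` header with a regex. The pattern is an assumption written for illustration, not the regex used in Nutch; it handles quoted and unquoted values but ignores the RFC 6266 `filename*` form:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not Nutch's Content-Disposition handling.
public class ContentDispositionFilename {

  // Matches filename="quoted value" or filename=token (unquoted, up to ';'
  // or whitespace). filename*= is not matched because '*' precedes '='.
  private static final Pattern FILENAME = Pattern.compile(
      "filename\\s*=\\s*(?:\"([^\"]*)\"|([^;\\s]+))", Pattern.CASE_INSENSITIVE);

  static String filename(String header) {
    if (header == null) {
      return null;
    }
    Matcher m = FILENAME.matcher(header);
    if (!m.find()) {
      return null;
    }
    return m.group(1) != null ? m.group(1) : m.group(2);
  }

  public static void main(String[] args) {
    System.out.println(filename("attachment; filename=\"Annual Report.pdf\"")); // Annual Report.pdf
    System.out.println(filename("attachment; filename=report.pdf; size=10"));   // report.pdf
    System.out.println(filename("inline"));                                     // null
  }
}
```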
[
https://issues.apache.org/jira/browse/NUTCH-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2687:
-
Attachment: NUTCH-2687.patch
> Regex for reading title from Content-Disposition is wr
[
https://issues.apache.org/jira/browse/NUTCH-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2687:
-
Description:
Given URL:
https://www.amuse-project.org/file/download/default
[
https://issues.apache.org/jira/browse/NUTCH-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737087#comment-16737087
]
Markus Jelsma commented on NUTCH-2678:
--
Alright! I added support for protocol:http.., I couldn't
[
https://issues.apache.org/jira/browse/NUTCH-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2678:
-
Attachment: NUTCH-2647.patch
> Allow for per-host configurable protocol plu
[
https://issues.apache.org/jira/browse/NUTCH-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2678:
-
Attachment: (was: NUTCH-2647.patch)
> Allow for per-host configurable protocol plu
[
https://issues.apache.org/jira/browse/NUTCH-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735692#comment-16735692
]
Markus Jelsma commented on NUTCH-2673:
--
Yes, thanks Sebastian!
> EOFException protocol-h
[
https://issues.apache.org/jira/browse/NUTCH-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma closed NUTCH-2673.
Resolution: Not A Problem
> EOFException protocol-h
[
https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730203#comment-16730203
]
Markus Jelsma commented on NUTCH-2665:
--
Thanks!
> Upgrade to Apache Tika 1.1
[
https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma closed NUTCH-2665.
> Upgrade to Apache Tika 1.19.1
> -
>
> Key
[
https://issues.apache.org/jira/browse/NUTCH-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2678:
-
Description:
Introduces new configuration file for mapping protocol plugins to hostnames.
{code
[
https://issues.apache.org/jira/browse/NUTCH-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716854#comment-16716854
]
Markus Jelsma commented on NUTCH-2678:
--
Hello Sebastian!
* it is indeed. I was in a hurry
[
https://issues.apache.org/jira/browse/NUTCH-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716865#comment-16716865
]
Markus Jelsma commented on NUTCH-2678:
--
Updated patch to include configuration file template. I
[
https://issues.apache.org/jira/browse/NUTCH-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2678:
-
Attachment: NUTCH-2678.patch
> Allow for per-host configurable protocol plu
[
https://issues.apache.org/jira/browse/NUTCH-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2678:
-
Attachment: NUTCH-2678.patch
> Allow for per-host configurable protocol plu
[
https://issues.apache.org/jira/browse/NUTCH-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2678:
-
Attachment: NUTCH-2678.patch
> Allow for per-host configurable protocol plu
Markus Jelsma created NUTCH-2678:
Summary: Allow for per-host configurable protocol plugin
Key: NUTCH-2678
URL: https://issues.apache.org/jira/browse/NUTCH-2678
Project: Nutch
Issue Type
Hello,
We need a configurable set of hosts to use a specific protocol plugin. There
are several hacks I can think of to achieve this. I am asking here to see if
any of you have a good suggestion.
Thanks,
Markus
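One way to approach the per-host protocol plugin question (later filed as NUTCH-2678) is a lookup table from hostname to plugin ID with a default fallback. A minimal sketch: the whitespace-separated `hostname plugin-id` line format and the `HostProtocolMapping` class are hypothetical, not the configuration format Nutch adopted:

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; the config line format is an assumption.
public class HostProtocolMapping {

  private final Map<String, String> pluginByHost = new HashMap<>();
  private final String defaultPlugin;

  HostProtocolMapping(String defaultPlugin) {
    this.defaultPlugin = defaultPlugin;
  }

  // Parses hypothetical "hostname plugin-id" lines (whitespace-separated);
  // blank lines and '#' comments are skipped.
  void load(String config) {
    for (String line : config.split("\n")) {
      String t = line.trim();
      if (t.isEmpty() || t.startsWith("#")) {
        continue;
      }
      String[] parts = t.split("\\s+", 2);
      if (parts.length == 2) {
        pluginByHost.put(parts[0].toLowerCase(Locale.ROOT), parts[1].trim());
      }
    }
  }

  // Chooses the plugin for the URL's host, or the default if unmapped.
  String pluginFor(String url) {
    String host = URI.create(url).getHost();
    if (host == null) {
      return defaultPlugin;
    }
    return pluginByHost.getOrDefault(host.toLowerCase(Locale.ROOT), defaultPlugin);
  }

  public static void main(String[] args) {
    HostProtocolMapping m = new HostProtocolMapping("protocol-http");
    m.load("# host  plugin\nwww.example.org\tprotocol-okhttp\n");
    System.out.println(m.pluginFor("https://www.example.org/page")); // protocol-okhttp
    System.out.println(m.pluginFor("https://other.example.net/"));   // protocol-http
  }
}
```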
Hello Lewis!
I would applaud having a Mavenized build for Nutch! If I remember right,
there was a ticket for this, wasn't there? I cannot seem to find it
right away.
Regards,
Markus
-Original message-
> From:lewis john mcgibbney
> Sent: Thursday 29th November 2018
[
https://issues.apache.org/jira/browse/NUTCH-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688210#comment-16688210
]
Markus Jelsma commented on NUTCH-2675:
--
Well, what we could do is override ParseUtil.parse and add
Markus Jelsma created NUTCH-2673:
Summary: EOFException protocol-http
Key: NUTCH-2673
URL: https://issues.apache.org/jira/browse/NUTCH-2673
Project: Nutch
Issue Type: Bug
Affects
[
https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662234#comment-16662234
]
Markus Jelsma commented on NUTCH-2665:
--
On my machine it really fails with the latest patch, weird
[
https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661983#comment-16661983
]
Markus Jelsma commented on NUTCH-2665:
--
Hello [~axr], yes, it compiles fine; that is where
[
https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660625#comment-16660625
]
Markus Jelsma commented on NUTCH-2665:
--
I'll commit this one later today, if I don't forget, unless
[
https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2665:
-
Attachment: NUTCH-2665.patch
> Upgrade to Apache Tika 1.1