[ https://issues.apache.org/jira/browse/NUTCH-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2215:
Attachment: NUTCH-2215.patch
Patch for trunk. Unit test passes!
> Generator to restrict cr
Markus Jelsma created NUTCH-2215:
Summary: Generator to restrict crawl to mime type
Key: NUTCH-2215
URL: https://issues.apache.org/jira/browse/NUTCH-2215
Project: Nutch
Issue Type
Markus Jelsma created NUTCH-2214:
Summary: Index clean to be flexible on what it deletes
Key: NUTCH-2214
URL: https://issues.apache.org/jira/browse/NUTCH-2214
Project: Nutch
Issue Type
[ https://issues.apache.org/jira/browse/NUTCH-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-2197.
Resolution: Fixed
Committed to trunk in revision 1728313. Thanks Jurian Broertjes!
[ https://issues.apache.org/jira/browse/NUTCH-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2197:
Fix Version/s: 1.12
> Add solr5 solrcloud indexer supp
[ https://issues.apache.org/jira/browse/NUTCH-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2197:
Affects Version/s: (was: 1.12)
1.11
> Add solr5 solrcloud inde
[ https://issues.apache.org/jira/browse/NUTCH-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2197:
Attachment: NUTCH-2197.patch
The previous patch was missing a proper version in plugin.xml.
Markus Jelsma created NUTCH-2212:
Summary: Decrease memory consumption by tuning stack size
Key: NUTCH-2212
URL: https://issues.apache.org/jira/browse/NUTCH-2212
Project: Nutch
Issue Type
[ https://issues.apache.org/jira/browse/NUTCH-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma closed NUTCH-2211.
> Filter and normalizer checkers missing in bin/nu
Markus Jelsma created NUTCH-2211:
Summary: Filter and normalizer checkers missing in bin/nutch
Key: NUTCH-2211
URL: https://issues.apache.org/jira/browse/NUTCH-2211
Project: Nutch
Issue Type
[ https://issues.apache.org/jira/browse/NUTCH-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-2211.
Resolution: Fixed
Committed to trunk in revision 1728339.
> Filter and normalizer check
[ https://issues.apache.org/jira/browse/NUTCH-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2211:
Attachment: NUTCH-2211.patch
Patch for trunk.
> Filter and normalizer checkers missing in
[ https://issues.apache.org/jira/browse/NUTCH-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2197:
Attachment: NUTCH-2197.patch
Here's the updated patch with Solr 5.4.1
> Add solr5 solrcl
Markus Jelsma created NUTCH-2210:
Summary: Upgrade to Tika 1.12
Key: NUTCH-2210
URL: https://issues.apache.org/jira/browse/NUTCH-2210
Project: Nutch
Issue Type: Task
Reporter
[ https://issues.apache.org/jira/browse/NUTCH-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128366#comment-15128366 ]
Markus Jelsma commented on NUTCH-2197:
I am going to commit this soon unless there are objections.
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117024#comment-15117024 ]
Markus Jelsma commented on NUTCH-961:
Yes! :)
> Expose Tika's boilerpipe supp
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116975#comment-15116975 ]
Markus Jelsma commented on NUTCH-961:
With boilerpipe, you get only a very few outlinks, those found
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1465:
Fix Version/s: 1.13
> Support sitemaps in Nu
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114989#comment-15114989 ]
Markus Jelsma commented on NUTCH-961:
That is probably due to the patch parsing twice. Once with BP
[ https://issues.apache.org/jira/browse/NUTCH-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114991#comment-15114991 ]
Markus Jelsma commented on NUTCH-2205:
This looks like your cluster was down, not a Nutch error
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1325:
Patch Info: Patch Available
Description:
h1. HostDB for Apache Nutch 1.x
* automatically
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1325:
Attachment: NUTCH-1325.patch
Updated patch for trunk; it contains more thorough config descriptions
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1325:
Attachment: NUTCH-1325.patch
Updated patch to use TDigest for streaming percentiles. But because
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1325:
Attachment: NUTCH-1325.patch
TDigest is awesome! Here's a patch with support for a user-configurable list
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1325:
Fix Version/s: 1.12
> HostDB for Nutch
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-1325.
Resolution: Fixed
Committed to trunk in revision 1725952. Many thanks to all contributors
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1325:
Component/s: hostdb
> HostDB for Nutch
[ https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110375#comment-15110375 ]
Markus Jelsma commented on NUTCH-1233:
Yes, we'll get this support with Tika 1.12. Timothy Allison
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110373#comment-15110373 ]
Markus Jelsma commented on NUTCH-961:
Hello - that doesn't seem related to this issue as it doesn't
[ https://issues.apache.org/jira/browse/NUTCH-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2201:
Attachment: NUTCH-2201.patch
Patch for trunk which removes the loops program and all references
[ https://issues.apache.org/jira/browse/NUTCH-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2201:
Patch Info: Patch Available
> Remove loops program from webgraph pack
[ https://issues.apache.org/jira/browse/NUTCH-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110797#comment-15110797 ]
Markus Jelsma commented on NUTCH-2197:
This Solr 5 plugin is capable of indexing to Solr 5 in cloud
[ https://issues.apache.org/jira/browse/NUTCH-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-2201.
Resolution: Fixed
Committed to trunk in revision 1725981. Thanks Dennis!
> Remove loops prog
[ https://issues.apache.org/jira/browse/NUTCH-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110947#comment-15110947 ]
Markus Jelsma commented on NUTCH-2202:
Yes, a patch would be a good place to start. I've read
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15111292#comment-15111292 ]
Markus Jelsma commented on NUTCH-961:
Some news: the upstream Tika issue has been committed
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106570#comment-15106570 ]
Markus Jelsma commented on NUTCH-961:
Yes but it requires NUTCH-1233.
> Expose Tika's boilerp
[ https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1233:
Attachment: NUTCH-1233.patch
Updated patch for trunk
> Rely on Tika for outlink extract
[ https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1233:
Attachment: pre-1233.txt
post-1233.txt
Two lists of extracted URLs, before
[ https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1233:
Attachment: NUTCH-1233.patch
Updated patch. Patch now contains the old link extraction commented
[ https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1233:
Attachment: pre-1233-2.txt
post-1233-2.txt
Here's another set to compare
> R
[ https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106633#comment-15106633 ]
Markus Jelsma edited comment on NUTCH-1233 at 1/19/16 11:57 AM:
It seems
[ https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106633#comment-15106633 ]
Markus Jelsma commented on NUTCH-1233:
It seems Tika's link extraction does not cover and elements
[ https://issues.apache.org/jira/browse/NUTCH-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2201:
Affects Version/s: 1.11
> Remove loops program from webgraph pack
[ https://issues.apache.org/jira/browse/NUTCH-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2201:
Fix Version/s: 1.12
> Remove loops program from webgraph pack
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106783#comment-15106783 ]
Markus Jelsma commented on NUTCH-961:
Update: I've updated NUTCH-1233 for the current trunk as well
[ https://issues.apache.org/jira/browse/NUTCH-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2203:
Fix Version/s: 1.12
> Suffix URL filter can't handle trailing/leading whitespa
[ https://issues.apache.org/jira/browse/NUTCH-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-2203.
Resolution: Fixed
Committed to trunk in revision 1725538. Thanks Jurian Broertjes.
> Suf
[ https://issues.apache.org/jira/browse/NUTCH-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma reassigned NUTCH-2203:
Assignee: Markus Jelsma
> Suffix URL filter can't handle trailing/leading whitespa
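NUTCH-2203 concerns the suffix URL filter failing on rules with trailing or leading whitespace. The committed fix is not reproduced here; the following is only a minimal Java sketch of the idea, assuming one suffix per line and `#` comment lines. The class `SuffixRulesSketch` and its methods are hypothetical, and the real plugin's rule file may support further directives that this sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the NUTCH-2203 patch: parse suffix rules one per line,
// trimming surrounding whitespace so " .gif " and ".gif" behave the same.
class SuffixRulesSketch {

  static List<String> parse(String rules) {
    List<String> suffixes = new ArrayList<>();
    for (String line : rules.split("\n")) {
      String suffix = line.trim(); // drops leading/trailing spaces, tabs and a stray '\r'
      if (suffix.isEmpty() || suffix.startsWith("#")) {
        continue; // skip blank lines and comments
      }
      suffixes.add(suffix);
    }
    return suffixes;
  }

  // True if the URL ends with any of the parsed suffixes (case-sensitive in this sketch).
  static boolean matches(String url, List<String> suffixes) {
    for (String suffix : suffixes) {
      if (url.endsWith(suffix)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<String> suffixes = parse("  .gif \n\n# images\n.jpg\t\n");
    System.out.println(suffixes);                                      // [.gif, .jpg]
    System.out.println(matches("http://example.com/a.gif", suffixes)); // true
  }
}
```

Without the `trim()` call, a rule like `".gif "` would never match a URL ending in `.gif`, which is the kind of silent mismatch the issue title describes.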
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma reassigned NUTCH-1325:
Assignee: Markus Jelsma
> HostDB for Nutch
ommand then please let me know.
thanks
On Mon, Jan 18, 2016 at 4:50 PM, Markus Jelsma <markus.jel...@openindex.io
<mailto:markus.jel...@openindex.io>> wrote:
Hi - This doesn't look like an HTTP basic authentication problem. Are you running Solr 5.x?
Markus
-Original message-
Hi - can you post the log output?
Markus
-Original message-
From: Zara Parst
Sent: Monday 18th January 2016 2:06
To: dev@nutch.apache.org
Subject: Nutch/Solr communication problem
Hi everyone,
I have a situation here: I am using Nutch 1.11 and Solr 5.4.
Solr is
Job.java:145)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)
On Mon, Jan 18, 2016 at 4:15 PM, Markus Jelsma <markus.jel...@openindex.io
<mailto:markus.jel...@op
[ https://issues.apache.org/jira/browse/NUTCH-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma closed NUTCH-1107.
Resolution: Won't Fix
> Log slow parse entries
Markus Jelsma created NUTCH-2201:
Summary: Remove loops program from webgrapg package
Key: NUTCH-2201
URL: https://issues.apache.org/jira/browse/NUTCH-2201
Project: Nutch
Issue Type: Task
[ https://issues.apache.org/jira/browse/NUTCH-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-1838.
Resolution: Fixed
> Host and domain based regex and automaton filter
[ https://issues.apache.org/jira/browse/NUTCH-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma reassigned NUTCH-2197:
Assignee: Markus Jelsma
> Add solr5 solrcloud indexer supp
[ https://issues.apache.org/jira/browse/NUTCH-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma reassigned NUTCH-2201:
Assignee: Markus Jelsma
> Remove loops program from webgraph pack
016 16:16
To: dev@nutch.apache.org
Subject: Re: Nutch/Solr communication problem
Mind sharing that patch?
On Mon, Jan 18, 2016 at 8:28 PM, Markus Jelsma <markus.jel...@openindex.io
<mailto:markus.jel...@openindex.io>> wrote:
Yes i have used it, i made the damn patch myself yea
[ https://issues.apache.org/jira/browse/NUTCH-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2201:
Summary: Remove loops program from webgraph package (was: Remove loops
program from webgrapg
[ https://issues.apache.org/jira/browse/NUTCH-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma closed NUTCH-1149.
Resolution: Won't Fix
Will upload a proper patch for NUTCH-1325 soon, which already contains numeric
[ https://issues.apache.org/jira/browse/NUTCH-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-2194.
Resolution: Fixed
Committed to trunk in revision 1724771.
> Run IndexingFilterChec
[ https://issues.apache.org/jira/browse/NUTCH-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2194:
Attachment: NUTCH-2194.patch
Updated patch. Signature is now also added to CrawlDatum, in case
[ https://issues.apache.org/jira/browse/NUTCH-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2195:
Priority: Trivial (was: Major)
> IndexingFilterChecker to optionally follow N redire
[ https://issues.apache.org/jira/browse/NUTCH-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2195:
Attachment: NUTCH-2195.patch
Patch for trunk. -followRedirects now follows redirects a few times
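NUTCH-2195 adds an option for IndexingFilterChecker to follow up to N redirects. As a rough illustration only, here is a Java sketch of a bounded redirect-following loop; a `Map` from URL to redirect target stands in for real HTTP fetches, and the class `RedirectFollowerSketch`, its `follow` method and the loop-detection rule are all assumptions, not the patch's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "follow at most N redirects"; a Map stands in for real
// HTTP fetches (key = URL, value = its redirect target).
class RedirectFollowerSketch {

  // Returns the chain of visited URLs, starting with the input URL.
  static List<String> follow(String start, Map<String, String> redirects, int maxRedirects) {
    List<String> chain = new ArrayList<>();
    Set<String> seen = new HashSet<>();
    String current = start;
    chain.add(current);
    seen.add(current);
    for (int hop = 0; hop < maxRedirects; hop++) {
      String target = redirects.get(current);
      if (target == null) {
        break; // no redirect: final destination reached
      }
      if (!seen.add(target)) {
        break; // target already visited: redirect loop
      }
      chain.add(target);
      current = target;
    }
    return chain;
  }

  public static void main(String[] args) {
    Map<String, String> redirects = new HashMap<>();
    redirects.put("http://a/", "http://b/");
    redirects.put("http://b/", "http://c/");
    System.out.println(follow("http://a/", redirects, 1)); // [http://a/, http://b/]
    System.out.println(follow("http://a/", redirects, 5)); // [http://a/, http://b/, http://c/]
  }
}
```

Capping the hop count and tracking visited URLs keeps a misconfigured site (for example two pages redirecting to each other) from stalling the checker.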
[ https://issues.apache.org/jira/browse/NUTCH-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2194:
Priority: Minor (was: Major)
> Run IndexingFilterChecker as simple Telnet ser
[ https://issues.apache.org/jira/browse/NUTCH-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2196:
Priority: Trivial (was: Major)
> IndexingFilterChecker to optionally normal
[ https://issues.apache.org/jira/browse/NUTCH-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-2195.
Resolution: Fixed
Assignee: Markus Jelsma
Committed to trunk in revision 1724409
[ https://issues.apache.org/jira/browse/NUTCH-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2195:
Patch Info: Patch Available
> IndexingFilterChecker to optionally follow N redire
[ https://issues.apache.org/jira/browse/NUTCH-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15096156#comment-15096156 ]
Markus Jelsma commented on NUTCH-2196:
Committed to trunk in revision 1724418
[ https://issues.apache.org/jira/browse/NUTCH-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-2196.
Resolution: Fixed
> IndexingFilterChecker to optionally normal
[ https://issues.apache.org/jira/browse/NUTCH-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2196:
Assignee: Markus Jelsma
Patch Info: Patch Available
> IndexingFilterChecker to optiona
[ https://issues.apache.org/jira/browse/NUTCH-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2196:
Attachment: NUTCH-2196.patch
Patch for trunk introducing the -normalize flag. If enabled, input
[ https://issues.apache.org/jira/browse/NUTCH-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2194:
Patch Info: Patch Available
> Run IndexingFilterChecker as simple Telnet ser
[ https://issues.apache.org/jira/browse/NUTCH-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15096263#comment-15096263 ]
Markus Jelsma commented on NUTCH-2194:
Please check it out :)
> Run IndexingFilterChecker as sim
[ https://issues.apache.org/jira/browse/NUTCH-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2194:
Description:
We have used a customized IndexingFilterChecker running as server to be able
[ https://issues.apache.org/jira/browse/NUTCH-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2194:
Attachment: NUTCH-2194.patch
Patch for trunk. With default settings this server needs just about
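NUTCH-2194 runs IndexingFilterChecker as a simple Telnet server. The sketch below only illustrates what such a line-based server loop can look like in plain Java; it is not the committed patch. The class `CheckerServerSketch`, the `check` stand-in for fetching and running the indexing filters, the default port 5000 and the "blank line ends the session" rule are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a line-based (Telnet-friendly) checker server.
class CheckerServerSketch {

  // Stand-in for fetching, parsing and running indexing filters on one URL.
  static String check(String url) {
    return "checked: " + url;
  }

  // One URL per input line, one answer line per URL; a blank line ends the session.
  static void handle(BufferedReader in, PrintWriter out) throws IOException {
    String line;
    while ((line = in.readLine()) != null) { // readLine accepts \n and Telnet's \r\n
      String url = line.trim();
      if (url.isEmpty()) {
        break;
      }
      out.println(check(url));
    }
    out.flush();
  }

  public static void main(String[] args) throws IOException {
    int port = args.length > 0 ? Integer.parseInt(args[0]) : 5000;
    try (ServerSocket server = new ServerSocket(port)) {
      while (true) {
        // Serve clients one at a time; each gets its own reader and writer.
        try (Socket client = server.accept();
             BufferedReader in = new BufferedReader(
                 new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                 new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true)) {
          handle(in, out);
        }
      }
    }
  }
}
```

With the sketch running, `telnet localhost 5000` followed by a URL and Enter should print one `checked: ...` line per URL. Keeping `handle` separate from the socket code makes the protocol testable with in-memory readers and writers.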
[ https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15093697#comment-15093697 ]
Markus Jelsma commented on NUTCH-1712:
Nice!
> Use MultipleInputs in Injector to make it a sin
[ https://issues.apache.org/jira/browse/NUTCH-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma reopened NUTCH-2190:
Need to add the example config file.
> Protocol normali
[ https://issues.apache.org/jira/browse/NUTCH-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-2190.
Resolution: Fixed
Committed revision 1724199.
> Protocol normali
[ https://issues.apache.org/jira/browse/NUTCH-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-2190.
Resolution: Fixed
Assignee: Markus Jelsma
Committed revision 1724085.
> Proto
[ https://issues.apache.org/jira/browse/NUTCH-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2190:
Attachment: NUTCH-2190.patch
Final patch including all entries for build.xml
[ https://issues.apache.org/jira/browse/NUTCH-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-1449.
Resolution: Fixed
Committed revision 1723688.
> Optionally delete documents skip
[ https://issues.apache.org/jira/browse/NUTCH-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2178:
Summary: DeduplicationJob to optionally group on host or domain (was:
DeduplicationJob
[ https://issues.apache.org/jira/browse/NUTCH-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-2178.
Resolution: Fixed
Committed to trunk in revision 1723690.
> DeduplicationJob to optiona
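NUTCH-2178 lets DeduplicationJob optionally group on host or domain. Assuming duplicates are otherwise detected by matching content signatures, host grouping can be pictured as prefixing the signature with the lowercased host, so identical pages on different hosts are no longer collapsed into one. This Java sketch shows only that assumption; `DedupKeySketch` and `groupKey` are hypothetical, and domain grouping (which needs public-suffix handling) is not shown.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical sketch: documents count as duplicates only when their grouping keys match.
class DedupKeySketch {

  static String groupKey(String url, String signature, boolean groupByHost) {
    if (!groupByHost) {
      return signature; // global deduplication: the signature alone decides
    }
    // URI.create throws IllegalArgumentException for malformed URLs.
    String host = URI.create(url).getHost();
    if (host == null) {
      host = ""; // e.g. URLs without an authority part
    }
    return host.toLowerCase(Locale.ROOT) + "\t" + signature;
  }

  public static void main(String[] args) {
    System.out.println(groupKey("http://Example.com/a", "sig1", false)); // sig1
    System.out.println(groupKey("http://Example.com/a", "sig1", true));  // example.com<TAB>sig1
  }
}
```

The trade-off: grouping by host keeps legitimate mirrors on separate hosts indexed, at the cost of no longer removing cross-host duplicates.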
[ https://issues.apache.org/jira/browse/NUTCH-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089073#comment-15089073 ]
Markus Jelsma edited comment on NUTCH-1449 at 1/8/16 11:16 AM:
Committed
[ https://issues.apache.org/jira/browse/NUTCH-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089081#comment-15089081 ]
Markus Jelsma commented on NUTCH-2190:
I'll also get this one in soon, unless there are objections of course
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089297#comment-15089297 ]
Markus Jelsma commented on NUTCH-2191:
Hi - I've 'read' that discussion a couple of weeks ago
[ https://issues.apache.org/jira/browse/NUTCH-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089121#comment-15089121 ]
Markus Jelsma commented on NUTCH-1838:
Committed to trunk in revision 1723710.
> Host and dom
[ https://issues.apache.org/jira/browse/NUTCH-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083109#comment-15083109 ]
Markus Jelsma commented on NUTCH-1449:
We have had it running nicely for some years. I will commit
[ https://issues.apache.org/jira/browse/NUTCH-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1321:
Patch Info: Patch Available
> IDNNormalizer
Markus Jelsma created NUTCH-2196:
Summary: IndexingFilterChecker to optionally normalize
Key: NUTCH-2196
URL: https://issues.apache.org/jira/browse/NUTCH-2196
Project: Nutch
Issue Type
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083273#comment-15083273 ]
Markus Jelsma commented on NUTCH-2191:
Hey Chris! An Ajax pattern handler is new to me. Can you
Markus Jelsma created NUTCH-2195:
Summary: IndexingFilterChecker to optionally follow N redirects
Key: NUTCH-2195
URL: https://issues.apache.org/jira/browse/NUTCH-2195
Project: Nutch
Issue
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083092#comment-15083092 ]
Markus Jelsma commented on NUTCH-2184:
Hello Lewis!
* it should be no problem. But since
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2191:
Patch Info: Patch Available
> Add protocol-htmlu
[ https://issues.apache.org/jira/browse/NUTCH-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083101#comment-15083101 ]
Markus Jelsma commented on NUTCH-1838:
If there are no objections, I'll get this one in soon
> H
Markus Jelsma created NUTCH-2194:
Summary: Run IndexingFilterChecker as simple Telnet server
Key: NUTCH-2194
URL: https://issues.apache.org/jira/browse/NUTCH-2194
Project: Nutch
Issue Type
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083104#comment-15083104 ]
Markus Jelsma commented on NUTCH-2191:
Does anyone have an idea on how to force the plugin to use
[ https://issues.apache.org/jira/browse/NUTCH-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083114#comment-15083114 ]
Markus Jelsma commented on NUTCH-1257:
Hmm, there is no patch, but I remember having had this support
[ https://issues.apache.org/jira/browse/NUTCH-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2178:
Patch Info: Patch Available
> DeduplicationJob to optionall group on host or dom