[
https://issues.apache.org/jira/browse/NUTCH-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083109#comment-15083109
]
Markus Jelsma commented on NUTCH-1449:
--
We have had it running nicely for some years. I will commit this
[
https://issues.apache.org/jira/browse/NUTCH-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083285#comment-15083285
]
Sebastian Nagel commented on NUTCH-2168:
Hi [~kalanya], looks like the indexed raw content of the
[
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083214#comment-15083214
]
Chris A. Mattmann commented on NUTCH-2191:
--
Very nice, Markus! Beat me to implementing this one.
[
https://issues.apache.org/jira/browse/NUTCH-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1321:
-
Patch Info: Patch Available
> IDNNormalizer
> -
>
> Key: NUTCH-1321
>
Markus Jelsma created NUTCH-2196:
Summary: IndexingFilterChecker to optionally normalize
Key: NUTCH-2196
URL: https://issues.apache.org/jira/browse/NUTCH-2196
Project: Nutch
Issue Type:
[
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083273#comment-15083273
]
Markus Jelsma commented on NUTCH-2191:
--
Hey Chris! An Ajax pattern handler is new to me. Can you
Markus Jelsma created NUTCH-2195:
Summary: IndexingFilterChecker to optionally follow N redirects
Key: NUTCH-2195
URL: https://issues.apache.org/jira/browse/NUTCH-2195
Project: Nutch
Issue
[
https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083092#comment-15083092
]
Markus Jelsma commented on NUTCH-2184:
--
Hello Lewis!
* it should be no problem. But since
[
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2191:
-
Patch Info: Patch Available
> Add protocol-htmlunit
> -
>
>
[
https://issues.apache.org/jira/browse/NUTCH-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083101#comment-15083101
]
Markus Jelsma commented on NUTCH-1838:
--
If no objections, I'll get this one in soon
> Host and
Markus Jelsma created NUTCH-2194:
Summary: Run IndexingFilterChecker as simple Telnet server
Key: NUTCH-2194
URL: https://issues.apache.org/jira/browse/NUTCH-2194
Project: Nutch
Issue Type:
[
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083104#comment-15083104
]
Markus Jelsma commented on NUTCH-2191:
--
Does anyone have an idea on how to force the plugin to use
[
https://issues.apache.org/jira/browse/NUTCH-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083114#comment-15083114
]
Markus Jelsma commented on NUTCH-1257:
--
Hmm, there is no patch but I remember having had this support
[
https://issues.apache.org/jira/browse/NUTCH-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2178:
-
Patch Info: Patch Available
> DeduplicationJob to optionally group on host or domain
>
[
https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1932:
-
Patch Info: Patch Available
> Automatically remove orphaned pages
>
[
https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083138#comment-15083138
]
Lewis John McGibbney commented on NUTCH-1186:
-
Will scope and test [~markus17]
>
[
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083603#comment-15083603
]
Sebastian Nagel commented on NUTCH-2191:
As [~haraldk] mentioned in [this
[
https://issues.apache.org/jira/browse/NUTCH-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083096#comment-15083096
]
Markus Jelsma commented on NUTCH-2178:
--
Will commit in a few if no further objections.
>
[
https://issues.apache.org/jira/browse/NUTCH-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1449:
-
Patch Info: Patch Available
> Optionally delete documents skipped by IndexingFilters
>
[
https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1186:
-
Patch Info: Patch Available
> FreeGenerator always normalizes
> ---
>
[
https://issues.apache.org/jira/browse/NUTCH-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085043#comment-15085043
]
liuqibj commented on NUTCH-2143:
I have a fix and can deliver it
> GeneratorJob ignores batch id passed
[
https://issues.apache.org/jira/browse/NUTCH-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082904#comment-15082904
]
Auro Miralles commented on NUTCH-2168:
--
Hello. I have no idea which document fails... I can crawl
[
https://issues.apache.org/jira/browse/NUTCH-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082904#comment-15082904
]
Auro Miralles edited comment on NUTCH-2168 at 1/5/16 11:50 AM:
---
Hello. I
[
https://issues.apache.org/jira/browse/NUTCH-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082951#comment-15082951
]
ASF GitHub Bot commented on NUTCH-1946:
---
Github user lewismc commented on the pull request:
[
https://issues.apache.org/jira/browse/NUTCH-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082966#comment-15082966
]
ASF GitHub Bot commented on NUTCH-1946:
---
Github user jeroenvlek closed the pull request at:
25 matches