Re: upgrading protocol-httpclient to httpclient 4.1.1

2014-04-04 Thread d_k
Alright. I'll look into it. Thanks! On Sat, Apr 5, 2014 at 12:39 AM, Sebastian Nagel wrote: > > Define 'addressing'. :-) > > I didn't refactor because I don't really know which direction will be the > > right direction for that plugin. So in a way the plugin is still the same. > > All I did w

[jira] [Comment Edited] (NUTCH-1746) OutOfMemoryError in Mappers

2014-04-04 Thread Greg Padiasek (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960493#comment-13960493 ] Greg Padiasek edited comment on NUTCH-1746 at 4/5/14 3:47 AM: --

[jira] [Issue Comment Deleted] (NUTCH-1746) OutOfMemoryError in Mappers

2014-04-04 Thread Greg Padiasek (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Padiasek updated NUTCH-1746: - Comment: was deleted (was: I use domain-urlfilter.txt (in the attached and split file). I also tr

[jira] [Comment Edited] (NUTCH-1746) OutOfMemoryError in Mappers

2014-04-04 Thread Greg Padiasek (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960493#comment-13960493 ] Greg Padiasek edited comment on NUTCH-1746 at 4/5/14 3:46 AM: --

[jira] [Issue Comment Deleted] (NUTCH-1746) OutOfMemoryError in Mappers

2014-04-04 Thread Greg Padiasek (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Padiasek updated NUTCH-1746: - Comment: was deleted (was: bq. I wonder also if you have tried this with Automaton urlfilter? No

[jira] [Commented] (NUTCH-1746) OutOfMemoryError in Mappers

2014-04-04 Thread Greg Padiasek (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960947#comment-13960947 ] Greg Padiasek commented on NUTCH-1746: -- bq. I wonder also if you have tried this with

[jira] [Commented] (NUTCH-1746) OutOfMemoryError in Mappers

2014-04-04 Thread Greg Padiasek (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960495#comment-13960495 ] Greg Padiasek commented on NUTCH-1746: -- I use domain-urlfilter.txt (in the attached a

[jira] [Updated] (NUTCH-1746) OutOfMemoryError in Mappers

2014-04-04 Thread Greg Padiasek (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Padiasek updated NUTCH-1746: - Attachment: domain-urlfilter-ac domain-urlfilter-ab domain-urlfilt

Re: upgrading protocol-httpclient to httpclient 4.1.1

2014-04-04 Thread Sebastian Nagel
> Define 'addressing'. :-) > I didn't refactor because I don't really know which direction will be the > right direction for that plugin. So in a way the plugin is still the same. > All I did was to change all the API calls to httpclient 4.1.1 and check > that the tests still run (it wasn't as easy

Re: upgrading protocol-httpclient to httpclient 4.1.1

2014-04-04 Thread d_k
On Fri, Apr 4, 2014 at 11:28 PM, Sebastian Nagel wrote: > Hi, > > does it mean you are (also) addressing NUTCH-1086? That would be great, > since this issue has been waiting for a solution for a long time! > Define 'addressing'. :-) I didn't refactor because I don't really know which direction will be the right

Re: Url validator rejected url because of 2 dots

2014-04-04 Thread Sebastian Nagel
Hi, > Url validator plugin rejects this kind of url because of .. . > I had a look at RFC 2396 and W3C standards. There is no constraint > about .. except these /../ and /.. kind of statements. Also Unix systems accept files containing two dots "abc..xyz.txt". urlfilter-validator should be relaxed t

Re: upgrading protocol-httpclient to httpclient 4.1.1

2014-04-04 Thread Sebastian Nagel
Hi, does it mean you are (also) addressing NUTCH-1086? That would be great, since this issue has been waiting for a solution for a long time! > The reason I picked version 4.1.1 and not the latest is because I noticed > it is already in the build/lib dir and I wasn't sure I can use two versions > of the jar with

[jira] [Commented] (NUTCH-1746) OutOfMemoryError in Mappers

2014-04-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960327#comment-13960327 ] Lewis John McGibbney commented on NUTCH-1746: - bq. Does it matter when running

[jira] [Commented] (NUTCH-1745) Upgrade to ElasticSearch 1.1.0

2014-04-04 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960074#comment-13960074 ] Hudson commented on NUTCH-1745: --- SUCCESS: Integrated in Nutch-trunk #2589 (See [https://bui

[jira] [Resolved] (NUTCH-1745) Upgrade to ElasticSearch 1.1.0

2014-04-04 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1745. -- Resolution: Fixed Fix Version/s: 1.9 Trunk => Committed revision 1584722. Thanks for re

upgrading protocol-httpclient to httpclient 4.1.1

2014-04-04 Thread d_k
I've written a patch for the 2.2.1 source code that upgrades the protocol-httpclient to httpclient 4.1.1. Unfortunately, I had to adjust the test because httpclient 4.1.1 currently does not support authenticating with different credentials against different realms in the same domain: HTTPCLIENT-1490

[jira] [Comment Edited] (NUTCH-1746) OutOfMemoryError in Mappers

2014-04-04 Thread Greg Padiasek (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959957#comment-13959957 ] Greg Padiasek edited comment on NUTCH-1746 at 4/4/14 1:44 PM: --

[jira] [Comment Edited] (NUTCH-1746) OutOfMemoryError in Mappers

2014-04-04 Thread Greg Padiasek (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959957#comment-13959957 ] Greg Padiasek edited comment on NUTCH-1746 at 4/4/14 1:42 PM: --

[jira] [Comment Edited] (NUTCH-1746) OutOfMemoryError in Mappers

2014-04-04 Thread Greg Padiasek (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959957#comment-13959957 ] Greg Padiasek edited comment on NUTCH-1746 at 4/4/14 1:36 PM: --

[jira] [Commented] (NUTCH-1746) OutOfMemoryError in Mappers

2014-04-04 Thread Greg Padiasek (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959957#comment-13959957 ] Greg Padiasek commented on NUTCH-1746: -- No, this should be a default value. > OutOfM

[jira] [Commented] (NUTCH-1746) OutOfMemoryError in Mappers

2014-04-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959929#comment-13959929 ] Lewis John McGibbney commented on NUTCH-1746: - I am keen to learn of the volum

Url validator rejected url because of 2 dots

2014-04-04 Thread Mustafa Sertac Turkel
Hi all, I have a seedlist file. The file includes a URL like this: http://www.example.com/example-example..-16067h.htm The URL validator plugin rejects this kind of URL because of the "..". I had a look at RFC 2396 and the W3C standards. There is no constraint about .. except these /../ and /.. kind

[jira] [Updated] (NUTCH-1741) Support of Sitemaps in Nutch 2.x

2014-04-04 Thread Alparslan Avcı (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alparslan Avcı updated NUTCH-1741: -- Attachment: NUTCH-1741.patch I have uploaded a patch that implements the sitemap support as sim

[jira] [Commented] (NUTCH-1486) Upgrade to the latest Solr 4.x

2014-04-04 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959874#comment-13959874 ] Markus Jelsma commented on NUTCH-1486: -- Also, this patch results in lots of surplus j

[jira] [Updated] (NUTCH-1486) Upgrade to the latest Solr 4.x

2014-04-04 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1486: - Attachment: NUTCH-1486-1.9-trunk.patch Patch for trunk! Cannot upgrade to HTTPClient 4.3.1 or 4.3