org.mortbay.proxy package not found in nutch 1.x, Ref Class - ProxyTestbed

2015-02-10 Thread Preetam Pradeepkumar Shingavi
Hi, I am trying to configure Nutch 1.X in Eclipse, and have configured the build path to include all jars from the build-lib folder. The class ProxyTestbed.java has an error importing the following package: import org.mortbay.proxy.AsyncProxyServlet; (proxy package not found) I

[jira] [Commented] (NUTCH-1939) Fetcher fails to follow redirects

2015-02-10 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315374#comment-14315374 ] lufeng commented on NUTCH-1939: --- Hi Sebastian One question. How do you use the FetchItem

[jira] [Comment Edited] (NUTCH-1939) Fetcher fails to follow redirects

2015-02-10 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315374#comment-14315374 ] lufeng edited comment on NUTCH-1939 at 2/11/15 2:16 AM: I think

[jira] [Created] (NUTCH-1940) Port HTTP POST Authentication to 2.X

2015-02-10 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1940: --- Summary: Port HTTP POST Authentication to 2.X Key: NUTCH-1940 URL: https://issues.apache.org/jira/browse/NUTCH-1940 Project: Nutch Issue Type:

[jira] [Updated] (NUTCH-827) HTTP POST Authentication

2015-02-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-827: --- Fix Version/s: (was: 2.4) HTTP POST Authentication

[jira] [Updated] (NUTCH-1940) Port HTTP POST Authentication to 2.X

2015-02-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1940: Issue Type: New Feature (was: Bug) Port HTTP POST Authentication to 2.X

Re: Reverse Geocoding for the Masses - Apache Nutch Guest Post - Revised - STF - Invitation to comment

2015-02-10 Thread Lewis John Mcgibbney
Fantastic, Susan. I'll look forward to your feedback. Thank you for the notice. Have a great day. Lewis On Tue, Feb 10, 2015 at 7:25 AM, Susan Fendrock sfendr...@maxmind.com wrote: Hi Lewis, I went through and accepted all your suggested changes. As it stands, your blog post is about twice our

[jira] [Commented] (NUTCH-1323) AjaxNormalizer

2015-02-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314461#comment-14314461 ] Lewis John McGibbney commented on NUTCH-1323: - [~markus17] I am +1 on this, if

Re: 572:Crawl statistics for each repository ?

2015-02-10 Thread feng lu
Hi Jaydeep, you can use the following command to get statistics for each host when using one database to crawl multiple repositories: bin/nutch readdb crawldb/crawldb/ -stats -sort On Mon, Feb 9, 2015 at 12:01 PM, Jaydeep Bagrecha bagre...@usc.edu wrote: Thanks. *P.S* The question was:- *Given M

[jira] [Created] (NUTCH-1941) Optional rolling http.agent.name's

2015-02-10 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1941: --- Summary: Optional rolling http.agent.name's Key: NUTCH-1941 URL: https://issues.apache.org/jira/browse/NUTCH-1941 Project: Nutch Issue Type:

Nutch-Selenium in Nutch 1.10

2015-02-10 Thread Shuo Li
Yop, I'm trying to install selenium in Nutch 1.10. However, this error pops out: *error: package org.apache.nutch.storage does not exist* I can only find this package in Nutch 2.x. Is there a way to use Selenium in 1.10? Any advice would be appreciated. Regards, Shuo Li

Re: Nutch-Selenium in Nutch 1.10

2015-02-10 Thread Sapnashri Suresh
Hi Shuo Li, We were facing a similar issue. Prof. Mattman suggested we look into this patch for Selenium on Nutch 1.10 : https://issues.apache.org/jira/browse/NUTCH-1933. Hope this helps! Thanks, Sapna On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li sli...@usc.edu wrote: Yop, I'm trying to

Re: Nutch-Selenium in Nutch 1.10

2015-02-10 Thread Mattmann, Chris A (3980)
Perfect, that’s what I suggested, thanks guys! ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop:

[jira] [Commented] (NUTCH-1941) Optional rolling http.agent.name's

2015-02-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315647#comment-14315647 ] Lewis John McGibbney commented on NUTCH-1941: - Perfect example of where this

[jira] [Assigned] (NUTCH-1735) code dedup fetcher queue redirects

2015-02-10 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-1735: -- Assignee: Sebastian Nagel code dedup fetcher queue redirects

[jira] [Created] (NUTCH-1939) Fetcher fails to follow redirects

2015-02-10 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-1939: -- Summary: Fetcher fails to follow redirects Key: NUTCH-1939 URL: https://issues.apache.org/jira/browse/NUTCH-1939 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-1939) Fetcher fails to follow redirects

2015-02-10 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1939: --- Attachment: NUTCH-1939.patch Patch was tested: redirects are followed by Fetcher if

Why the protocol-httpclient Does Handle URL with Special Characters

2015-02-10 Thread Renxia Wang
Hi, I used protocol-httpclient to deal with https and noticed that it does not automatically handle special characters such as spaces, [, ], and |, while protocol-http does. Is there a reason why this plugin doesn't support this feature? Can any improvement be made to it? Thanks, Zhique

[Nutch Wiki] Trivial Update of ContributorsGroup by LewisJohnMcgibbney

2015-02-10 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The ContributorsGroup page has been changed by LewisJohnMcgibbney: https://wiki.apache.org/nutch/ContributorsGroup?action=diff&rev1=19&rev2=20 * WayneBurke * MichaelJoyce *

[Nutch Wiki] Update of SujenShah by SujenShah

2015-02-10 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The SujenShah page has been changed by SujenShah: https://wiki.apache.org/nutch/SujenShah New page: ##language:en == Sujen Shah == Email: MailTo(sujen1412 AT SPAMFREE gmail DOT com) ...

[Nutch Wiki] Update of FrontPage by LewisJohnMcgibbney

2015-02-10 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The FrontPage page has been changed by LewisJohnMcgibbney: https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=294&rev2=295 Please contribute your knowledge about Nutch here!

RE: Why the protocol-httpclient Does Handle URL with Special Characters

2015-02-10 Thread Markus Jelsma
Indeed! You need to make some improvements to the basic URL normalizer or any custom normalizer so it will properly encode URLs. It will not do it for you. This is still an open issue. -Original message- From: Renxia Wang renxi...@usc.edu Sent: Wednesday 11th February 2015 0:00 To:
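
The encoding step discussed in this thread can be sketched in plain Java. This is only an illustration of percent-encoding the characters mentioned above (spaces, [, ], |) in a URL path; it is not Nutch's normalizer code, and the class name UrlEscapeSketch, the method escape, and the example.com URL are assumptions made for the example. It relies on the JDK's multi-argument java.net.URI constructor, which quotes characters that are illegal in each component.

```java
import java.net.URI;
import java.net.URL;

public class UrlEscapeSketch {

    // Hypothetical helper, not part of Nutch: percent-encodes characters
    // that are illegal in a URL path, such as spaces, '[', ']' and '|'.
    static String escape(String raw) throws Exception {
        // java.net.URL splits the string into components without validating them.
        URL url = new URL(raw);
        // The multi-argument URI constructor quotes illegal characters
        // in each component it is given (null components are allowed).
        URI uri = new URI(url.getProtocol(), url.getAuthority(),
                          url.getPath(), url.getQuery(), url.getRef());
        return uri.toASCIIString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(escape("http://example.com/a b[1]|c"));
    }
}
```

One caveat with this approach: the multi-argument URI constructors always quote the percent character, so a URL that is already percent-encoded would be encoded a second time. A custom normalizer built on this idea would need to account for that.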