Build failed in Jenkins: Nutch-trunk #1583

2011-08-23 Thread Apache Jenkins Server
See -- [...truncated 986 lines...] A src/plugin/subcollection/src/java/org/apache/nutch/collection/CollectionManager.java A src/plugin/subcollection/src/java/org/apache/nutch/collection/pack

[jira] [Commented] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

2011-08-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089761#comment-13089761 ] Markus Jelsma commented on NUTCH-1024: -- I'd like to commit this issue this friday unl

[jira] [Commented] (NUTCH-1057) Make fetcher thread time out configurable

2011-08-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089762#comment-13089762 ] Markus Jelsma commented on NUTCH-1057: -- I'd like to commit this issue this friday unl

[jira] [Resolved] (NUTCH-1089) short compressed pages caused Exception

2011-08-23 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1089. -- Resolution: Fixed 1.4 Committed revision 1160753. trunk Committed revision 1160754 Thanks Simo

[jira] [Resolved] (NUTCH-1085) Nutch script does not require HADOOP_HOME

2011-08-23 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1085. -- Resolution: Fixed Trunk : Committed revision 1160738 1.4 : Committed revision 1160734 > Nutch

[jira] [Updated] (NUTCH-1089) short compressed pages caused Exception

2011-08-23 Thread simone frenzel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] simone frenzel updated NUTCH-1089: -- Attachment: HttpResponsePatch.patch > short compressed pages caused Exception > -

[jira] [Created] (NUTCH-1089) short compressed pages caused Exception

2011-08-23 Thread simone frenzel (JIRA)
short compressed pages caused Exception - Key: NUTCH-1089 URL: https://issues.apache.org/jira/browse/NUTCH-1089 Project: Nutch Issue Type: Bug Reporter: simone frenzel Hi, tested nutch

[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2011-08-23 Thread Aravind Srini (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089503#comment-13089503 ] Aravind Srini commented on NUTCH-1086: -- Thanks, Oleg for pitching in and confirming t

[jira] [Commented] (NUTCH-1085) Nutch script does not require HADOOP_HOME

2011-08-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089496#comment-13089496 ] Lewis John McGibbney commented on NUTCH-1085: - As well as being nice for peopl

[Nutch Wiki] Trivial Update of "bin/nutch_generate" by LewisJohnMcgibbney

2011-08-23 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "bin/nutch_generate" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/bin/nutch_generate?action=diff&rev1=12&rev2=13 '''[-topN N]''': Where N is the num

Re: Patch for httpResponse

2011-08-23 Thread Julien Nioche
Simone, Would you mind opening a JIRA for this, attaching your patch and granting it to the ASF? I know it is fairly small but it makes it easier to track the progress, link to svn commits, etc... Thanks Julien On 23 August 2011 07:53, Simone Frenzel wrote: > > > -- Forwarded message -

[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2011-08-23 Thread Oleg Kalnichevski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089466#comment-13089466 ] Oleg Kalnichevski commented on NUTCH-1086: -- The 4.1.3 release of HttpCore patched

Re: The crawl command, keep or get rid of

2011-08-23 Thread Eric Pugh
I wonder if the name "crawl" implies that the command is sort of a standard command, and all you would need? After all, if I were to sit down with a "crawler", it seems very logical that "crawl" would be how you run it! I like the simplicity of crawl from a "getting started" approach. I agree

Re: The crawl command, keep or get rid of

2011-08-23 Thread Radim Kolar
I agree. Nuke crawl command

[jira] [Created] (NUTCH-1088) Write Solr XML documents

2011-08-23 Thread Markus Jelsma (JIRA)
Write Solr XML documents Key: NUTCH-1088 URL: https://issues.apache.org/jira/browse/NUTCH-1088 Project: Nutch Issue Type: New Feature Components: indexer Reporter: Markus Jelsma Fix

[jira] [Commented] (NUTCH-1087) Deprecate crawl command and replace with example script

2011-08-23 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089405#comment-13089405 ] Andrzej Bialecki commented on NUTCH-1087: -- IIRC we had this discussion in the p

Re: The crawl command, keep or get rid of

2011-08-23 Thread Markus Jelsma
You're right: https://issues.apache.org/jira/browse/NUTCH-1087 On Tuesday 23 August 2011 13:24:27 Julien Nioche wrote: > > What kind of shell script did you have in mind? The wiki already provides some useful scripts. The tutorials on Nutch also show commands that can be used in cu

[jira] [Created] (NUTCH-1087) Deprecate crawl command and replace with example script

2011-08-23 Thread Markus Jelsma (JIRA)
Deprecate crawl command and replace with example script --- Key: NUTCH-1087 URL: https://issues.apache.org/jira/browse/NUTCH-1087 Project: Nutch Issue Type: Task Affects Versions: 1.4

Re: The crawl command, keep or get rid of

2011-08-23 Thread Julien Nioche
> What kind of shell script did you have in mind? The wiki already provides some useful scripts. The tutorials on Nutch also show commands that can be used in custom scripts.
That's exactly my point. There are various scripts in the wiki, based on different versions of Nutch and of varia

[jira] [Commented] (NUTCH-578) URL fetched with 403 is generated over and over again

2011-08-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089382#comment-13089382 ] Markus Jelsma commented on NUTCH-578: - I just confirmed this is still an issue. I've ch

Re: The crawl command, keep or get rid of

2011-08-23 Thread Markus Jelsma
What kind of shell script did you have in mind? The wiki already provides some useful scripts. The tutorials on Nutch also show commands that can be used in custom scripts. Is an immediate crawl-with-one-command a desired feature? Provided as Java code or shell script? On Tuesday 23 August 201

[jira] [Assigned] (NUTCH-578) URL fetched with 403 is generated over and over again

2011-08-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-578: --- Assignee: Markus Jelsma (was: Dennis Kubes) > URL fetched with 403 is generated over and over

Re: Rewrite protocol-httpclient

2011-08-23 Thread Markus Jelsma
In branch 1.4 at first. It should be easy to port to trunk however. You're more than welcome to contribute. > On Tue, Aug 23, 2011 at 12:28 AM, Markus Jelsma > > wrote: > > Hi, > > > > Please see Julien's comment in this recent thread: > > Re: Future of Nutch 2.0 [Was: Unresolved dependencies >

Re: The crawl command, keep or get rid of

2011-08-23 Thread Julien Nioche
+1 let's replace it with a shell script instead. On 22 August 2011 21:56, Markus Jelsma wrote: > Hi, > > The crawl command seems to add a lot of confusion. It hides the entire crawl cycle logic from new users, leading to questions, lack of understanding of basic Nutch concepts, unsupported

[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2011-08-23 Thread Aravind Srini (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089314#comment-13089314 ] Aravind Srini commented on NUTCH-1086: -- Some transitive dependencies: * Solr 3.1.0 ,