[jira] [Commented] (NUTCH-1392) -force and -resume arguments being ignored in ParserJob

2012-06-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294730#comment-13294730 ] Lewis John McGibbney commented on NUTCH-1392: - Additionally this issue should

Re: VOTE Apache Nutch 2.0 RC1

2012-06-13 Thread Lewis John Mcgibbney
Hi Sebastian, On Wed, Jun 13, 2012 at 11:30 PM, Sebastian Nagel wrote: > I've managed to perform a crawl with 2.0 and HBase: it rocks, indeed. > Much simpler than 1.x (no segments!). :0) > % ./bin/nutch readdb -stats > WebTable statistics start > WebTableReader: java.io.EOFException > at

[jira] [Created] (NUTCH-1394) backport NUTCH-1232 Remove host field from index-basic

2012-06-13 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1394: --- Summary: backport NUTCH-1232 Remove host field from index-basic Key: NUTCH-1394 URL: https://issues.apache.org/jira/browse/NUTCH-1394 Project: Nutch

[jira] [Created] (NUTCH-1393) Display consistent usage of GeneratorJob with 1.X

2012-06-13 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1393: --- Summary: Display consistent usage of GeneratorJob with 1.X Key: NUTCH-1393 URL: https://issues.apache.org/jira/browse/NUTCH-1393 Project: Nutch

[jira] [Created] (NUTCH-1392) -force and -resume arguments being ignored in ParserJob

2012-06-13 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1392: --- Summary: -force and -resume arguments being ignored in ParserJob Key: NUTCH-1392 URL: https://issues.apache.org/jira/browse/NUTCH-1392 Project: Nutch

[jira] [Created] (NUTCH-1391) readdb -stats fires java.io.EOFException

2012-06-13 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1391: --- Summary: readdb -stats fires java.io.EOFException Key: NUTCH-1391 URL: https://issues.apache.org/jira/browse/NUTCH-1391 Project: Nutch Issue Ty

[jira] [Created] (NUTCH-1390) readdb -url $url throws NPE with gora-cassandra

2012-06-13 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1390: --- Summary: readdb -url $url throws NPE with gora-cassandra Key: NUTCH-1390 URL: https://issues.apache.org/jira/browse/NUTCH-1390 Project: Nutch I

[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2012-06-13 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "FrontPage" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=242&rev2=243 === Tutorials === * NutchTutorial - How to confi

Re: VOTE Apache Nutch 2.0 RC1

2012-06-13 Thread Sebastian Nagel
Hi Lewis, > Please see http://wiki.apache.org/nutch/Nutch2Tutorial which is an > update of Julien's (I think) page on GORA_HBase. This will get you > rocking with HBase. The changes between Cassandra, Accumulo and the > other data stores are fairly trivial. I've managed to perform a crawl with 2.

Re: Suitable Nutch 2.0 Project Description

2012-06-13 Thread Mattmann, Chris A (388J)
+1 to the description w/o experimental too (I agree with Ferdy). You guys ROCK. Cheers, Chris On Jun 13, 2012, at 5:29 AM, Lewis John Mcgibbney wrote: > Hi, > > Seeing as we have the ball rolling with the 2.0 RC. I thought I'd ask > about a suitable project descriptor. > > So far on trunk we

Re: VOTE Apache Nutch 2.0 RC1

2012-06-13 Thread Lewis John Mcgibbney
Hi Guys, Whilst updating the Nutch2Tutorial I got thinking that within Gora we don't supply binary distributions of the code; this is because when using Gora a user may wish/require to recompile the code to accommodate config changes etc. We only supply src distributions... Does this principle app

[Nutch Wiki] Trivial Update of "Nutch2Tutorial" by LewisJohnMcgibbney

2012-06-13 Thread Apache Wiki
The "Nutch2Tutorial" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/Nutch2Tutorial?action=diff&rev1=3&rev2=4 This document describes how to get Nutch 2.0 to

[Nutch Wiki] Trivial Update of "Nutch2Tutorial" by LewisJohnMcgibbney

2012-06-13 Thread Apache Wiki
The "Nutch2Tutorial" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/Nutch2Tutorial?action=diff&rev1=2&rev2=3 gora.datastore.default=org.apache.gora.hbase

[Nutch Wiki] Trivial Update of "Nutch2Tutorial" by LewisJohnMcgibbney

2012-06-13 Thread Apache Wiki
The "Nutch2Tutorial" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/Nutch2Tutorial?action=diff&rev1=1&rev2=2 - + }}} + + * Ensure t

Re: Suitable Nutch 2.0 Project Description

2012-06-13 Thread Julien Nioche
" and and array other document " looks like a typo, rest is fine On 13 June 2012 13:45, Ferdy Galema wrote: > Hi, > > I would remove the 'experimental' notion. Aside from that it's fine with > me. > > Ferdy. > > > On Wed, Jun 13, 2012 at 2:29 PM, Lewis John Mcgibbney < > lewis.mcgibb...@gmail.co

Re: VOTE Apache Nutch 2.0 RC1

2012-06-13 Thread Julien Nioche
Ferdy > > The Nutch job jar is not present in the binary archive. This means > distributed running of jobs is not supported. I'm not sure if this is a > problem (since users can always build one themselves), merely pointing it > out. The recently released 1.5 also lacks this job jar, so at least n

[jira] [Commented] (NUTCH-1342) Read time out protocol-http

2012-06-13 Thread Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294429#comment-13294429 ] Ferdy Galema commented on NUTCH-1342: - Do you have any clue as to why protocol-httpcli

Re: Suitable Nutch 2.0 Project Description

2012-06-13 Thread Ferdy Galema
Hi, I would remove the 'experimental' notion. Aside from that it's fine with me. Ferdy. On Wed, Jun 13, 2012 at 2:29 PM, Lewis John Mcgibbney < lewis.mcgibb...@gmail.com> wrote: > Hi, > > Seeing as we have the ball rolling with the 2.0 RC. I thought I'd ask > about a suitable project descriptor

Suitable Nutch 2.0 Project Description

2012-06-13 Thread Lewis John Mcgibbney
Hi, Seeing as we have the ball rolling with the 2.0 RC. I thought I'd ask about a suitable project descriptor. So far on trunk we have ** Apache Nutch is an open source web-search software project. Stemming from Apache Lucene, it now builds on Apache Solr adding web-specifics, such as a crawler,

Re: VOTE Apache Nutch 2.0 RC1

2012-06-13 Thread Lewis John Mcgibbney
Hi Seb, Quick update On Tue, Jun 12, 2012 at 11:33 PM, Sebastian Nagel wrote: >1 some guidance would be nice. README.txt points > to http://wiki.apache.org/nutch/NutchTutorial which refers to 1.x Please see http://wiki.apache.org/nutch/Nutch2Tutorial which is an update of Julien's (I think) pag

[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2012-06-13 Thread Apache Wiki
The "FrontPage" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=241&rev2=242 * Nutch2Roadmap -- Discussions on the architecture an

[Nutch Wiki] Trivial Update of "Nutch2Tutorial" by LewisJohnMcgibbney

2012-06-13 Thread Apache Wiki
The "Nutch2Tutorial" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/Nutch2Tutorial New page: = Nutch 2.0 Tutorial = {{http://www.interadvertising.co.uk/files/n

[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2012-06-13 Thread Apache Wiki
The "FrontPage" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=240&rev2=241 * [[NutchMavenSupport|Using Nutch as a Maven dependen

[Nutch Wiki] Trivial Update of "GORA_HBase" by LewisJohnMcgibbney

2012-06-13 Thread Apache Wiki
The "GORA_HBase" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/GORA_HBase?action=diff&rev1=13&rev2=14 org.apache.gora.hbase.store.HBaseStore Default cla

[Nutch Wiki] Trivial Update of "GORA_HBase" by LewisJohnMcgibbney

2012-06-13 Thread Apache Wiki
The "GORA_HBase" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/GORA_HBase?action=diff&rev1=12&rev2=13 This document describes how to get Nutch 2.0 to use

[Nutch Wiki] Trivial Update of "GORA_HBase" by LewisJohnMcgibbney

2012-06-13 Thread Apache Wiki
The "GORA_HBase" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/GORA_HBase?action=diff&rev1=11&rev2=12 - This document describes how to get Nutch to use HBase

Re: VOTE Apache Nutch 2.0 RC1

2012-06-13 Thread Lewis John Mcgibbney
Hi Seb, As Chris said, the issues you highlight well justify another RC. I can shift it by the end of play today. Thanks very much for having a look through guys Lewis On Tue, Jun 12, 2012 at 11:33 PM, Sebastian Nagel wrote: > Hi Lewis, > > my first steps with 2.0 (to be continued, still stru

Re: VOTE Apache Nutch 2.0 RC1

2012-06-13 Thread Ferdy Galema
Hmm, please ignore "the parse text limited to 100 chars", this is actually not the case. (Only in our branch that has a fix for limiting anchor texts; not yet present in the nutchgora branch because it still needs polishing). So no need to wait for commits on my part. On Wed, Jun 13, 2012 at 11:

Re: VOTE Apache Nutch 2.0 RC1

2012-06-13 Thread Ferdy Galema
Findings about Nutch-2.0 RC 1. The Nutch job jar is not present in the binary archive. This means distributed running of jobs is not supported. I'm not sure if this is a problem (since users can always build one themselves), merely pointing it out. The recently released 1.5 also lacks this job jar