Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

2011-09-15 Thread Sami Siren
On Thu, Sep 15, 2011 at 9:55 PM, Markus Jelsma wrote: > There are many things I can write about this topic right now but don't feel > it's necessary. The choice is difficult and perhaps painful but when the > voting round is opened by our project lead, I will vote for promoting 1.x back > to trun

Build failed in Jenkins: Nutch-trunk #1605

2011-09-15 Thread Apache Jenkins Server
See -- [...truncated 986 lines...] A src/plugin/subcollection/src/java/org/apache/nutch/collection/CollectionManager.java A src/plugin/subcollection/src/java/org/apache/nutch/collection/pack

Re: [Nutch Wiki] Trivial Update of "OldFAQs" by LewisJohnMcgibbney

2011-09-15 Thread Christopher Bader
How do I get off this list? I don't see an "unsubscribe" option. On Thu, Sep 15, 2011 at 3:00 PM, Apache Wiki wrote: > Dear Wiki user, > > You have subscribed to a wiki page or wiki category on "Nutch Wiki" for > change notification. > > The "OldFAQs" page has been changed by LewisJohnMcgibbney

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Edward Drapkin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105758#comment-13105758 ] Edward Drapkin commented on NUTCH-1113: --- The more I look into this, the more I'm cer

Build failed in Jenkins: Nutch-branch-1.4 #3

2011-09-15 Thread Apache Jenkins Server
See -- [...truncated 998 lines...] A src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection AU src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection/Sub

Re: Build failed in Jenkins: Nutch-branch-1.4 #2

2011-09-15 Thread lewis john mcgibbney
Hi all, I'm just getting used to the settings on Jenkins and haven't quite gotten all of the parameters right yet, as you can see. Just to clarify, Branch-1.4 is building fine; the Jenkins job just hasn't been set up properly. Lewis On Thu, Sep 15, 2011 at 11:09 PM, Apache Jenkins Server

Build failed in Jenkins: Nutch-branch-1.4 #2

2011-09-15 Thread Apache Jenkins Server
See -- [...truncated 950 lines...] A src/plugin/lib-regex-filter/src/java/org/apache/nutch/urlfilter/api AU src/plugin/lib-regex-filter/src/java/org/apache/nutch/urlfilter/api/RegexRule.java

Build failed in Jenkins: Nutch-branch-1.4 #1

2011-09-15 Thread Apache Jenkins Server
See -- [...truncated 949 lines...] A src/plugin/lib-regex-filter/src/java/org/apache/nutch/urlfilter A src/plugin/lib-regex-filter/src/java/org/apache/nutch/urlfilter/api AU src/plug

[jira] [Updated] (NUTCH-1112) off-by-one error in protocol-httpclient; truncates up to HttpBase.BUFFER_SIZE content

2011-09-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1112: - Patch Info: [Patch Available] Fix Version/s: 1.4 Thanks. Marked for 1.4. > off-by-one err

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Edward Drapkin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105714#comment-13105714 ] Edward Drapkin commented on NUTCH-1113: --- Upon further inspection, it appears that Se

[jira] [Issue Comment Edited] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Edward Drapkin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105714#comment-13105714 ] Edward Drapkin edited comment on NUTCH-1113 at 9/15/11 9:52 PM:

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105715#comment-13105715 ] Markus Jelsma commented on NUTCH-1113: -- Investigation, debug report; same stuff diffe

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Edward Drapkin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105704#comment-13105704 ] Edward Drapkin commented on NUTCH-1113: --- I don't have any idea what's causing this p

[jira] [Commented] (NUTCH-1112) off-by-one error in protocol-httpclient; truncates up to HttpBase.BUFFER_SIZE content

2011-09-15 Thread Edward Drapkin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105702#comment-13105702 ] Edward Drapkin commented on NUTCH-1112: --- All that needs to be changed is the < needs

[jira] [Updated] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1113: - Fix Version/s: 1.4 Thanks! It's marked for 1.4 now so it, at least, doesn't slip off the radar. Ca

[jira] [Updated] (NUTCH-1112) off-by-one error in protocol-httpclient; truncates up to HttpBase.BUFFER_SIZE content

2011-09-15 Thread Edward Drapkin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Drapkin updated NUTCH-1112: -- Attachment: httpresponse.patch Patch fixing off-by-1 error > off-by-one error in protocol-http

[jira] [Commented] (NUTCH-251) Administration GUI

2011-09-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105696#comment-13105696 ] Markus Jelsma commented on NUTCH-251: - Not likely. There's, however, an open issue to p

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Edward Drapkin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105693#comment-13105693 ] Edward Drapkin commented on NUTCH-1113: --- Using this command: nutch readseg -get mer

[jira] [Updated] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Edward Drapkin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Drapkin updated NUTCH-1113: -- Attachment: merged_segment_output.txt unmerged_segment_output.txt Output for se

[jira] [Commented] (NUTCH-251) Administration GUI

2011-09-15 Thread hadi (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105673#comment-13105673 ] hadi commented on NUTCH-251: Does this plugin work with Nutch 1.3? > Administration GUI > -

[Nutch Wiki] Trivial Update of "OldFAQs" by LewisJohnMcgibbney

2011-09-15 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "OldFAQs" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/OldFAQs?action=diff&rev1=1&rev2=2 <> + My system does not find the segments folder. W

[Nutch Wiki] Trivial Update of "FAQ" by LewisJohnMcgibbney

2011-09-15 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "FAQ" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/FAQ?action=diff&rev1=129&rev2=130 There's a user, developer, commits and agents lists, all available

[Nutch Wiki] Trivial Update of "FAQ" by LewisJohnMcgibbney

2011-09-15 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "FAQ" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/FAQ?action=diff&rev1=128&rev2=129 . Change this line: -^(file|ftp|mailto|https): to this: -^(http

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105661#comment-13105661 ] Markus Jelsma commented on NUTCH-1113: -- Can you rule out the indexer and see what you

[Nutch Wiki] Trivial Update of "ErrorMessages" by LewisJohnMcgibbney

2011-09-15 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "ErrorMessages" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/ErrorMessages?action=diff&rev1=10&rev2=11 <> - == General == + = General = + == J

[Nutch Wiki] Trivial Update of "ErrorMessages" by LewisJohnMcgibbney

2011-09-15 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "ErrorMessages" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/ErrorMessages?action=diff&rev1=9&rev2=10 Please report bugs to the mailing list! + <

[jira] [Created] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Edward Drapkin (JIRA)
Merging segments causes URLs to vanish from crawldb/index? -- Key: NUTCH-1113 URL: https://issues.apache.org/jira/browse/NUTCH-1113 Project: Nutch Issue Type: Bug Affects Versions:

[Nutch Wiki] Trivial Update of "FAQ" by LewisJohnMcgibbney

2011-09-15 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "FAQ" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/FAQ?action=diff&rev1=127&rev2=128 Please visit our [[http://lucene.apache.org/nutch/bot.html|"webmas

[Nutch Wiki] Trivial Update of "ErrorMessages" by LewisJohnMcgibbney

2011-09-15 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "ErrorMessages" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/ErrorMessages?action=diff&rev1=8&rev2=9 '''/etc/host.conf: line 1: cannot specify more the

Re: Setting up Jenkins CI for Nutch Branches

2011-09-15 Thread lewis john mcgibbney
Excellent Thank you On Thu, Sep 15, 2011 at 8:43 PM, Mattmann, Chris A (388J) < chris.a.mattm...@jpl.nasa.gov> wrote: > Hi Lewis, > > [mattmann@minotaur]/home/mattmann(24): modify_appgroups.pl hudson-jobadmin > --add=lewismc > LDAP Password (^D aborts): > Done! > Notification sent to . > [mattma

[Nutch Wiki] Trivial Update of "ErrorMessages" by LewisJohnMcgibbney

2011-09-15 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "ErrorMessages" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/ErrorMessages?action=diff&rev1=7&rev2=8 * Updating * Searching + Exception:

Re: Setting up Jenkins CI for Nutch Branches

2011-09-15 Thread Mattmann, Chris A (388J)
Hi Lewis, [mattmann@minotaur]/home/mattmann(24): modify_appgroups.pl hudson-jobadmin --add=lewismc LDAP Password (^D aborts): Done! Notification sent to . [mattmann@minotaur]/home/mattmann(25): Done! See: http://wiki.apache.org/general/Hudson And http://builds.apache.org/ for how to set

[Nutch Wiki] Trivial Update of "OldFAQs" by LewisJohnMcgibbney

2011-09-15 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "OldFAQs" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/OldFAQs New page: This is the official resource for OLD Nutch FAQs. <>

[Nutch Wiki] Trivial Update of "Archive and Legacy" by LewisJohnMcgibbney

2011-09-15 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "Archive and Legacy" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/Archive%20and%20Legacy?action=diff&rev1=20&rev2=21 === General Information === * O

[Nutch Wiki] Trivial Update of "Archive and Legacy" by LewisJohnMcgibbney

2011-09-15 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "Archive and Legacy" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/Archive%20and%20Legacy?action=diff&rev1=19&rev2=20 === General Information === * O

Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

2011-09-15 Thread Markus Jelsma
> Hi Guys, > > I thought I'd chime in on this thread. My comments below: > > I understand and share your frustration, however you need to bear in mind > > that things are done only if people volunteer and have time - usually > > taken from their holiday, weekends, evenings. Chris (who is the de f

Re: Setting up Jenkins CI for Nutch Branches

2011-09-15 Thread lewis john mcgibbney
Hi Chris, If you could set me up it would be great. I will be reporting to the devs on any progress with the build, so I will proceed to create the job in due course. Thank you On Thu, Sep 15, 2011 at 7:52 PM, Mattmann, Chris A (388J) < chris.a.mattm...@jpl.nasa.gov> wrote: > Hey Lewis, > >

Re: Setting up Jenkins CI for Nutch Branches

2011-09-15 Thread Mattmann, Chris A (388J)
Hey Lewis, I'm interested in it. I can help you do it. I'm a PMC chair so I can add people to the Jenkins group and I just set it up for OODT. Let me know whether you want me to: * add you to the jenkins admin group * let you create the 1.4 branch build job, or if you want me to do it Thanks!

Setting up Jenkins CI for Nutch Branches

2011-09-15 Thread lewis john mcgibbney
Hi Everyone, Been wanting to get this done for some time, but there doesn't seem to be much desire or response. I thought I would try to find out for sure before moving on. I am happy to put time aside to get it working. Any thoughts in general? Thanks -- *Lewis*

[jira] [Commented] (NUTCH-1005) Index headings plugin

2011-09-15 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105456#comment-13105456 ] Julien Nioche commented on NUTCH-1005: -- you are right. I'd read your comments too qui

[jira] [Updated] (NUTCH-1097) application/xhtml+xml should be enabled in plugin.xml of parse-html; allow multiple mimetypes for plugin.xml

2011-09-15 Thread Ferdy (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy updated NUTCH-1097: - Attachment: NUTCH-1097-trunk_v1.patch The patch for Nutch trunk. > application/xhtml+xml should be enabled in pl