Re: Persistent problems with Ivy dependencies in Eclipse

2011-11-10 Thread Andrzej Bialecki
On 10/11/2011 04:39, Lewis John Mcgibbney wrote: Gets even more strange, both SWFParser and AutomationURLFilter import additional dependencies, however they are not included within their plugin/ivy/ivy.xml files! Am I missing something here? Most likely these problems come from the initial por

[jira] [Closed] (NUTCH-1188) ERROR util.LogUtil - Cannot log with method [null]

2011-11-10 Thread Julien Nioche (Closed) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-1188. Resolution: Not A Problem Issue already fixed in SVN versions > ERROR util.LogUtil

[jira] [Closed] (NUTCH-990) protocol-httpclient fails with short pages

2011-11-10 Thread Julien Nioche (Closed) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-990. --- > protocol-httpclient fails with short pages > -- > >

[jira] [Closed] (NUTCH-1089) short compressed pages caused Exception

2011-11-10 Thread Julien Nioche (Closed) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-1089. NUTCH-1089, NUTCH-990 and NUTCH-1112 were all related to the same issue which has been fixed thanks to

[jira] [Commented] (NUTCH-1180) UpdateDB to backup previous CrawlDB

2011-11-10 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147641#comment-13147641 ] Markus Jelsma commented on NUTCH-1180: -- I'll send this in if there are no objections.

[jira] [Commented] (NUTCH-1178) Incorrect CSV header CrawlDatumCsvOutputFormat

2011-11-10 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147642#comment-13147642 ] Markus Jelsma commented on NUTCH-1178: -- Objections? > Incorrect CSV

[jira] [Commented] (NUTCH-1142) Normalization and filtering in WebGraph

2011-11-10 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147644#comment-13147644 ] Markus Jelsma commented on NUTCH-1142: -- I'll send this in today. > N

[jira] [Updated] (NUTCH-1171) WebGraph to overwrite normalized input keys

2011-11-10 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1171: - Fix Version/s: (was: 1.4) > WebGraph to overwrite normalized input keys > ---

[jira] [Updated] (NUTCH-1153) LinkRank must not log all hyperlinks

2011-11-10 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1153: - Attachment: NUTCH-1153-1.5-2.patch Final patch also disabled writing of _SUCCESS files by recent

[jira] [Commented] (NUTCH-1174) Outlinks are not properly normalized

2011-11-10 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147654#comment-13147654 ] Markus Jelsma commented on NUTCH-1174: -- Will commit if there are no objections.

[jira] [Commented] (NUTCH-1061) Migrate MoreIndexingFilter from Apache ORO to java.util.regex

2011-11-10 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147655#comment-13147655 ] Markus Jelsma commented on NUTCH-1061: -- Any comments on this one? >

[jira] [Commented] (NUTCH-1139) Indexer to delete documents

2011-11-10 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147656#comment-13147656 ] Markus Jelsma commented on NUTCH-1139: -- Comments please? > Indexer

[jira] [Resolved] (NUTCH-1153) LinkRank must not log all hyperlinks

2011-11-10 Thread Markus Jelsma (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1153. -- Resolution: Fixed Committed for 1.5 in rev. 1200344. > LinkRank must not log a

[jira] [Resolved] (NUTCH-1142) Normalization and filtering in WebGraph

2011-11-10 Thread Markus Jelsma (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1142. -- Resolution: Fixed Committed for 1.5 in rev. 1200346. > Normalization and filte

[jira] [Resolved] (NUTCH-1178) Incorrect CSV header CrawlDatumCsvOutputFormat

2011-11-10 Thread Markus Jelsma (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1178. -- Resolution: Fixed Committed for 1.5 in rev. 1200347. > Incorrect CSV header Cr

[jira] [Commented] (NUTCH-1180) UpdateDB to backup previous CrawlDB

2011-11-10 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147715#comment-13147715 ] Markus Jelsma commented on NUTCH-1180: -- Config directive: {code} db.preserve.back

[jira] [Commented] (NUTCH-1139) Indexer to delete documents

2011-11-10 Thread Andrzej Bialecki (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147722#comment-13147722 ] Andrzej Bialecki commented on NUTCH-1139: -- I suggest renaming the option to -del

[jira] [Commented] (NUTCH-1061) Migrate MoreIndexingFilter from Apache ORO to java.util.regex

2011-11-10 Thread Andrzej Bialecki (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147723#comment-13147723 ] Andrzej Bialecki commented on NUTCH-1061: -- +1. > Migrate MoreIn

[jira] [Commented] (NUTCH-1139) Indexer to delete documents

2011-11-10 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147726#comment-13147726 ] Markus Jelsma commented on NUTCH-1139: -- Yes, but does that also cover the indexer del

[jira] [Commented] (NUTCH-1153) LinkRank must not log all hyperlinks

2011-11-10 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147736#comment-13147736 ] Hudson commented on NUTCH-1153: --- Integrated in nutch-trunk-maven #16 (See [https://builds.a

[jira] [Commented] (NUTCH-1178) Incorrect CSV header CrawlDatumCsvOutputFormat

2011-11-10 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147737#comment-13147737 ] Hudson commented on NUTCH-1178: --- Integrated in nutch-trunk-maven #16 (See [https://builds.a

[jira] [Resolved] (NUTCH-1061) Migrate MoreIndexingFilter from Apache ORO to java.util.regex

2011-11-10 Thread Markus Jelsma (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1061. -- Resolution: Fixed Committed for 1.5 in rev. 1200360. > Migrate MoreIndexingFil

[jira] [Commented] (NUTCH-1142) Normalization and filtering in WebGraph

2011-11-10 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147738#comment-13147738 ] Hudson commented on NUTCH-1142: --- Integrated in nutch-trunk-maven #16 (See [https://builds.a

[jira] [Updated] (NUTCH-1155) Host/domain limit in generator is generate.max.count+1

2011-11-10 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1155: - Attachment: NUTCH-1155-1.5-1.patch simple patch > Host/domain limit in generator

[jira] [Resolved] (NUTCH-1155) Host/domain limit in generator is generate.max.count+1

2011-11-10 Thread Markus Jelsma (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1155. -- Resolution: Fixed Committed for 1.5 in rev. 1200370. > Host/domain limit in ge

[jira] [Updated] (NUTCH-1173) DomainStats doesn't count db_not_modified

2011-11-10 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1173: - Attachment: NUTCH-1173-1.5-1.patch Simple patch. > DomainStats doesn't count db_

[jira] [Resolved] (NUTCH-1173) DomainStats doesn't count db_not_modified

2011-11-10 Thread Markus Jelsma (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1173. -- Resolution: Fixed Committed for 1.5 in rev. 1200377. > DomainStats doesn't cou

Re: Signature == null ?

2011-11-10 Thread Markus Jelsma
After some DB updates, they're gone! Does anyone recognize this phenomenon? On Tuesday 08 November 2011 11:22:48 Markus Jelsma wrote: > On Tuesday 08 November 2011 11:15:37 Markus Jelsma wrote: > > Hi guys, > > > > I've a M/R job selecting only DB_FETCHED and DB_NOTMODIFIED records and > > their sign

[jira] [Commented] (NUTCH-1061) Migrate MoreIndexingFilter from Apache ORO to java.util.regex

2011-11-10 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147797#comment-13147797 ] Hudson commented on NUTCH-1061: --- Integrated in nutch-trunk-maven #17 (See [https://builds.a

[jira] [Commented] (NUTCH-1173) DomainStats doesn't count db_not_modified

2011-11-10 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147798#comment-13147798 ] Hudson commented on NUTCH-1173: --- Integrated in nutch-trunk-maven #17 (See [https://builds.a

[jira] [Commented] (NUTCH-1155) Host/domain limit in generator is generate.max.count+1

2011-11-10 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147796#comment-13147796 ] Hudson commented on NUTCH-1155: --- Integrated in nutch-trunk-maven #17 (See [https://builds.a

[Nutch Wiki] Trivial Update of "PublicServers" by LewisJohnMcgibbney

2011-11-10 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "PublicServers" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/PublicServers?action=diff&rev1=92&rev2=93 * [[http://www.coder-suche.de|Coder-Suche.de

[Nutch Wiki] Update of "PublicServers" by DallanQuass

2011-11-10 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "PublicServers" page has been changed by DallanQuass: http://wiki.apache.org/nutch/PublicServers?action=diff&rev1=93&rev2=94 Comment: WeRelate no longer crawling the Web * [[h

Re: Persistent problems with Ivy dependencies in Eclipse

2011-11-10 Thread Lewis John Mcgibbney
OK so the required dependencies can be seen below - FeedParser - URLAutomationFilter - - SWFParser - HTMLParser There is a real nasty hack which would replace the usual ${nutch.root} with is possible, however this is not how I want to progress. I'm also not sure where to find the dk.brics

[jira] [Created] (NUTCH-1200) Resolving Ivy dependencies in several plugins

2011-11-10 Thread Lewis John McGibbney (Created) (JIRA)
Resolving Ivy dependencies in several plugins -- Key: NUTCH-1200 URL: https://issues.apache.org/jira/browse/NUTCH-1200 Project: Nutch Issue Type: Improvement Components: build Affect

[jira] [Updated] (NUTCH-1200) Resolving Ivy dependencies in several plugins

2011-11-10 Thread Lewis John McGibbney (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1200: Attachment: NUTCH-1200-trunk.patch Real nasty hack which includes the ../../../ whi

[jira] [Updated] (NUTCH-1200) Resolving Ivy dependencies in several plugins

2011-11-10 Thread Lewis John McGibbney (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1200: Patch Info: Patch Available > Resolving Ivy dependencies in several plugins >

Re: Persistent problems with Ivy dependencies in Eclipse

2011-11-10 Thread Kirby Bohling
On Thu, Nov 10, 2011 at 6:14 PM, Lewis John Mcgibbney wrote: > OK so the required dependencies can be seen below > > - FeedParser conf="*->master"/> > - URLAutomationFilter - rev="???"/> > - SWFParser rev="2.0.1"/> > - HTMLParser   rev="1.9.15"/> > > There is a real nasty hack which would repl

Build failed in Jenkins: Nutch-nutchgora-ant #18

2011-11-10 Thread Apache Jenkins Server
See -- [...truncated 2453 lines...] deps-jar: clean-lib: resolve-default: [ivy:resolve] :: loading settings :: file = /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora-a

[jira] [Commented] (NUTCH-1153) LinkRank must not log all hyperlinks

2011-11-10 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148314#comment-13148314 ] Hudson commented on NUTCH-1153: --- Integrated in Nutch-trunk #1659 (See [https://builds.apach

[jira] [Commented] (NUTCH-1173) DomainStats doesn't count db_not_modified

2011-11-10 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148317#comment-13148317 ] Hudson commented on NUTCH-1173: --- Integrated in Nutch-trunk #1659 (See [https://builds.apach

[jira] [Commented] (NUTCH-1155) Host/domain limit in generator is generate.max.count+1

2011-11-10 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148313#comment-13148313 ] Hudson commented on NUTCH-1155: --- Integrated in Nutch-trunk #1659 (See [https://builds.apach

[jira] [Commented] (NUTCH-1061) Migrate MoreIndexingFilter from Apache ORO to java.util.regex

2011-11-10 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148316#comment-13148316 ] Hudson commented on NUTCH-1061: --- Integrated in Nutch-trunk #1659 (See [https://builds.apach

[jira] [Commented] (NUTCH-1178) Incorrect CSV header CrawlDatumCsvOutputFormat

2011-11-10 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148315#comment-13148315 ] Hudson commented on NUTCH-1178: --- Integrated in Nutch-trunk #1659 (See [https://builds.apach

[jira] [Commented] (NUTCH-1142) Normalization and filtering in WebGraph

2011-11-10 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148318#comment-13148318 ] Hudson commented on NUTCH-1142: --- Integrated in Nutch-trunk #1659 (See [https://builds.apach

Build failed in Jenkins: Nutch-trunk #1659

2011-11-10 Thread Apache Jenkins Server
See Changes: [markus] NUTCH-1173 DomainStats doesn't count db_not_modified [markus] NUTCH-1155 Host/domain limit in generator is generate.max.count+1 [markus] NUTCH-1061 Migrate MoreIndexingFilter from Apache ORO to java.util.regex [mar

[jira] [Commented] (NUTCH-1153) LinkRank must not log all hyperlinks

2011-11-10 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148323#comment-13148323 ] Hudson commented on NUTCH-1153: --- Integrated in Nutch-trunk-ant #76 (See [https://builds.apa

[jira] [Commented] (NUTCH-1155) Host/domain limit in generator is generate.max.count+1

2011-11-10 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148322#comment-13148322 ] Hudson commented on NUTCH-1155: --- Integrated in Nutch-trunk-ant #76 (See [https://builds.apa

[jira] [Commented] (NUTCH-1061) Migrate MoreIndexingFilter from Apache ORO to java.util.regex

2011-11-10 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148325#comment-13148325 ] Hudson commented on NUTCH-1061: --- Integrated in Nutch-trunk-ant #76 (See [https://builds.apa

[jira] [Commented] (NUTCH-1173) DomainStats doesn't count db_not_modified

2011-11-10 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148326#comment-13148326 ] Hudson commented on NUTCH-1173: --- Integrated in Nutch-trunk-ant #76 (See [https://builds.apa

[jira] [Commented] (NUTCH-1178) Incorrect CSV header CrawlDatumCsvOutputFormat

2011-11-10 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148324#comment-13148324 ] Hudson commented on NUTCH-1178: --- Integrated in Nutch-trunk-ant #76 (See [https://builds.apa

[jira] [Commented] (NUTCH-1142) Normalization and filtering in WebGraph

2011-11-10 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148327#comment-13148327 ] Hudson commented on NUTCH-1142: --- Integrated in Nutch-trunk-ant #76 (See [https://builds.apa