Re: Suitable naming for > Nutchgora branch?

2012-04-25 Thread Mattmann, Chris A (388J)
Great work Lewis, thanks! Cheers, Chris
On Apr 25, 2012, at 4:01 PM, Lewis John Mcgibbney wrote:
> Hi Everyone,
>
> As you guys will have seen I've quickly polluted our dev list again (sorry!!!) with set and classify for 2.1.
>
> The open issues for 2.0 are ones which I think we could addre

Re: Suitable naming for > Nutchgora branch?

2012-04-25 Thread Lewis John Mcgibbney
Hi Everyone, As you guys will have seen I've quickly polluted our dev list again (sorry!!!) with set and classify for 2.1. The open issues for 2.0 are ones which I think we could address within the 2.0 release. This is merely my opinion, based upon the assertion that they all contain patches whic

[jira] [Updated] (NUTCH-849) different versions of the same library in nutch-2.0-dev.job and local\lib directory

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-849: --- Affects Version/s: nutchgora 1.4 Fix Version/s: (was

[jira] [Updated] (NUTCH-979) Add support for deleting Solr documents with ProtocolStatusCodes.NOTFOUND

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-979: --- Patch Info: Patch Available > Add support for deleting Solr documents with Protoco

[jira] [Updated] (NUTCH-979) Add support for deleting Solr documents with ProtocolStatusCodes.NOTFOUND

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-979: --- Fix Version/s: (was: nutchgora) 2.1 Some work to be done Set a

[jira] [Updated] (NUTCH-797) parse-tika is not properly constructing URLs when the target begins with a "?"

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-797: --- Affects Version/s: nutchgora Fix Version/s: (was: nutchgora)

[jira] [Updated] (NUTCH-710) Support for rel="canonical" attribute

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-710: --- Fix Version/s: (was: nutchgora) 2.1 1.6 Set

[jira] [Updated] (NUTCH-1290) crawlId not supported by all Tools

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1290: Patch Info: Patch Available > crawlId not supported by all Tools >

[jira] [Updated] (NUTCH-944) Increase the number of elements to look for URLs and add the ability to specify multiple attributes by elements

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-944: --- Fix Version/s: (was: nutchgora) 2.1 1.6 Set

[jira] [Updated] (NUTCH-1025) Add option not to commit to Solr

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1025: Fix Version/s: (was: nutchgora) 2.1 Set and Classify

[jira] [Updated] (NUTCH-1285) Debian Packaging for Nutch

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1285: Fix Version/s: (was: nutchgora) 2.1 Set and Classify

[jira] [Updated] (NUTCH-978) A Plugin for extracting certain element of a web page on html page parsing.

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-978: --- Fix Version/s: (was: nutchgora) 2.1 Set and Classify

[jira] [Updated] (NUTCH-1249) Resolve all issues flagged up by adding javac -Xlint arguement

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1249: Fix Version/s: (was: nutchgora) 2.1 Set and Classify

[jira] [Updated] (NUTCH-841) Nutch 2.0 webapp

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-841: --- Affects Version/s: nutchgora Fix Version/s: (was: nutchgora)

[jira] [Updated] (NUTCH-875) Port Webgraph to Nutch 2.0

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-875: --- Fix Version/s: (was: nutchgora) 2.1 > Port Webgraph to Nutc

[jira] [Updated] (NUTCH-864) Fetcher generates entries with status 0

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-864: --- Affects Version/s: nutchgora Fix Version/s: (was: nutchgora)

[jira] [Updated] (NUTCH-970) Injector job crashes with MySQL with table collation set to utf8_general_ci

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-970: --- Fix Version/s: (was: nutchgora) 2.1 Set and Classify

[jira] [Updated] (NUTCH-1094) create comprehensive documentation for Nutchgora branch

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1094: Fix Version/s: (was: nutchgora) 2.1 > create comprehensi

[jira] [Updated] (NUTCH-1026) Strip UTF-8 non-character codepoints

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1026: Fix Version/s: (was: nutchgora) 2.1 Set and Classify

[jira] [Commented] (NUTCH-879) URL-s getting lost

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262171#comment-13262171 ] Lewis John McGibbney commented on NUTCH-879: This looks hellishly serious and p

[jira] [Updated] (NUTCH-992) SolrDedup is broken in trunk

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-992: --- Fix Version/s: (was: nutchgora) 2.1 Set and Classify

[jira] [Updated] (NUTCH-956) solrindex issues

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-956: --- Fix Version/s: (was: nutchgora) 2.1 Set and Classify more wor

[jira] [Commented] (NUTCH-902) Add all necessary files and configuration so that nutch can be used with different backends out-of-the-box

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262160#comment-13262160 ] Lewis John McGibbney commented on NUTCH-902: I made some commits on this to in

[jira] [Updated] (NUTCH-840) Port tests from parse-html to parse-tika

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-840: --- Fix Version/s: (was: nutchgora) 2.1 Set and Classify

[jira] [Updated] (NUTCH-842) AutoGenerate WebPage code

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-842: --- Affects Version/s: nutchgora Fix Version/s: (was: nutchgora)

[jira] [Updated] (NUTCH-887) Delegate parsing of feeds to Tika

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-887: --- Fix Version/s: (was: nutchgora) 2.1 Set and Classify

[jira] [Updated] (NUTCH-1038) Port IndexingFiltersChecker to 2.0

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1038: Affects Version/s: nutchgora Fix Version/s: (was: nutchgora)

[jira] [Updated] (NUTCH-1283) Radically update all Solr configuration in Nutchgora

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1283: Fix Version/s: (was: nutchgora) 2.1 Set and Classify

[jira] [Commented] (NUTCH-1340) Increase scalability by only removing markers when they actually exist for DbUpdaterReducer

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262121#comment-13262121 ] Lewis John McGibbney commented on NUTCH-1340: - Hi Ferdy. I am +1 for this goin

[jira] [Updated] (NUTCH-1104) Port issues from trunk NutchGora branch

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1104: Fix Version/s: (was: nutchgora) 2.1 Set and Classify

[jira] [Updated] (NUTCH-1277) Fix [fallthrough] javac warnings

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1277: Fix Version/s: (was: nutchgora) 2.1 Set and Classify

[jira] [Updated] (NUTCH-1164) Write JUnit tests for protocol-http

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1164: Fix Version/s: (was: nutchgora) 2.1 Set and Classify

[jira] [Updated] (NUTCH-1168) Write JUnit tests for tld

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1168: Fix Version/s: (was: nutchgora) 2.1 Set and Classify

[jira] [Updated] (NUTCH-1166) Write JUnit tests for scoring-link

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1166: Fix Version/s: (was: nutchgora) 2.1 Set and Classify

[jira] [Updated] (NUTCH-1169) Write JUnit tests for urlfilter-prefix

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1169: Fix Version/s: (was: nutchgora) 2.1 Set and Classify

[jira] [Updated] (NUTCH-1161) Write JUnit tests for microformats-reltag plugin

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1161: Fix Version/s: (was: nutchgora) 2.1 Set and Classify

[jira] [Updated] (NUTCH-1160) Write JUnit tests for index-basic

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1160: Fix Version/s: (was: nutchgora) 2.1 Set and Classify

[jira] [Updated] (NUTCH-1165) Write JUnit tests for protocol-sftp

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1165: Fix Version/s: (was: nutchgora) 2.1 Set and Classify

[jira] [Updated] (NUTCH-1163) Write JUnit tests for protocol-ftp

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1163: Fix Version/s: (was: nutchgora) 2.1 Set and Classify

[jira] [Updated] (NUTCH-1158) Write JUnit tests for all nutchgora plugins

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1158: Fix Version/s: (was: nutchgora) 2.1 Set and Classify

[jira] [Updated] (NUTCH-1170) Write JUnit tests for urlfilter-validator

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1170: Fix Version/s: (was: nutchgora) 2.1 Set and Classify

Re: Suitable naming for > Nutchgora branch?

2012-04-25 Thread Mattmann, Chris A (388J)
Hi Guys, Yep I think we've beat the dead horse here about the name :) This is a good recent discussion/summary: http://s.apache.org/CoY and I think it had some productive outcomes. I envision a world in which we keep releasing the current 1.x series until we get up to 1.9, and then hopefully in p

Re: Suitable naming for > Nutchgora branch?

2012-04-25 Thread Julien Nioche
> I must say that since the move of Nutchgora from trunk to branch it's kind of odd that it's still referred to as 2.x. (For now that's okay I guess).
>
Moving it from the trunk made a lot of sense and has been abundantly discussed on this list. We had one stable version which is actively mainta

Re: Suitable naming for > Nutchgora branch?

2012-04-25 Thread Ferdy Galema
Hi Lewis, 2.1 is fine with me. This is assuming 2.x is a good naming scheme in the first place. I must say that since the move of Nutchgora from trunk to branch it's kind of odd that it's still referred to as 2.x. (For now that's okay I guess). Ferdy On Wed, Apr 25, 2012 at 10:46 AM, Lewis John

[jira] [Resolved] (NUTCH-946) cache.jsp does not recognize encoding conversion from content different to UTF-8

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-946. Resolution: Won't Fix This issue is now deprecated and can't be fixed in current dev

Suitable naming for > Nutchgora branch?

2012-04-25 Thread Lewis John Mcgibbney
Good Morning, Does anyone have a differing opinion on naming the next development track for the Nutchgora branch 2.1? Before I set and classify most issues it would be good to know. Thank you, Lewis

[jira] [Updated] (NUTCH-896) Gora-based tests need to have their own config files

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-896: --- Fix Version/s: (was: nutchgora) 2.1 Set and classify

[jira] [Updated] (NUTCH-1162) Write JUnit tests for parse-js

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1162: Fix Version/s: (was: nutchgora) 2.1 Set and classify

[jira] [Updated] (NUTCH-1167) Write JUnit tests for scoring-opic

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1167: Fix Version/s: (was: nutchgora) 2.1 Set and classify

[jira] [Updated] (NUTCH-1159) Write JUnit tests for index-anchor

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1159: Fix Version/s: (was: nutchgora) 2.1 Set and classify

[jira] [Updated] (NUTCH-874) Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-874: --- Affects Version/s: nutchgora Fix Version/s: (was: nutchgora)

[jira] [Updated] (NUTCH-1081) ant tests fail

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1081: Fix Version/s: (was: nutchgora) 2.1 Set and classify

[jira] [Updated] (NUTCH-882) Design a Host table in GORA

2012-04-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-882: --- Patch Info: Patch Available > Design a Host table in GORA > --