[jira] [Commented] (NUTCH-1251) Deletion of duplicates fails with org.apache.solr.client.solrj.SolrServerException

2012-04-03 Thread Arkadi Kosmynin (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245877#comment-13245877 ] Arkadi Kosmynin commented on NUTCH-1251: Thanks Markus! > Delet

[jira] [Commented] (NUTCH-1306) Commit after finished writing to solr index

2012-04-03 Thread Lewis John McGibbney (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245798#comment-13245798 ] Lewis John McGibbney commented on NUTCH-1306: - Hi Dan. In trunk, we have a num

Re: NutchGora release, and Nutch 1.x trunk release

2012-04-03 Thread Mattmann, Chris A (388J)
Thanks Lewis! Cheers, Chris P.S. Hopefully by this weekend... On Apr 3, 2012, at 7:23 AM, Lewis John Mcgibbney wrote: > Hi, > > On Tue, Apr 3, 2012 at 3:12 PM, Markus Jelsma > wrote: > > > Seems fine. Only updating KEYS is no longer necessary. > > Now sorted. > > Thanks whenever you can

Re: NutchGora release, and Nutch 1.x trunk release

2012-04-03 Thread Lewis John Mcgibbney
Hi, On Tue, Apr 3, 2012 at 3:12 PM, Markus Jelsma wrote: > > > Seems fine. Only updating KEYS is no longer necessary. > Now sorted. Thanks whenever you can get round to this Chris. Best Lewis

[Nutch Wiki] Trivial Update of "Release_HOWTO" by LewisJohnMcgibbney

2012-04-03 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "Release_HOWTO" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/Release_HOWTO?action=diff&rev1=12&rev2=13 = Preparation = 1. Create a new releas

[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2012-04-03 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "FrontPage" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=238&rev2=239 === Tutorials === * NutchTutorial - How to configur

Re: NutchGora release, and Nutch 1.x trunk release

2012-04-03 Thread Markus Jelsma
On Tuesday 03 April 2012 15:58:54 you wrote: > Hi Markus, > > On Apr 3, 2012, at 5:50 AM, Markus Jelsma wrote: > > Cool! > > > > Next time I'll ask infra to allow me to suppress notifications. > > > > Chris, will you RM one RC? And if possible list the detailed > > steps/command in the process in

Re: NutchGora release, and Nutch 1.x trunk release

2012-04-03 Thread Mattmann, Chris A (388J)
Hi Markus, On Apr 3, 2012, at 5:50 AM, Markus Jelsma wrote: > Cool! > > Next time I'll ask infra to allow me to suppress notifications. > > Chris, will you RM one RC? And if possible list the detailed steps/command in > the process in case you don't have time to RM 1.6 when the time comes. The >

[jira] [Commented] (NUTCH-1208) Don't include KEYS file in bin distribution

2012-04-03 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245295#comment-13245295 ] Hudson commented on NUTCH-1208: --- Integrated in nutch-trunk-maven #224 (See [https://builds.

Re: NutchGora release, and Nutch 1.x trunk release

2012-04-03 Thread Markus Jelsma
Cool! Next time I'll ask infra to allow me to suppress notifications. Chris, will you RM one RC? And if possible list the detailed steps/command in the process in case you don't have time to RM 1.6 when the time comes. The wiki is dated. I'm looking forward to yet another big release with lots of

Re: NutchGora release, and Nutch 1.x trunk release

2012-04-03 Thread Julien Nioche
> Remaining issue for 1.5: > NUTCH-1208 Don't include KEYS file in bin distribution > done! thanks > > I obviously couldn't suppress e-mail notifications. My sincere apologies for > the deluge of e-mail! > no probs > > On Tuesday 03 April 2012 13:22:17 Julien Nioche wrote: > > Good idea. > > >

[jira] [Resolved] (NUTCH-1208) Don't include KEYS file in bin distribution

2012-04-03 Thread Julien Nioche (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1208. -- Resolution: Fixed trunk : committed revision 1308865. > Don't include KEYS fi

[jira] [Commented] (NUTCH-1270) some of Deflate encoded pages not fetched

2012-04-03 Thread behnam nikbakht (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245259#comment-13245259 ] behnam nikbakht commented on NUTCH-1270: for example, with the site: http://www.no

Re: NutchGora release, and Nutch 1.x trunk release

2012-04-03 Thread Markus Jelsma
Remaining issue for 1.5: NUTCH-1208 Don't include KEYS file in bin distribution I obviously couldn't suppress e-mail notifications. My sincere apologies for the deluge of e-mail! On Tuesday 03 April 2012 13:22:17 Julien Nioche wrote: > Good idea. > > On 3 April 2012 11:29, Markus Jelsma wrote:

[jira] [Updated] (NUTCH-828) Fetch Filter

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-828: Fix Version/s: (was: 1.5) (was: nutchgora) 1.6 2012030

[jira] [Updated] (NUTCH-1088) Write Solr XML documents

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1088: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Write S

[jira] [Updated] (NUTCH-1103) Port protocol-sftp to 1.4

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1103: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Port pr

[jira] [Updated] (NUTCH-1215) UpdateDB should not require segment as input

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1215: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > UpdateD

[jira] [Updated] (NUTCH-1277) Fix [fallthrough] javac warnings

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1277: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Fix [fa

[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1024: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Dynamic

[jira] [Updated] (NUTCH-1317) Max content length by MIME-type

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1317: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Max con

[jira] [Updated] (NUTCH-1117) JUnit test for index-anchor

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1117: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > JUnit t

[jira] [Updated] (NUTCH-1181) Indexer to use webgraph inlinks

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1181: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Indexer

[jira] [Updated] (NUTCH-1149) DomainStats should process numeric CrawlDB metadata

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1149: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > DomainS

[jira] [Updated] (NUTCH-1118) JUnit test for index-basic

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1118: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > JUnit t

[jira] [Updated] (NUTCH-1284) Add site fetcher.max.crawl.delay as log output by default.

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1284: - Fix Version/s: (was: 1.5) (was: nutchgora) 1.6 2012

[jira] [Updated] (NUTCH-1053) Parsing of RSS feeds fails

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1053: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Parsing

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Expose Tik

[jira] [Updated] (NUTCH-827) HTTP POST Authentication

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-827: Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > HTTP POST

[jira] [Updated] (NUTCH-1275) Fix [unchecked] javac warnings

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1275: - Fix Version/s: (was: 1.5) (was: nutchgora) 1.6 2012

[jira] [Updated] (NUTCH-1151) Index-anchor to add numInlinks count

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1151: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Index-a

[jira] [Updated] (NUTCH-1021) Migrate OutlinkExtractor from Apache ORO to java.util.regex

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1021: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Migrate

[jira] [Updated] (NUTCH-1039) Fetcher fails for pages without content-length header

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1039: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Fetcher

[jira] [Updated] (NUTCH-1140) index-more plugin, resetTitle method creates multiple values in the Title field

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1140: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > index-m

[jira] [Updated] (NUTCH-1079) StringBuffer converted to StringBuilder

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1079: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > StringB

[jira] [Updated] (NUTCH-1319) HostNormalizer

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1319: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > HostNor

[jira] [Updated] (NUTCH-1202) Fetcher timebomb kills long waiting fetch jobs

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1202: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Fetcher

[jira] [Updated] (NUTCH-1218) Improve trunk API documentation

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1218: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Improve

[jira] [Updated] (NUTCH-1223) Migrate WebGraph to MapReduce API

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1223: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Migrate

[jira] [Updated] (NUTCH-1226) Migrate CrawlDbReader to MapReduce API

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1226: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Migrate

[jira] [Updated] (NUTCH-1035) Tune Solr config for Nutch users

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1035: - Fix Version/s: (was: 1.5) (was: nutchgora) 1.6 2012

[jira] [Updated] (NUTCH-1130) JUnit test for Any23 RDF plugin

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1130: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > JUnit t

[jira] [Updated] (NUTCH-1047) Pluggable indexing backends

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1047: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Pluggab

[jira] [Updated] (NUTCH-1300) Indexer to normalize URL's

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1300: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Indexer

[jira] [Updated] (NUTCH-1128) JUnit test for urlmeta

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1128: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > JUnit t

[jira] [Updated] (NUTCH-1034) Create Solr Velocity templates

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1034: - Fix Version/s: (was: 1.5) (was: nutchgora) 1.6 2012

[jira] [Updated] (NUTCH-1126) JUnit test for urlfilter-prefix

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1126: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > JUnit t

[jira] [Updated] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1087: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Depreca

[jira] [Updated] (NUTCH-1320) IndexChecker and ParseChecker choke on IDN's

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1320: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > IndexCh

[jira] [Updated] (NUTCH-585) [PARSE-HTML plugin] Block certain parts of HTML code from being indexed

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-585: Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > [PARSE-HTM

[jira] [Updated] (NUTCH-1125) JUnit test for tld

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1125: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > JUnit t

[jira] [Updated] (NUTCH-1107) Log slow parse entries

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1107: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Log slo

[jira] [Updated] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1031: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Delegat

[jira] [Updated] (NUTCH-1062) Migrate BasicURLNormalizer from Apache ORO to java.util.regex

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1062: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Migrate

[jira] [Updated] (NUTCH-208) http: proxy exception list:

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-208: Fix Version/s: (was: 1.5) (was: nutchgora) 1.6 2012030

[jira] [Updated] (NUTCH-1143) Omit anchor in webgraph's LinkDatum

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1143: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Omit an

[jira] [Updated] (NUTCH-1179) Option to restrict generated records by metadata

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1179: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Option

[jira] [Updated] (NUTCH-1247) CrawlDatum.retries should be int

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1247: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > CrawlDa

[jira] [Updated] (NUTCH-1127) JUnit test for urlfilter-validator

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1127: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > JUnit t

[jira] [Updated] (NUTCH-1122) JUnit test for protocol-ftp

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1122: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > JUnit t

[jira] [Updated] (NUTCH-1197) Add statically configured field values to solrindex-mapping.xml

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1197: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Add sta

[jira] [Updated] (NUTCH-1100) SolrDedup broken

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1100: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > SolrDed

[jira] [Updated] (NUTCH-1060) URL filters to produce regexes to be used by OutlinkExtractor.

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1060: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > URL fil

[jira] [Updated] (NUTCH-1001) bin/nutch fetch/parse handle crawl/segments directory

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1001: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > bin/nut

[jira] [Updated] (NUTCH-1124) JUnit test for scoring-opic

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1124: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > JUnit t

[jira] [Updated] (NUTCH-809) Parse-metatags plugin

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-809: Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Parse-meta

[jira] [Updated] (NUTCH-1046) Add tests for indexing to SOLR

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1046: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Add tes

[jira] [Updated] (NUTCH-1121) JUnit test for parse-js

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1121: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > JUnit t

[jira] [Updated] (NUTCH-1252) SegmentReader -get shows wrong data

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1252: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Segment

[jira] [Updated] (NUTCH-1308) Unnecessary truncate content configuration, and logging in parse-zip/ZipParser

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1308: - Fix Version/s: (was: 1.5) (was: nutchgora) 1.6 2012

[jira] [Updated] (NUTCH-1228) Change mapred.task.timeout to mapreduce.task.timeout in fetcher

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1228: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Change

[jira] [Updated] (NUTCH-1186) FreeGenerator always normalizes

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1186: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > FreeGen

[jira] [Updated] (NUTCH-1224) Migrate FreeGenerator to MapReduce API

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1224: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Migrate

[jira] [Updated] (NUTCH-865) Format source code in unique style

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-865: Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Format sou

[jira] [Updated] (NUTCH-1120) JUnit test for microformats-reltag

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1120: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > JUnit t

[jira] [Updated] (NUTCH-1123) JUnit test for scoring-link

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1123: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > JUnit t

[jira] [Updated] (NUTCH-1063) OutlinkExtractor test generates an exception but does not fail

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1063: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Outlink

[jira] [Updated] (NUTCH-1220) Upgrade Solr deps

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1220: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Upgrade

[jira] [Updated] (NUTCH-1014) Migrate from Apache ORO to java.util.regex

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1014: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Migrate

[jira] [Updated] (NUTCH-1233) Rely on Tika for outlink extraction

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1233: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Rely on

[jira] [Updated] (NUTCH-1119) JUnit test for index-static

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1119: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > JUnit t

[jira] [Updated] (NUTCH-1274) Fix [cast] javac warnings

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1274: - Fix Version/s: (was: 1.5) (was: nutchgora) 1.6 2012

[jira] [Updated] (NUTCH-1176) Fix all javadoc warnings from nightly builds

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1176: - Fix Version/s: (was: 1.5) (was: nutchgora) 1.6 2012

[jira] [Updated] (NUTCH-1040) Backport REST-API from 2.0

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1040: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Backpor

[jira] [Updated] (NUTCH-1183) Summary task for adding command line usage instructions to webgraph classes

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1183: - Fix Version/s: (was: 1.5) (was: nutchgora) 1.6 2012

[jira] [Updated] (NUTCH-1201) Allow for different FetcherThread impls

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1201: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Allow f

[jira] [Updated] (NUTCH-1262) Map `duplicating` content-types to a single type

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1262: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Map `du

[jira] [Updated] (NUTCH-1147) WebGraph nodeDumper uses only 1 reducer

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1147: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > WebGrap

[jira] [Updated] (NUTCH-1194) CrawlDB lock should be released earlier

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1194: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > CrawlDB

[jira] [Updated] (NUTCH-1150) http.redirect.max can lead to multiple parses of the same url

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1150: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > http.re

[jira] [Updated] (NUTCH-1084) ReadDB url throws exception

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1084: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > ReadDB

[jira] [Updated] (NUTCH-1273) Fix [deprecation] javac warnings

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1273: - Fix Version/s: (was: 1.5) (was: nutchgora) 1.6 2012

[jira] [Updated] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1113: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Merging

[jira] [Updated] (NUTCH-1116) Write JUnit tests for all plugins

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1116: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Write J

[jira] [Updated] (NUTCH-1249) Resolve all issues flagged up by adding javac -Xlint arguement

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1249: - Affects Version/s: (was: 1.5) Fix Version/s: (was: 1.5) 1.6

Re: GSoC : Web page scraper plugin

2012-04-03 Thread Aamir Khan
On Tue, Apr 3, 2012 at 4:45 PM, Lewis John Mcgibbney < lewis.mcgibb...@gmail.com> wrote: > Hi Aamir, > > > On Tue, Apr 3, 2012 at 12:05 PM, Aamir Khan wrote: > >> >> Exactly, I will have full summer to understand and get up to speed. But >> since my knowledge is very limited my proposal won't be

[jira] [Updated] (NUTCH-578) URL fetched with 403 is generated over and over again

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-578: Fix Version/s: (was: 1.5) 1.6 > URL fetched with 403 is generated over an

[jira] [Updated] (NUTCH-1129) Any23 Nutch plugin

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1129: - Fix Version/s: (was: 1.5) 1.6 > Any23 Nutch plugin > -

[jira] [Updated] (NUTCH-1251) Deletion of duplicates fails with org.apache.solr.client.solrj.SolrServerException

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1251: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Deletio

[jira] [Updated] (NUTCH-1219) Upgrade all jobs to new MapReduce API

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1219: - Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Upgrade
