[jira] [Updated] (NUTCH-1741) Support of Sitemaps in Nutch 2.x

2016-01-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1741: Assignee: cihad güzel > Support of Sitemaps in Nutch

[jira] [Commented] (NUTCH-1741) Support of Sitemaps in Nutch 2.x

2016-01-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113423#comment-15113423 ] Lewis John McGibbney commented on NUTCH-1741: - I'm nearly finished updating v6 patch for 2.X

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110702#comment-15110702 ] Lewis John McGibbney commented on NUTCH-1325: - What a patch. Real nice. I really like th

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110733#comment-15110733 ] Lewis John McGibbney commented on NUTCH-1325: - Nice Markus, the conversation in this ticket

[ANNOUNCE] Apache Nutch 2.3.1 Release

2016-01-21 Thread lewis john mcgibbney
Hi Folks, !!Apologies for cross posting!! The Apache Nutch PMC are pleased to announce the immediate release of Apache Nutch v2.3.1, we advise all current users and developers of the 2.X series to upgrade to this release. Nutch is a well matured, production ready Web crawler. Nutch 2.X branch

[RESULT] WAS Re: [VOTE] Release Apache Nutch 2.3.1rc2

2016-01-21 Thread Lewis John Mcgibbney
Hi Folks, I am bringing this VOTE to a close with the following results [3] +1 Release this package as Apache Nutch 2.3.1. Lewis John McGibbney* Sebastian Nagel* Chris Mattmann* [0] -1 Do not release this package because… *Nutch PMC Member I am really happy to therefore announce that the VOTE

[jira] [Commented] (NUTCH-2202) Integration of Anthelion (Focused Crawling Module) into Nutch

2016-01-21 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110867#comment-15110867 ] Lewis John McGibbney commented on NUTCH-2202: - I agree [~robertmeusel], this would be good

Re: [VOTE] Release Apache Nutch 2.3.1rc2

2016-01-20 Thread Lewis John Mcgibbney
Hi user@, dev@, PING on the Nutch 2.3.1 RC#2 Would really appreciate anyone who is able to review this release candidate. It would mean a lot for our 2.X user base. Thank you Lewis On Sun, Jan 10, 2016 at 7:01 AM, Lewis John Mcgibbney < lewis.mcgibb...@gmail.com> wrote: > Hi Folks, >

[jira] [Created] (NUTCH-2200) Establish process for publishing Docker containers

2016-01-16 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2200: --- Summary: Establish process for publishing Docker containers Key: NUTCH-2200 URL: https://issues.apache.org/jira/browse/NUTCH-2200 Project: Nutch

Re: [VOTE] Release Apache Nutch 2.3.1rc2

2016-01-13 Thread Lewis John Mcgibbney
Hi Seb, Thanks for taking the time to review the release candidate. Replies inline On Tue, Jan 12, 2016 at 10:17 AM, wrote: > +1 > > - good signatures > - tests pass > - I've successfully run a test crawl (bin/crawl) using HBase 0.98.8 > > Two minor points: > >

Re: [VOTE] Release Apache Nutch 2.3.1rc2

2016-01-13 Thread Lewis John Mcgibbney
Any others able to review please? On Sun, Jan 10, 2016 at 7:01 AM, Lewis John Mcgibbney < lewis.mcgibb...@gmail.com> wrote: > Hi Folks, > > A second candidate for the Nutch 2.3.1 release is available at: > > https://dist.apache.org/repos/dist/dev/nutch/2.3.1rc2/ >

[jira] [Updated] (NUTCH-1800) Documentation for Nutch 1.X REST API

2016-01-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1800: Fix Version/s: (was: 2.3.1) > Documentation for Nutch 1.X REST

[jira] [Updated] (NUTCH-1800) Documentation for Nutch 1.X REST API

2016-01-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1800: Summary: Documentation for Nutch 1.X REST API (was: Documentation for Nutch 1.X

[jira] [Created] (NUTCH-2199) Documentation for Nutch 2.X REST API

2016-01-10 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2199: --- Summary: Documentation for Nutch 2.X REST API Key: NUTCH-2199 URL: https://issues.apache.org/jira/browse/NUTCH-2199 Project: Nutch Issue Type

[VOTE] Release Apache Nutch 2.3.1rc2

2016-01-10 Thread Lewis John Mcgibbney
Hi Folks, A second candidate for the Nutch 2.3.1 release is available at: https://dist.apache.org/repos/dist/dev/nutch/2.3.1rc2/ The release candidate is a zip and tar.gz sources archive of the sources in: http://svn.apache.org/repos/asf/nutch/tags/release-2.3.1rc2/ In addition, a staged

[jira] [Commented] (NUTCH-1186) FreeGenerator always normalizes

2016-01-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091300#comment-15091300 ] Lewis John McGibbney commented on NUTCH-1186: - Hi [~markus17] I have scoped the patch

[jira] [Commented] (NUTCH-2168) Parse-tika fails to retrieve parser

2016-01-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090337#comment-15090337 ] Lewis John McGibbney commented on NUTCH-2168: - +1 for commit [~wastl-nagel] nice catch

[jira] [Comment Edited] (NUTCH-2168) Parse-tika fails to retrieve parser

2016-01-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090337#comment-15090337 ] Lewis John McGibbney edited comment on NUTCH-2168 at 1/9/16 2:03 AM

[jira] [Updated] (NUTCH-2094) Stopping and Restarting a crawl has issues in the Web UI

2016-01-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2094: Fix Version/s: (was: 2.4) 2.3.1 > Stopping and Restart

[jira] [Updated] (NUTCH-2166) Add reverse URL format to dump tool

2016-01-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2166: Fix Version/s: (was: 2.4) > Add reverse URL format to dump t

[jira] [Updated] (NUTCH-2165) FileDumper Util hard codes part-# folder name

2016-01-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2165: Fix Version/s: (was: 2.4) > FileDumper Util hard codes part-# folder n

[jira] [Commented] (NUTCH-2143) GeneratorJob ignores batch id passed as argument

2016-01-07 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15087804#comment-15087804 ] Lewis John McGibbney commented on NUTCH-2143: - Tested v3 and confirmed to fix the issue. I am

[jira] [Commented] (NUTCH-1186) FreeGenerator always normalizes

2016-01-05 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083138#comment-15083138 ] Lewis John McGibbney commented on NUTCH-1186: - Will scope and test [~markus17

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-29 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074453#comment-15074453 ] Lewis John McGibbney commented on NUTCH-2184: - [~markus17] coming back to this one briefly

[jira] [Commented] (NUTCH-1946) Upgrade to Gora 0.6.1

2015-12-29 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074319#comment-15074319 ] Lewis John McGibbney commented on NUTCH-1946: - Hi [~kalanya] bq. Hey guys, how do i apply

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060023#comment-15060023 ] Lewis John McGibbney commented on NUTCH-2184: - Excellent points Markus thanks for bringing

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060155#comment-15060155 ] Lewis John McGibbney commented on NUTCH-2184: - Ack On Wednesday, December 16, 2015, Markus

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058962#comment-15058962 ] Lewis John McGibbney commented on NUTCH-2184: - Issue is logged at NUTCH-2186 > Ena

[jira] [Created] (NUTCH-2186) -addBinaryContent flag can cause "String length must be a multiple of four" error in IndexingJob

2015-12-15 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2186: --- Summary: -addBinaryContent flag can cause "String length must be a multiple of four" error in IndexingJob Key: NUTCH-2186 URL: https://issues.apache.org/j

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058955#comment-15058955 ] Lewis John McGibbney commented on NUTCH-2184: - I am going to open another issue which

[jira] [Updated] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2184: Attachment: NUTCH-2184.patch Patch for trunk. During testing this patch against

[jira] [Updated] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2184: Flags: Patch Patch Info: Patch Available > Enable IndexingJob to funct

[jira] [Work stopped] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2184 stopped by Lewis John McGibbney. --- > Enable IndexingJob to function with no craw

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059489#comment-15059489 ] Lewis John McGibbney commented on NUTCH-2184: - No, just the following https://github.com

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059459#comment-15059459 ] Lewis John McGibbney commented on NUTCH-2184: - I've tested this on scores of segments today

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-14 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056690#comment-15056690 ] Lewis John McGibbney commented on NUTCH-2184: - This issue also improves command line parsing

[jira] [Created] (NUTCH-2185) protocol-soda-consumer plugin

2015-12-13 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2185: --- Summary: protocol-soda-consumer plugin Key: NUTCH-2185 URL: https://issues.apache.org/jira/browse/NUTCH-2185 Project: Nutch Issue Type: Bug

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053975#comment-15053975 ] Lewis John McGibbney commented on NUTCH-2184: - Working on this right now folks. > Ena

[jira] [Work started] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2184 started by Lewis John McGibbney. --- > Enable IndexingJob to function with no craw

[jira] [Created] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-11 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2184: --- Summary: Enable IndexingJob to function with no crawldb Key: NUTCH-2184 URL: https://issues.apache.org/jira/browse/NUTCH-2184 Project: Nutch

[jira] [Commented] (NUTCH-2183) Improvement to SegmentChecker for skipping non-segments present in segments directory

2015-12-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049698#comment-15049698 ] Lewis John McGibbney commented on NUTCH-2183: - Would like to commit today if possible

[jira] [Resolved] (NUTCH-2180) FileDumper dumps data, but breaks midway on corrupt segments

2015-12-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2180. - Resolution: Fixed Committed @revision 1719004 in trunk > FileDumper dumps d

[jira] [Resolved] (NUTCH-2183) Improvement to SegmentChecker for skipping non-segments present in segments directory

2015-12-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2183. - Resolution: Fixed Committed @revision 1719006 in trunk. Thank you [~mjoyce

[jira] [Commented] (NUTCH-2180) FileDumper dumps data, but breaks midway on corrupt segments

2015-12-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048999#comment-15048999 ] Lewis John McGibbney commented on NUTCH-2180: - Harsha do you know what results in corrupted

[jira] [Updated] (NUTCH-2183) Improvement to SegmentChecker for skipping non-segments present in segments directory

2015-12-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2183: Description: The scenario is that you have a bunch of Nutch data which has been

[jira] [Created] (NUTCH-2183) Improvement to SegmentChecker for skipping non-segments present in segments directory

2015-12-08 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2183: --- Summary: Improvement to SegmentChecker for skipping non-segments present in segments directory Key: NUTCH-2183 URL: https://issues.apache.org/jira/browse/NUTCH-2183

[jira] [Updated] (NUTCH-2183) Improvement to SegmentChecker for skipping non-segments present in segments directory

2015-12-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2183: Attachment: NUTCH-2183.patch Patch for trunk. > Improvement to SegmentChec

Fwd: ApacheCon NA 2015 Travel Assistance Applications now open!

2015-12-07 Thread Lewis John Mcgibbney
ns now open! 1251 by: lewis john mcgibbney Administrivia: - To post to the list, e-mail: priv...@nutch.apache.org To unsubscribe, e-mail: private-digest-unsubscr...@nutch.apache.org For additional commands, e-mail: private-di

[RESULT] WAS Re: [VOTE] Release Apache Nutch 1.11 RC#2

2015-12-07 Thread Lewis John Mcgibbney
Hi user@ dev@, 72hrs have lapsed so I would like to bring this thread to a close! VOTEs were cast with the following RESULT [7] +1 Release this package as Apache Nutch 1.11 Lewis John Mcgibbney* Roannel Fernández Hernández Sujen Shah* Chris A Mattmann* Julien Nioche* Sebastian Nagel* Jorge Luis

[RELEASE] Apache Nutch 1.11

2015-12-07 Thread lewis john mcgibbney
Hello Folks, 07 December 2015 - Nutch 1.11 Release The Apache Nutch PMC are pleased to announce the immediate release of Apache Nutch v1.11, we advise all current users and developers of the 1.X series to upgrade to this release. What is Apache Nutch? Nutch is a well matured, production ready

[jira] [Updated] (NUTCH-2181) Add Webpage for 3rd Party Connectors/Libraries to Apache Nutch

2015-12-07 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2181: Issue Type: Task (was: Bug) > Add Webpage for 3rd Party Connectors/Librar

[jira] [Created] (NUTCH-2181) Add Webpage for 3rd Party Connectors/Libraries to Apache Nutch

2015-12-07 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2181: --- Summary: Add Webpage for 3rd Party Connectors/Libraries to Apache Nutch Key: NUTCH-2181 URL: https://issues.apache.org/jira/browse/NUTCH-2181 Project

[VOTE] Release Apache Nutch 1.11 RC#2

2015-12-04 Thread Lewis John Mcgibbney
-1.11-rc2/ All artifacts have been signed with the following signature as present within KEYS 48BAEBF6 2013-10-28 Lewis John McGibbney (CODE SIGNING KEY) < lewi...@apache.org> In addition, a staged maven repository is available here: https://repository.apache.org/content/repositories/orgapach

Dropping Nutch 1.11RC#1 Artifacts

2015-12-03 Thread Lewis John Mcgibbney
Hi Chris, Can you please drop the Nutch 1.11RC#1 artifacts from repository.a.o and from https://dist.apache.org/repos/dist/dev/nutch/1.11/ Thanks very much Lewis -- *Lewis*

[jira] [Commented] (NUTCH-2172) Parsing whitespace not just tabs in contenttype-mapping.txt

2015-12-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037470#comment-15037470 ] Lewis John McGibbney commented on NUTCH-2172: - I think that is the point that Seb is making

[jira] [Commented] (NUTCH-2172) Parsing whitespace not just tabs in contenttype-mapping.txt

2015-12-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037471#comment-15037471 ] Lewis John McGibbney commented on NUTCH-2172: - [~wastl-nagel] this is a good patch. It is good

[jira] [Updated] (NUTCH-2178) DeduplicationJob to optionally group on host or domain

2015-12-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2178: Fix Version/s: (was: 1.11) 1.12 > Deduplication

[jira] [Updated] (NUTCH-2149) REST endpoint to read Nutch sequence files

2015-12-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2149: Fix Version/s: (was: 1.12) 1.11 > REST endpoint to r

[jira] [Updated] (NUTCH-2128) Refactor configuration end point

2015-12-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2128: Fix Version/s: (was: 1.12) 1.11 > Refactor configuration

[jira] [Commented] (NUTCH-2172) Parsing whitespace not just tabs in contenttype-mapping.txt

2015-12-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038801#comment-15038801 ] Lewis John McGibbney commented on NUTCH-2172: - +1 > Parsing whitespace not just t

[jira] [Commented] (NUTCH-2158) Upgrade to Tika 1.11

2015-11-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023141#comment-15023141 ] Lewis John McGibbney commented on NUTCH-2158: - I am +1 for this. If we can get this committed

[jira] [Comment Edited] (NUTCH-2158) Upgrade to Tika 1.11

2015-11-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023141#comment-15023141 ] Lewis John McGibbney edited comment on NUTCH-2158 at 11/23/15 9:43 PM

[jira] [Updated] (NUTCH-2162) Nutch Webapp Crawl fails as it tries to index

2015-11-20 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2162: Fix Version/s: (was: 1.11) 1.12 > Nutch Webapp Crawl fa

[jira] [Resolved] (NUTCH-2058) Indexer plugin that allows RegEx replacements on the NutchDocument field values

2015-11-20 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2058. - Resolution: Fixed Tests are not failing as per recent local builds https

[DISCUSS] Release Nutch 1.11?

2015-11-20 Thread Lewis John Mcgibbney
Hi Folks, Title says it all. There is only one pending issue for 1.11. https://issues.apache.org/jira/browse/NUTCH-2158 I am testing out the Tika 1.11 patch right now. Do you guys want me to push a release if we can get the Tika committed? I can do this tonight when I get home. Ta Lewis --

[jira] [Commented] (NUTCH-2158) Upgrade to Tika 1.11

2015-11-20 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15018544#comment-15018544 ] Lewis John McGibbney commented on NUTCH-2158: - Hi [~jnioche], I reproduce your failing test

[jira] [Commented] (NUTCH-2069) Ignore external links based on domain

2015-11-19 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15015387#comment-15015387 ] Lewis John McGibbney commented on NUTCH-2069: - +1 for patch. Sorry about formatting folks. We

[jira] [Updated] (NUTCH-2069) Ignore external links based on domain

2015-11-19 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2069: Fix Version/s: 1.12 > Ignore external links based on dom

[jira] [Updated] (NUTCH-2069) Ignore external links based on domain

2015-11-19 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2069: Fix Version/s: (was: 1.12) 1.11 > Ignore external li

[jira] [Created] (NUTCH-2171) Upgrade Nutch Trunk to Java 1.8

2015-11-16 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2171: --- Summary: Upgrade Nutch Trunk to Java 1.8 Key: NUTCH-2171 URL: https://issues.apache.org/jira/browse/NUTCH-2171 Project: Nutch Issue Type: Task

Upgrade of mapred --> mapreduce in trunk e.g. Nutch 3.X

2015-11-14 Thread Lewis John Mcgibbney
Hi Folks, Mike Joyce and myself have been working on a Tinkerpop implementation of Node and NodeDB (generated through WebGraph) which builds a Vertex input, used by Tinkerpop, subsequently Gremlin and persisted into a graph database such as TitanDB. We have analyzed the problem quite a bit and

[jira] [Closed] (NUTCH-2170) When i am crawling the URL http://www.aossama.com/. it is crawling url like this com.aossama.www.http/

2015-11-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-2170. --- Resolution: Fixed Hi prabhakar please go to our mailing lists and we can help you

[jira] [Commented] (NUTCH-2157) Parent Issue for Addressing Miredot REST API Warnings

2015-11-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005130#comment-15005130 ] Lewis John McGibbney commented on NUTCH-2157: - +1 commit, this looks much better. The REST

[jira] [Commented] (NUTCH-2130) copyField rawcontent creates error within schema.xml

2015-11-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003551#comment-15003551 ] Lewis John McGibbney commented on NUTCH-2130: - +1 Seb please commit Sir > copyFi

[jira] [Updated] (NUTCH-2130) copyField rawcontent creates error within schema.xml

2015-11-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2130: Fix Version/s: (was: 2.4) 2.3.1 > copyField rawcont

[jira] [Updated] (NUTCH-2160) Upgrade Selenium Java to 2.48.2

2015-11-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2160: Issue Type: Improvement (was: Bug) > Upgrade Selenium Java to 2.4

[jira] [Closed] (NUTCH-2120) Remove MapWritable from trunk codebase

2015-11-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-2120. --- Committed revision 1714068 > Remove MapWritable from trunk codeb

[jira] [Resolved] (NUTCH-2160) Upgrade Selenium Java to 2.48.2

2015-11-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2160. - Resolution: Fixed Committed revision 1714071 > Upgrade Selenium Java to 2.4

[jira] [Resolved] (NUTCH-2120) Remove MapWritable from trunk codebase

2015-11-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2120. - Resolution: Fixed Fix Version/s: (was: 1.12) 1.11

[jira] [Commented] (NUTCH-2165) FileDumper Util hard codes part-# folder name

2015-11-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002598#comment-15002598 ] Lewis John McGibbney commented on NUTCH-2165: - +1 [~mjoyce] verified on small sample crawl

[jira] [Comment Edited] (NUTCH-2165) FileDumper Util hard codes part-# folder name

2015-11-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002598#comment-15002598 ] Lewis John McGibbney edited comment on NUTCH-2165 at 11/12/15 6:39 PM

[jira] [Commented] (NUTCH-2165) FileDumper Util hard codes part-# folder name

2015-11-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000658#comment-15000658 ] Lewis John McGibbney commented on NUTCH-2165: - It means that the remaining data is not dumped

[jira] [Commented] (NUTCH-2167) Backport TableUtil from 2.x for URL reversing

2015-11-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000912#comment-15000912 ] Lewis John McGibbney commented on NUTCH-2167: - Yes, an example of this being useful is within

[jira] [Updated] (NUTCH-2120) Remove MapWritable from trunk codebase

2015-11-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2120: Issue Type: Task (was: Bug) > Remove MapWritable from trunk codeb

[jira] [Updated] (NUTCH-2120) Remove MapWritable from trunk codebase

2015-11-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2120: Flags: Patch Patch Info: Patch Available > Remove MapWritable from tr

[jira] [Commented] (NUTCH-2160) Upgrade Selenium Java to 2.48.2

2015-11-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001105#comment-15001105 ] Lewis John McGibbney commented on NUTCH-2160: - Will commit by EoB today unless

[jira] [Updated] (NUTCH-2120) Remove MapWritable from trunk codebase

2015-11-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2120: Attachment: NUTCH-2120.patch Patch which removes this class from Trunk. > Rem

[jira] [Updated] (NUTCH-2163) Utilize current JVM threads to augment URLClassLoader with newly discovered classes

2015-11-06 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2163: Summary: Utilize current JVM threads to augment URLClassLoader with newly

[jira] [Commented] (NUTCH-2162) Nutch Webapp Crawl fails as it tries to index

2015-11-06 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993376#comment-14993376 ] Lewis John McGibbney commented on NUTCH-2162: - In all honesty a work around for this is merely

[jira] [Commented] (NUTCH-2162) Nutch Webapp Crawl fails as it tries to index

2015-11-06 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994168#comment-14994168 ] Lewis John McGibbney commented on NUTCH-2162: - Ack. I also got it working well with Solr

[jira] [Created] (NUTCH-2161) Interrupted failed and/or killed tasks fail to clean up temp directories in HDFS

2015-11-05 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2161: --- Summary: Interrupted failed and/or killed tasks fail to clean up temp directories in HDFS Key: NUTCH-2161 URL: https://issues.apache.org/jira/browse/NUTCH-2161

[jira] [Updated] (NUTCH-2162) Nutch Webapp Crawl fails as it tries to index

2015-11-05 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2162: Attachment: nutch_webapp.log Example log output from initiating a Crawl from

[jira] [Created] (NUTCH-2162) Nutch Webapp Crawl fails as it tries to index

2015-11-05 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2162: --- Summary: Nutch Webapp Crawl fails as it tries to index Key: NUTCH-2162 URL: https://issues.apache.org/jira/browse/NUTCH-2162 Project: Nutch

[jira] [Commented] (NUTCH-2160) Upgrade Selenium Java to 2.48.2

2015-11-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991063#comment-14991063 ] Lewis John McGibbney commented on NUTCH-2160: - I was under the impression that Selenium is now

[jira] [Updated] (NUTCH-2129) Track Protocol Status in Crawl Datum

2015-11-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2129: Fix Version/s: (was: 2.4) > Track Protocol Status in Crawl Da

[jira] [Commented] (NUTCH-2160) Upgrade Selenium Java to 2.48.2

2015-11-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991031#comment-14991031 ] Lewis John McGibbney commented on NUTCH-2160: - Thanks Kim. I have it working with Firefox

[jira] [Resolved] (NUTCH-2159) Ensure that all WebApp files are copied into generated artifacts for 1.X Webapp

2015-11-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2159. - Resolution: Fixed Committed @revision 1712705 in trunk > Ensure that all Web

[jira] [Resolved] (NUTCH-2086) Nutch 1.X Webui

2015-11-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2086. - Resolution: Fixed Fix Version/s: (was: 1.12) 1.11

[jira] [Created] (NUTCH-2159) Ensure that all WebApp files are copied into generated artifacts for 1.X Webapp

2015-11-03 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2159: --- Summary: Ensure that all WebApp files are copied into generated artifacts for 1.X Webapp Key: NUTCH-2159 URL: https://issues.apache.org/jira/browse/NUTCH-2159

[jira] [Updated] (NUTCH-2160) Upgrade Selenium Java to 2.48.2

2015-11-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2160: Attachment: NUTCH-2160.patch Patch for trunk. [~kwhitehall] hopefully
