[jira] [Created] (NUTCH-2208) Fix 4 skipped tests in TestGenerator

2016-01-26 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2208: --- Summary: Fix 4 skipped tests in TestGenerator Key: NUTCH-2208 URL: https://issues.apache.org/jira/browse/NUTCH-2208 Project: Nutch Issue Type:

[jira] [Updated] (NUTCH-2208) Fix 4 skipped tests in TestGenerator

2016-01-26 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2208: Attachment: TEST-org.apache.nutch.crawl.TestGenerator.txt Attached is full test log

[jira] [Resolved] (NUTCH-1741) Support of Sitemaps in Nutch 2.x

2016-01-26 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-1741. - Resolution: Fixed Committed revision 1726853 in 2.X Thank you to everyone that

[jira] [Commented] (NUTCH-2206) Provide example scoring.similarity.stopword.file

2016-01-26 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117800#comment-15117800 ] Lewis John McGibbney commented on NUTCH-2206: - We should most likely also provide the

[jira] [Commented] (NUTCH-2206) Provide example scoring.similarity.stopword.file

2016-01-26 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118286#comment-15118286 ] Lewis John McGibbney commented on NUTCH-2206: - +1 [~sujenshah], thanks > Provide example

[jira] [Updated] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2016-01-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2184: Attachment: NUTCH-2184v2.patch Updated patch for trunk. [~markus17], working to

[jira] [Created] (NUTCH-2206) Provide example scoring.similarity.stopword.file

2016-01-25 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2206: --- Summary: Provide example scoring.similarity.stopword.file Key: NUTCH-2206 URL: https://issues.apache.org/jira/browse/NUTCH-2206 Project: Nutch

[jira] [Commented] (NUTCH-2206) Provide example scoring.similarity.stopword.file

2016-01-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116491#comment-15116491 ] Lewis John McGibbney commented on NUTCH-2206: - CC [~sujenshah] > Provide example

[jira] [Created] (NUTCH-2207) Remove class duplication and smarten-up scoring-similarity plugin

2016-01-25 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2207: --- Summary: Remove class duplication and smarten-up scoring-similarity plugin Key: NUTCH-2207 URL: https://issues.apache.org/jira/browse/NUTCH-2207

[jira] [Commented] (NUTCH-2171) Upgrade Nutch Trunk to Java 1.8

2016-01-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113380#comment-15113380 ] Lewis John McGibbney commented on NUTCH-2171: - Hey [~jorgelbg] feel free to assign this to

[jira] [Updated] (NUTCH-1741) Support of Sitemaps in Nutch 2.x

2016-01-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1741: Assignee: cihad güzel > Support of Sitemaps in Nutch 2.x >

[jira] [Commented] (NUTCH-1741) Support of Sitemaps in Nutch 2.x

2016-01-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113423#comment-15113423 ] Lewis John McGibbney commented on NUTCH-1741: - I'm nearly finished updating v6 patch for 2.X

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110702#comment-15110702 ] Lewis John McGibbney commented on NUTCH-1325: - What a patch. Real nice. I really like th

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2016-01-21 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110733#comment-15110733 ] Lewis John McGibbney commented on NUTCH-1325: - Nice Markus, the conversation in this ticket is

[jira] [Commented] (NUTCH-2202) Integration of Anthelion (Focused Crawling Module) into Nutch

2016-01-21 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110867#comment-15110867 ] Lewis John McGibbney commented on NUTCH-2202: - I agree [~robertmeusel], this would be good to

[jira] [Created] (NUTCH-2200) Establish process for publishing Docker containers

2016-01-16 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2200: --- Summary: Establish process for publishing Docker containers Key: NUTCH-2200 URL: https://issues.apache.org/jira/browse/NUTCH-2200 Project: Nutch

[jira] [Updated] (NUTCH-1800) Documentation for Nutch 1.X REST API

2016-01-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1800: Fix Version/s: (was: 2.3.1) > Documentation for Nutch 1.X REST API >

[jira] [Updated] (NUTCH-1800) Documentation for Nutch 1.X REST API

2016-01-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1800: Summary: Documentation for Nutch 1.X REST API (was: Documentation for Nutch 1.X

[jira] [Created] (NUTCH-2199) Documentation for Nutch 2.X REST API

2016-01-10 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2199: --- Summary: Documentation for Nutch 2.X REST API Key: NUTCH-2199 URL: https://issues.apache.org/jira/browse/NUTCH-2199 Project: Nutch Issue Type:

[jira] [Commented] (NUTCH-1186) FreeGenerator always normalizes

2016-01-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091300#comment-15091300 ] Lewis John McGibbney commented on NUTCH-1186: - Hi [~markus17] I have scoped the patch and

[jira] [Commented] (NUTCH-2168) Parse-tika fails to retrieve parser

2016-01-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090337#comment-15090337 ] Lewis John McGibbney commented on NUTCH-2168: - +1 for commit [~wastl-nagel] nice catch and

[jira] [Comment Edited] (NUTCH-2168) Parse-tika fails to retrieve parser

2016-01-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090337#comment-15090337 ] Lewis John McGibbney edited comment on NUTCH-2168 at 1/9/16 2:03 AM: -

[jira] [Updated] (NUTCH-2094) Stopping and Restarting a crawl has issues in the Web UI

2016-01-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2094: Fix Version/s: (was: 2.4) 2.3.1 > Stopping and Restarting a

[jira] [Updated] (NUTCH-2166) Add reverse URL format to dump tool

2016-01-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2166: Fix Version/s: (was: 2.4) > Add reverse URL format to dump tool >

[jira] [Updated] (NUTCH-2165) FileDumper Util hard codes part-# folder name

2016-01-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2165: Fix Version/s: (was: 2.4) > FileDumper Util hard codes part-# folder name >

[jira] [Commented] (NUTCH-2143) GeneratorJob ignores batch id passed as argument

2016-01-07 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15087804#comment-15087804 ] Lewis John McGibbney commented on NUTCH-2143: - Tested v3 and confirmed to fix the issue. I am

[jira] [Commented] (NUTCH-1186) FreeGenerator always normalizes

2016-01-05 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083138#comment-15083138 ] Lewis John McGibbney commented on NUTCH-1186: - Will scope and test [~markus17] >

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-29 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074453#comment-15074453 ] Lewis John McGibbney commented on NUTCH-2184: - [~markus17] coming back to this one briefly,

[jira] [Commented] (NUTCH-1946) Upgrade to Gora 0.6.1

2015-12-29 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074319#comment-15074319 ] Lewis John McGibbney commented on NUTCH-1946: - Hi [~kalanya] bq. Hey guys, how do i apply

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060023#comment-15060023 ] Lewis John McGibbney commented on NUTCH-2184: - Excellent points Markus thanks for bringing

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060155#comment-15060155 ] Lewis John McGibbney commented on NUTCH-2184: - Ack On Wednesday, December 16, 2015, Markus

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058962#comment-15058962 ] Lewis John McGibbney commented on NUTCH-2184: - Issue is logged at NUTCH-2186 > Enable

[jira] [Created] (NUTCH-2186) -addBinaryContent flag can cause "String length must be a multiple of four" error in IndexingJob

2015-12-15 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2186: --- Summary: -addBinaryContent flag can cause "String length must be a multiple of four" error in IndexingJob Key: NUTCH-2186 URL:

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058955#comment-15058955 ] Lewis John McGibbney commented on NUTCH-2184: - I am going to open another issue which

[jira] [Updated] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2184: Attachment: NUTCH-2184.patch Patch for trunk. During testing this patch against

[jira] [Updated] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2184: Flags: Patch Patch Info: Patch Available > Enable IndexingJob to function

[jira] [Work stopped] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2184 stopped by Lewis John McGibbney. --- > Enable IndexingJob to function with no crawldb >

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059489#comment-15059489 ] Lewis John McGibbney commented on NUTCH-2184: - No, just the following

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059459#comment-15059459 ] Lewis John McGibbney commented on NUTCH-2184: - I've tested this on scores of segments today

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-14 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056690#comment-15056690 ] Lewis John McGibbney commented on NUTCH-2184: - This issue also improves command line parsing

[jira] [Created] (NUTCH-2185) protocol-soda-consumer plugin

2015-12-13 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2185: --- Summary: protocol-soda-consumer plugin Key: NUTCH-2185 URL: https://issues.apache.org/jira/browse/NUTCH-2185 Project: Nutch Issue Type: Bug

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053975#comment-15053975 ] Lewis John McGibbney commented on NUTCH-2184: - Working on this right now folks. > Enable

[jira] [Work started] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2184 started by Lewis John McGibbney. --- > Enable IndexingJob to function with no crawldb >

[jira] [Created] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-11 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2184: --- Summary: Enable IndexingJob to function with no crawldb Key: NUTCH-2184 URL: https://issues.apache.org/jira/browse/NUTCH-2184 Project: Nutch

[jira] [Commented] (NUTCH-2183) Improvement to SegmentChecker for skipping non-segments present in segments directory

2015-12-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049698#comment-15049698 ] Lewis John McGibbney commented on NUTCH-2183: - Would like to commit today if possible as this

[jira] [Resolved] (NUTCH-2180) FileDumper dumps data, but breaks midway on corrupt segments

2015-12-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2180. - Resolution: Fixed Committed @revision 1719004 in trunk > FileDumper dumps data,

[jira] [Resolved] (NUTCH-2183) Improvement to SegmentChecker for skipping non-segments present in segments directory

2015-12-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2183. - Resolution: Fixed Committed @revision 1719006 in trunk. Thank you [~mjoyce] for

[jira] [Commented] (NUTCH-2180) FileDumper dumps data, but breaks midway on corrupt segments

2015-12-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048999#comment-15048999 ] Lewis John McGibbney commented on NUTCH-2180: - Harsha do you know what results in corrupted

[jira] [Updated] (NUTCH-2183) Improvement to SegmentChecker for skipping non-segments present in segments directory

2015-12-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2183: Description: The scenario is that you have a bunch of Nutch data which has been

[jira] [Created] (NUTCH-2183) Improvement to SegmentChecker for skipping non-segments present in segments directory

2015-12-08 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2183: --- Summary: Improvement to SegmentChecker for skipping non-segments present in segments directory Key: NUTCH-2183 URL: https://issues.apache.org/jira/browse/NUTCH-2183

[jira] [Updated] (NUTCH-2183) Improvement to SegmentChecker for skipping non-segments present in segments directory

2015-12-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2183: Attachment: NUTCH-2183.patch Patch for trunk. > Improvement to SegmentChecker for

[jira] [Updated] (NUTCH-2181) Add Webpage for 3rd Party Connectors/Libraries to Apache Nutch

2015-12-07 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2181: Issue Type: Task (was: Bug) > Add Webpage for 3rd Party Connectors/Libraries to

[jira] [Created] (NUTCH-2181) Add Webpage for 3rd Party Connectors/Libraries to Apache Nutch

2015-12-07 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2181: --- Summary: Add Webpage for 3rd Party Connectors/Libraries to Apache Nutch Key: NUTCH-2181 URL: https://issues.apache.org/jira/browse/NUTCH-2181 Project:

[jira] [Commented] (NUTCH-2172) Parsing whitespace not just tabs in contenttype-mapping.txt

2015-12-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037470#comment-15037470 ] Lewis John McGibbney commented on NUTCH-2172: - I think that is the point that Seb is making!

[jira] [Commented] (NUTCH-2172) Parsing whitespace not just tabs in contenttype-mapping.txt

2015-12-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037471#comment-15037471 ] Lewis John McGibbney commented on NUTCH-2172: - [~wastl-nagel] this is a good patch. It is good

[jira] [Updated] (NUTCH-2178) DeduplicationJob to optionall group on host or domain

2015-12-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2178: Fix Version/s: (was: 1.11) 1.12 > DeduplicationJob to

[jira] [Updated] (NUTCH-2149) REST endpoint to read Nutch sequence files

2015-12-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2149: Fix Version/s: (was: 1.12) 1.11 > REST endpoint to read

[jira] [Updated] (NUTCH-2128) Refactor configuration end point

2015-12-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2128: Fix Version/s: (was: 1.12) 1.11 > Refactor configuration end

[jira] [Commented] (NUTCH-2172) Parsing whitespace not just tabs in contenttype-mapping.txt

2015-12-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038801#comment-15038801 ] Lewis John McGibbney commented on NUTCH-2172: - +1 > Parsing whitespace not just tabs in

[jira] [Commented] (NUTCH-2158) Upgrade to Tika 1.11

2015-11-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023141#comment-15023141 ] Lewis John McGibbney commented on NUTCH-2158: - I am +1 for this. If we can get this committed

[jira] [Comment Edited] (NUTCH-2158) Upgrade to Tika 1.11

2015-11-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023141#comment-15023141 ] Lewis John McGibbney edited comment on NUTCH-2158 at 11/23/15 9:43 PM:

[jira] [Updated] (NUTCH-2162) Nutch Webapp Crawl fails as it tries to index

2015-11-20 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2162: Fix Version/s: (was: 1.11) 1.12 > Nutch Webapp Crawl fails

[jira] [Resolved] (NUTCH-2058) Indexer plugin that allows RegEx replacements on the NutchDocument field values

2015-11-20 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2058. - Resolution: Fixed Tests are not failing as per recent local builds

[jira] [Commented] (NUTCH-2158) Upgrade to Tika 1.11

2015-11-20 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15018544#comment-15018544 ] Lewis John McGibbney commented on NUTCH-2158: - Hi [~jnioche], I reproduce your failing test as

[jira] [Commented] (NUTCH-2069) Ignore external links based on domain

2015-11-19 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15015387#comment-15015387 ] Lewis John McGibbney commented on NUTCH-2069: - +1 for patch. Sorry about formatting folks. We

[jira] [Updated] (NUTCH-2069) Ignore external links based on domain

2015-11-19 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2069: Fix Version/s: 1.12 > Ignore external links based on domain >

[jira] [Updated] (NUTCH-2069) Ignore external links based on domain

2015-11-19 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2069: Fix Version/s: (was: 1.12) 1.11 > Ignore external links

[jira] [Created] (NUTCH-2171) Upgrade Nutch Trunk to Java 1.8

2015-11-16 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2171: --- Summary: Upgrade Nutch Trunk to Java 1.8 Key: NUTCH-2171 URL: https://issues.apache.org/jira/browse/NUTCH-2171 Project: Nutch Issue Type: Task

[jira] [Closed] (NUTCH-2170) When i am crawling the URL http://www.aossama.com/. it is crawling url like this com.aossama.www.http/

2015-11-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-2170. --- Resolution: Fixed Hi prabhakar please go to our mailing lists and we can help

[jira] [Commented] (NUTCH-2157) Parent Issue for Addressing Miredot REST API Warnings

2015-11-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005130#comment-15005130 ] Lewis John McGibbney commented on NUTCH-2157: - +1 commit, this looks much better. The REST

[jira] [Commented] (NUTCH-2130) copyField rawcontent creates error within schema.xml

2015-11-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003551#comment-15003551 ] Lewis John McGibbney commented on NUTCH-2130: - +1 Seb please commit Sir > copyField

[jira] [Updated] (NUTCH-2130) copyField rawcontent creates error within schema.xml

2015-11-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2130: Fix Version/s: (was: 2.4) 2.3.1 > copyField rawcontent

[jira] [Updated] (NUTCH-2160) Upgrade Selenium Java to 2.48.2

2015-11-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2160: Issue Type: Improvement (was: Bug) > Upgrade Selenium Java to 2.48.2 >

[jira] [Closed] (NUTCH-2120) Remove MapWritable from trunk codebase

2015-11-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-2120. --- Committed revision 1714068 > Remove MapWritable from trunk codebase >

[jira] [Resolved] (NUTCH-2160) Upgrade Selenium Java to 2.48.2

2015-11-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2160. - Resolution: Fixed Committed revision 1714071 > Upgrade Selenium Java to 2.48.2 >

[jira] [Resolved] (NUTCH-2120) Remove MapWritable from trunk codebase

2015-11-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2120. - Resolution: Fixed Fix Version/s: (was: 1.12) 1.11 >

[jira] [Commented] (NUTCH-2165) FileDumper Util hard codes part-# folder name

2015-11-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002598#comment-15002598 ] Lewis John McGibbney commented on NUTCH-2165: - +1 [~mjoyce] verified on small sample crawl

[jira] [Comment Edited] (NUTCH-2165) FileDumper Util hard codes part-# folder name

2015-11-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002598#comment-15002598 ] Lewis John McGibbney edited comment on NUTCH-2165 at 11/12/15 6:39 PM:

[jira] [Commented] (NUTCH-2165) FileDumper Util hard codes part-# folder name

2015-11-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000658#comment-15000658 ] Lewis John McGibbney commented on NUTCH-2165: - It means that the remaining data is not dumped.

[jira] [Commented] (NUTCH-2167) Backport TableUtil from 2.x for URL reversing

2015-11-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000912#comment-15000912 ] Lewis John McGibbney commented on NUTCH-2167: - Yes, an example of this being useful is within

[jira] [Updated] (NUTCH-2120) Remove MapWritable from trunk codebase

2015-11-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2120: Issue Type: Task (was: Bug) > Remove MapWritable from trunk codebase >

[jira] [Updated] (NUTCH-2120) Remove MapWritable from trunk codebase

2015-11-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2120: Flags: Patch Patch Info: Patch Available > Remove MapWritable from trunk

[jira] [Commented] (NUTCH-2160) Upgrade Selenium Java to 2.48.2

2015-11-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001105#comment-15001105 ] Lewis John McGibbney commented on NUTCH-2160: - Will commit by EoB today unless there are

[jira] [Updated] (NUTCH-2120) Remove MapWritable from trunk codebase

2015-11-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2120: Attachment: NUTCH-2120.patch Patch which removes this class from Trunk. > Remove

[jira] [Updated] (NUTCH-2163) Utilize current JVM threads to augment URLClassLoader with newly discovered classes

2015-11-06 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2163: Summary: Utilize current JVM threads to augment URLClassLoader with newly

[jira] [Commented] (NUTCH-2162) Nutch Webapp Crawl fails as it tries to index

2015-11-06 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993376#comment-14993376 ] Lewis John McGibbney commented on NUTCH-2162: - In all honesty a work around for this is merely

[jira] [Commented] (NUTCH-2162) Nutch Webapp Crawl fails as it tries to index

2015-11-06 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994168#comment-14994168 ] Lewis John McGibbney commented on NUTCH-2162: - Ack. I also got it working well with Solr and

[jira] [Created] (NUTCH-2161) Interrupted failed and/or killed tasks fail to clean up temp directories in HDFS

2015-11-05 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2161: --- Summary: Interrupted failed and/or killed tasks fail to clean up temp directories in HDFS Key: NUTCH-2161 URL: https://issues.apache.org/jira/browse/NUTCH-2161

[jira] [Updated] (NUTCH-2162) Nutch Webapp Crawl fails as it tries to index

2015-11-05 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2162: Attachment: nutch_webapp.log Example log output from initiating a Crawl from the

[jira] [Created] (NUTCH-2162) Nutch Webapp Crawl fails as it tries to index

2015-11-05 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2162: --- Summary: Nutch Webapp Crawl fails as it tries to index Key: NUTCH-2162 URL: https://issues.apache.org/jira/browse/NUTCH-2162 Project: Nutch

[jira] [Commented] (NUTCH-2160) Upgrade Selenium Java to 2.48.2

2015-11-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991063#comment-14991063 ] Lewis John McGibbney commented on NUTCH-2160: - I was under the impression that Selenium is now

[jira] [Updated] (NUTCH-2129) Track Protocol Status in Crawl Datum

2015-11-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2129: Fix Version/s: (was: 2.4) > Track Protocol Status in Crawl Datum >

[jira] [Commented] (NUTCH-2160) Upgrade Selenium Java to 2.48.2

2015-11-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991031#comment-14991031 ] Lewis John McGibbney commented on NUTCH-2160: - Thanks Kim. I have it working with Firefox

[jira] [Resolved] (NUTCH-2159) Ensure that all WebApp files are copied into generated artifacts for 1.X Webapp

2015-11-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2159. - Resolution: Fixed Committed @revision 1712705 in trunk > Ensure that all WebApp

[jira] [Resolved] (NUTCH-2086) Nutch 1.X Webui

2015-11-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2086. - Resolution: Fixed Fix Version/s: (was: 1.12) 1.11 >

[jira] [Created] (NUTCH-2159) Ensure that all WebApp files are copied into generated artifacts for 1.X Webapp

2015-11-03 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2159: --- Summary: Ensure that all WebApp files are copied into generated artifacts for 1.X Webapp Key: NUTCH-2159 URL: https://issues.apache.org/jira/browse/NUTCH-2159

[jira] [Updated] (NUTCH-2160) Upgrade Selenium Java to 2.48.2

2015-11-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2160: Attachment: NUTCH-2160.patch Patch for trunk. [~kwhitehall] hopefully this will

[jira] [Updated] (NUTCH-2159) Ensure that all WebApp files are copied into generated artifacts for 1.X Webapp

2015-11-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2159: Flags: Patch,Important Patch Info: Patch Available > Ensure that all

[jira] [Updated] (NUTCH-2159) Ensure that all WebApp files are copied into generated artifacts for 1.X Webapp

2015-11-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2159: Attachment: NUTCH-2159.patch Patch for trunk. This resolves the issue and also

[jira] [Updated] (NUTCH-2159) Ensure that all WebApp files are copied into generated artifacts for 1.X Webapp

2015-11-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2159: Attachment: Screen Shot 2015-11-03 at 10.36.44 PM.png Nice WebApp for us to improve
