[jira] [Resolved] (NUTCH-2218) Switch CrawlCompletion arg parsing to Commons CLI

2016-02-18 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce resolved NUTCH-2218. -- Resolution: Fixed [~lewismc], This got merged. I added an example to the option you raised as

[jira] [Commented] (NUTCH-2218) Switch CrawlCompletion arg parsing to Commons CLI

2016-02-18 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152731#comment-15152731 ] Michael Joyce commented on NUTCH-2218: -- Sorry for any confusion here folks. Changes were merged in

[jira] [Updated] (NUTCH-2218) Switch CrawlCompletion arg parsing to Commons CLI

2016-02-12 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce updated NUTCH-2218: - Issue Type: Improvement (was: Bug) > Switch CrawlCompletion arg parsing to Commons CLI >

[jira] [Created] (NUTCH-2187) Change FileDumper SHAs to all uppercase

2015-12-16 Thread Michael Joyce (JIRA)
Michael Joyce created NUTCH-2187: Summary: Change FileDumper SHAs to all uppercase Key: NUTCH-2187 URL: https://issues.apache.org/jira/browse/NUTCH-2187 Project: Nutch Issue Type:

[jira] [Resolved] (NUTCH-2187) Change FileDumper SHAs to all uppercase

2015-12-16 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce resolved NUTCH-2187. -- Resolution: Duplicate Going to just resolve this in NUTCH-2182. Thought that patch had already
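These snippets don't say which digest FileDumper uses or how it formats the digest. Here is a minimal sketch that assumes SHA-256 over the UTF-8 bytes of the URL. It produces an uppercase hex name using only java.security.MessageDigest. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; assumes SHA-256, which may not be what FileDumper actually uses.
final class UpperShaName {
  private static final char[] HEX = "0123456789ABCDEF".toCharArray();

  /** Hashes the URL with SHA-256 and renders the digest as uppercase hex. */
  static String upperSha256(String url) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
      char[] out = new char[digest.length * 2];
      for (int i = 0; i < digest.length; i++) {
        int b = digest[i] & 0xFF;        // treat the byte as unsigned
        out[2 * i] = HEX[b >>> 4];       // high nibble
        out[2 * i + 1] = HEX[b & 0x0F];  // low nibble
      }
      return new String(out);
    } catch (NoSuchAlgorithmException e) {
      // Java platforms are required to provide SHA-256, so this should not happen.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Prints a 64-character uppercase hex string.
    System.out.println(upperSha256("http://nutch.apache.org/"));
  }
}
```

The hex table is uppercase, so no toUpperCase() call is needed after a lowercase helper.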

[jira] [Resolved] (NUTCH-2182) Make reverseUrlDirs file dumper option hash the URL for consistency

2015-12-16 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce resolved NUTCH-2182. -- Resolution: Fixed Resolved in r1720466 > Make reverseUrlDirs file dumper option hash the URL

[jira] [Commented] (NUTCH-2180) FileDumper dumps data, but breaks midway on corrupt segments

2015-12-09 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048889#comment-15048889 ] Michael Joyce commented on NUTCH-2180: -- Thanks for the patch [~hmanjuna], will scope shortly >

[jira] [Assigned] (NUTCH-2180) FileDumper dumps data, but breaks midway on corrupt segments

2015-12-09 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce reassigned NUTCH-2180: Assignee: Michael Joyce > FileDumper dumps data, but breaks midway on corrupt segments >
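The patch attached to NUTCH-2180 isn't shown here. One common way to stop a dump from "breaking midway" is to handle each segment in isolation: record the failure and continue. The sketch below is generic and is not the committed fix. The SegmentDumper callback and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: isolate failures so one corrupt segment doesn't stop the whole dump.
final class TolerantDump {
  /** Hypothetical callback that dumps a single segment and may fail. */
  interface SegmentDumper {
    void dump(String segment) throws Exception;
  }

  /** Dumps every segment and returns the ones that failed, instead of aborting. */
  static List<String> dumpAll(List<String> segments, SegmentDumper dumper) {
    List<String> failed = new ArrayList<>();
    for (String segment : segments) {
      try {
        dumper.dump(segment);
      } catch (Exception e) {
        // Report the problem and move on to the next segment.
        System.err.println("Skipping segment " + segment + ": " + e);
        failed.add(segment);
      }
    }
    return failed;
  }

  public static void main(String[] args) {
    List<String> segs = Arrays.asList("20151201000000", "corrupt", "20151202000000");
    List<String> failed = dumpAll(segs, seg -> {
      if (seg.equals("corrupt")) {
        throw new java.io.IOException("truncated file");
      }
      System.out.println("dumped " + seg);
    });
    System.out.println("failed: " + failed); // failed: [corrupt]
  }
}
```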

[jira] [Updated] (NUTCH-2182) Make reverseUrlDirs file dumper option hash the URL for consistency

2015-12-08 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce updated NUTCH-2182: - Attachment: NUTCH-2182_joyce_8Dec2015.patch Patch Attached > Make reverseUrlDirs file dumper

[jira] [Created] (NUTCH-2182) Make reverseUrlDirs file dumper option hash the URL for consistency

2015-12-08 Thread Michael Joyce (JIRA)
Michael Joyce created NUTCH-2182: Summary: Make reverseUrlDirs file dumper option hash the URL for consistency Key: NUTCH-2182 URL: https://issues.apache.org/jira/browse/NUTCH-2182 Project: Nutch

[jira] [Commented] (NUTCH-2158) Upgrade to Tika 1.11

2015-11-23 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023388#comment-15023388 ] Michael Joyce commented on NUTCH-2158: -- +1 on this. Looks good to me > Upgrade to Tika 1.11 >

[jira] [Created] (NUTCH-2173) String.join in FileDumper breaks the build

2015-11-18 Thread Michael Joyce (JIRA)
Michael Joyce created NUTCH-2173: Summary: String.join in FileDumper breaks the build Key: NUTCH-2173 URL: https://issues.apache.org/jira/browse/NUTCH-2173 Project: Nutch Issue Type: Bug

[jira] [Work started] (NUTCH-2173) String.join in FileDumper breaks the build

2015-11-18 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2173 started by Michael Joyce. > String.join in FileDumper breaks the build >

[jira] [Resolved] (NUTCH-2173) String.join in FileDumper breaks the build

2015-11-18 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce resolved NUTCH-2173. -- Resolution: Fixed Resolved in r1715046 > String.join in FileDumper breaks the build >
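String.join was added in Java 8. A call to it fails to compile against an older Java target, which is presumably what broke the build here. The snippet doesn't show the fix committed in r1715046. A minimal Java 7-compatible replacement might look like the following; the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: a Java 7-compatible stand-in for String.join(delimiter, parts).
final class JoinCompat {
  /** Concatenates the parts, placing the delimiter only between elements. */
  static String join(String delimiter, Iterable<String> parts) {
    StringBuilder sb = new StringBuilder();
    Iterator<String> it = parts.iterator();
    while (it.hasNext()) {
      sb.append(it.next());
      if (it.hasNext()) {
        sb.append(delimiter);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    List<String> parts = Arrays.asList("com", "example", "www");
    System.out.println(join("/", parts)); // prints com/example/www
  }
}
```

Like String.join, this appends "null" for null elements, because StringBuilder.append does the same.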

[jira] [Resolved] (NUTCH-2166) Add reverse URL format to dump tool

2015-11-17 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce resolved NUTCH-2166. -- Resolution: Fixed Committed in r1714908 > Add reverse URL format to dump tool >

[jira] [Commented] (NUTCH-2166) Add reverse URL format to dump tool

2015-11-13 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004191#comment-15004191 ] Michael Joyce commented on NUTCH-2166: -- Output from a small example run. I don't know that I'm

[jira] [Commented] (NUTCH-2166) Add reverse URL format to dump tool

2015-11-12 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002328#comment-15002328 ] Michael Joyce commented on NUTCH-2166: -- Small change in dump format. Instead of making a bajillion

[jira] [Resolved] (NUTCH-2167) Backport TableUtil from 2.x for URL reversing

2015-11-12 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce resolved NUTCH-2167. -- Resolution: Fixed TableUtil copied over in r1714078 and tests copied over in 1714079 >
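"Reversing" a URL here means flipping the order of the host's labels. Pages from the same domain then sort next to each other, which is what a reverse-URL dump layout (NUTCH-2166) relies on. The sketch below is loosely modeled on the 2.x TableUtil approach and uses java.net.URI. It is an approximation, not the backported code, and the real utility's output format may differ in details.

```java
import java.net.URI;

// Loose sketch of host reversal; not the actual TableUtil implementation.
final class ReverseUrlSketch {
  /** e.g. "http://bar.foo.com:8983/to/index.html?a=b" -> "com.foo.bar:http:8983/to/index.html?a=b" */
  static String reverseUrl(String urlString) {
    URI uri = URI.create(urlString);
    String host = uri.getHost();
    if (host == null) {
      // URI.getHost() returns null for hosts it cannot parse (e.g. ones containing '_').
      throw new IllegalArgumentException("No parsable host in " + urlString);
    }
    String[] labels = host.split("\\.");
    StringBuilder buf = new StringBuilder();
    // Reverse the host labels so pages from one domain sort together.
    for (int i = labels.length - 1; i >= 0; i--) {
      buf.append(labels[i]);
      if (i > 0) {
        buf.append('.');
      }
    }
    // Then the scheme and, if given explicitly, the port.
    buf.append(':').append(uri.getScheme());
    if (uri.getPort() != -1) {
      buf.append(':').append(uri.getPort());
    }
    // Finally the path and query, unchanged.
    if (uri.getRawPath() != null) {
      buf.append(uri.getRawPath());
    }
    if (uri.getRawQuery() != null) {
      buf.append('?').append(uri.getRawQuery());
    }
    return buf.toString();
  }

  public static void main(String[] args) {
    System.out.println(reverseUrl("http://bar.foo.com:8983/to/index.html?a=b"));
  }
}
```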

[jira] [Commented] (NUTCH-2165) FileDumper Util hard codes part-# folder name

2015-11-12 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002604#comment-15002604 ] Michael Joyce commented on NUTCH-2165: -- Thanks [~lewismc], I'll merge shortly > FileDumper Util hard

[jira] [Resolved] (NUTCH-2165) FileDumper Util hard codes part-# folder name

2015-11-12 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce resolved NUTCH-2165. -- Resolution: Fixed Committed in r1714104 > FileDumper Util hard codes part-# folder name >
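In a Nutch 1.x segment, directories such as content/ typically hold one part-NNNNN MapFile directory per reduce task. How many there are depends on the job, so a hard-coded part-00000 can miss data. The sketch below discovers the part directories instead. It uses java.nio so that it is self-contained; the Nutch tool itself works through Hadoop's file system API, and the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch using java.nio rather than Hadoop's FileSystem API.
final class PartDirs {
  /** Returns every part-* subdirectory of dir, sorted by name, instead of assuming part-00000. */
  static List<Path> findPartDirs(Path dir) throws IOException {
    List<Path> parts = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "part-*")) {
      for (Path p : stream) {
        if (Files.isDirectory(p)) {
          parts.add(p);
        }
      }
    }
    Collections.sort(parts);
    return parts;
  }

  public static void main(String[] args) throws IOException {
    // e.g. java PartDirs crawl/segments/20151111000000/content
    for (Path p : findPartDirs(Paths.get(args[0]))) {
      System.out.println(p);
    }
  }
}
```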

[jira] [Resolved] (NUTCH-2155) Create a "crawl completeness" utility

2015-11-11 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce resolved NUTCH-2155. -- Resolution: Fixed Latest patch committed in r1713885 > Create a "crawl completeness" utility >

[jira] [Resolved] (NUTCH-2150) Add ProtocolStatus Utility

2015-11-11 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce resolved NUTCH-2150. -- Resolution: Fixed Resolved in r1713892 > Add ProtocolStatus Utility >

[jira] [Commented] (NUTCH-2167) Backport TableUtil from 2.x for URL reversing

2015-11-11 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000841#comment-15000841 ] Michael Joyce commented on NUTCH-2167: -- Hi folks, All looks good and tests run fine after moving

[jira] [Work started] (NUTCH-2167) Backport TableUtil from 2.x for URL reversing

2015-11-11 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2167 started by Michael Joyce. > Backport TableUtil from 2.x for URL reversing >

[jira] [Work started] (NUTCH-1911) Improve DomainStatistics tool command line parsing

2015-11-11 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-1911 started by Michael Joyce. > Improve DomainStatistics tool command line parsing >

[jira] [Resolved] (NUTCH-1911) Improve DomainStatistics tool command line parsing

2015-11-11 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce resolved NUTCH-1911. -- Resolution: Fixed Resolved in r1713890 > Improve DomainStatistics tool command line parsing >

[jira] [Work started] (NUTCH-2150) Add ProtocolStatus Utility

2015-11-11 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2150 started by Michael Joyce. > Add ProtocolStatus Utility >

[jira] [Work started] (NUTCH-2155) Create a "crawl completeness" utility

2015-11-11 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2155 started by Michael Joyce. > Create a "crawl completeness" utility >

[jira] [Created] (NUTCH-2166) Add reverse URL format to dump tool

2015-11-11 Thread Michael Joyce (JIRA)
Michael Joyce created NUTCH-2166: Summary: Add reverse URL format to dump tool Key: NUTCH-2166 URL: https://issues.apache.org/jira/browse/NUTCH-2166 Project: Nutch Issue Type: Improvement

[jira] [Work started] (NUTCH-2166) Add reverse URL format to dump tool

2015-11-11 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2166 started by Michael Joyce. > Add reverse URL format to dump tool >

[jira] [Created] (NUTCH-2165) FileDumper Util hard codes part-# folder name

2015-11-11 Thread Michael Joyce (JIRA)
Michael Joyce created NUTCH-2165: Summary: FileDumper Util hard codes part-# folder name Key: NUTCH-2165 URL: https://issues.apache.org/jira/browse/NUTCH-2165 Project: Nutch Issue Type: Bug

[jira] [Created] (NUTCH-2167) Backport TableUtil from 2.x for URL reversing

2015-11-11 Thread Michael Joyce (JIRA)
Michael Joyce created NUTCH-2167: Summary: Backport TableUtil from 2.x for URL reversing Key: NUTCH-2167 URL: https://issues.apache.org/jira/browse/NUTCH-2167 Project: Nutch Issue Type:

[jira] [Work started] (NUTCH-2165) FileDumper Util hard codes part-# folder name

2015-11-11 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2165 started by Michael Joyce. > FileDumper Util hard codes part-# folder name >

[jira] [Assigned] (NUTCH-2165) FileDumper Util hard codes part-# folder name

2015-11-11 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce reassigned NUTCH-2165: Assignee: Michael Joyce > FileDumper Util hard codes part-# folder name >

[jira] [Commented] (NUTCH-2165) FileDumper Util hard codes part-# folder name

2015-11-11 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000910#comment-15000910 ] Michael Joyce commented on NUTCH-2165: -- Oh aye > FileDumper Util hard codes part-# folder name >

[jira] [Updated] (NUTCH-2165) FileDumper Util hard codes part-# folder name

2015-11-11 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce updated NUTCH-2165: - Attachment: NUTCH-2165_joyce_11Nov2015.patch Patch attached > FileDumper Util hard codes part-#

[jira] [Commented] (NUTCH-2165) FileDumper Util hard codes part-# folder name

2015-11-11 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000923#comment-15000923 ] Michael Joyce commented on NUTCH-2165: -- Note, the diff looks massive here. This is really just adding

[jira] [Assigned] (NUTCH-2150) Add ProtocolStatus Utility

2015-11-09 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce reassigned NUTCH-2150: Assignee: Michael Joyce (was: Chris A. Mattmann) > Add ProtocolStatus Utility >

[jira] [Assigned] (NUTCH-1911) Improve DomainStatistics tool command line parsing

2015-11-09 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce reassigned NUTCH-1911: Assignee: Michael Joyce (was: Chris A. Mattmann) > Improve DomainStatistics tool command

[jira] [Updated] (NUTCH-1911) Improve DomainStatistics tool command line parsing

2015-11-09 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce updated NUTCH-1911: - Summary: Improve DomainStatistics tool command line parsing (was: Imeprove DomainStatistics tool

[jira] [Assigned] (NUTCH-2155) Create a "crawl completeness" utility

2015-11-09 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce reassigned NUTCH-2155: Assignee: Michael Joyce (was: Chris A. Mattmann) > Create a "crawl completeness" utility

[jira] [Updated] (NUTCH-1911) Improve DomainStatistics tool command line parsing

2015-11-09 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce updated NUTCH-1911: - Fix Version/s: 1.10 > Improve DomainStatistics tool command line parsing >

[jira] [Updated] (NUTCH-2150) Add ProtocolStatus Utility

2015-11-09 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce updated NUTCH-2150: - Attachment: NUTCH-2015_joyce_9Nov2015.patch Patch attached to clean up help formatting and drop

[jira] [Updated] (NUTCH-2155) Create a "crawl completeness" utility

2015-11-09 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce updated NUTCH-2155: - Attachment: NUTCH-2155_joyce_9Nov2015.patch Patch attached to address "current" requirements in

[jira] [Updated] (NUTCH-1911) Improve DomainStatistics tool command line parsing

2015-11-09 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce updated NUTCH-1911: - Attachment: NUTCH-1911_joyce_9Nov2015.patch Attach more recent patch to include removal of

[jira] [Updated] (NUTCH-1911) Improve DomainStatistics tool command line parsing

2015-11-09 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce updated NUTCH-1911: - Fix Version/s: (was: 1.10) 1.11 > Improve DomainStatistics tool command

[jira] [Updated] (NUTCH-1911) Improve DomainStatistics tool command line parsing

2015-11-09 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce updated NUTCH-1911: - Attachment: NUTCH-1911_joyce_9Nov2015.patch Going to resubmit the attached patch to get these

[jira] [Commented] (NUTCH-2155) Create a "crawl completeness" utility

2015-11-02 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985431#comment-14985431 ] Michael Joyce commented on NUTCH-2155: -- +1 sounds good to me [~sebastien0], I will update it in a

[jira] [Commented] (NUTCH-2150) Add ProtocolStatus Utility

2015-11-02 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985427#comment-14985427 ] Michael Joyce commented on NUTCH-2150: -- Yes, will address in a patch shortly. > Add ProtocolStatus

[jira] [Commented] (NUTCH-1911) Imeprove DomainStatistics tool command line parsing

2015-11-02 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985436#comment-14985436 ] Michael Joyce commented on NUTCH-1911: -- Hrm odd, I want to throw some commons-cli at a few of the

[jira] [Created] (NUTCH-2155) Create a "crawl completeness" utility

2015-10-28 Thread Michael Joyce (JIRA)
Michael Joyce created NUTCH-2155: Summary: Create a "crawl completeness" utility Key: NUTCH-2155 URL: https://issues.apache.org/jira/browse/NUTCH-2155 Project: Nutch Issue Type: Improvement

[jira] [Commented] (NUTCH-2155) Create a "crawl completeness" utility

2015-10-28 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979196#comment-14979196 ] Michael Joyce commented on NUTCH-2155: -- Should have a first patch up shortly for review folks >

[jira] [Created] (NUTCH-2150) Add ProtocolStatus Utility

2015-10-27 Thread Michael Joyce (JIRA)
Michael Joyce created NUTCH-2150: Summary: Add ProtocolStatus Utility Key: NUTCH-2150 URL: https://issues.apache.org/jira/browse/NUTCH-2150 Project: Nutch Issue Type: Improvement

[jira] [Commented] (NUTCH-2150) Add ProtocolStatus Utility

2015-10-27 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977036#comment-14977036 ] Michael Joyce commented on NUTCH-2150: -- Hi folks, PR is up for this. You can run the util with

[jira] [Commented] (NUTCH-2141) Change the InteractiveSelenium plugin handler Interface to return page content

2015-10-15 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959659#comment-14959659 ] Michael Joyce commented on NUTCH-2141: -- Cool makes sense. Do you have any examples? I'd like to poke

[jira] [Commented] (NUTCH-2141) Change the InteractiveSelenium plugin handler Interface to return page content

2015-10-15 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959345#comment-14959345 ] Michael Joyce commented on NUTCH-2141: -- This was actually brought up in NUTCH-2108. There's also an

[jira] [Commented] (NUTCH-2129) Track Protocol Status in Crawl Datum

2015-10-07 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947002#comment-14947002 ] Michael Joyce commented on NUTCH-2129: -- Fixed the unnecessary init that [~jnioche] caught. Thanks

[jira] [Created] (NUTCH-2133) Transfer Selenium Documentation to WIki

2015-10-06 Thread Michael Joyce (JIRA)
Michael Joyce created NUTCH-2133: Summary: Transfer Selenium Documentation to WIki Key: NUTCH-2133 URL: https://issues.apache.org/jira/browse/NUTCH-2133 Project: Nutch Issue Type:

[jira] [Commented] (NUTCH-2129) Track Protocol Status in Crawl Datum

2015-10-06 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945766#comment-14945766 ] Michael Joyce commented on NUTCH-2129: -- Hey folks, updated PR with the metadata approach for HTTP and

[jira] [Commented] (NUTCH-2129) Track Protocol Status in Crawl Datum

2015-10-01 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939939#comment-14939939 ] Michael Joyce commented on NUTCH-2129: -- Thanks Julien. I figured there would probably be a few

[jira] [Commented] (NUTCH-2108) Add a function to the selenium interactive plugin interface to do multiple manipulation of driver and then return the data

2015-10-01 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940036#comment-14940036 ] Michael Joyce commented on NUTCH-2108: -- Good stuff [~asitang], glad to see the workaround proved

[jira] [Created] (NUTCH-2129) Track Protocol Status in Crawl Datum

2015-09-30 Thread Michael Joyce (JIRA)
Michael Joyce created NUTCH-2129: Summary: Track Protocol Status in Crawl Datum Key: NUTCH-2129 URL: https://issues.apache.org/jira/browse/NUTCH-2129 Project: Nutch Issue Type: Improvement

[jira] [Commented] (NUTCH-2129) Track Protocol Status in Crawl Datum

2015-09-30 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939124#comment-14939124 ] Michael Joyce commented on NUTCH-2129: -- Hi folks, Initial pull request up to address this. Note that

[jira] [Created] (NUTCH-2115) Add total counts to dump stats

2015-09-23 Thread Michael Joyce (JIRA)
Michael Joyce created NUTCH-2115: Summary: Add total counts to dump stats Key: NUTCH-2115 URL: https://issues.apache.org/jira/browse/NUTCH-2115 Project: Nutch Issue Type: Improvement

[jira] [Commented] (NUTCH-2115) Add total counts to dump stats

2015-09-23 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905156#comment-14905156 ] Michael Joyce commented on NUTCH-2115: -- Cheers [~lewismc], thanks for the quick merge! > Add total

[jira] [Commented] (NUTCH-2077) Upgrade to Tika 1.10

2015-08-28 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720279#comment-14720279 ] Michael Joyce commented on NUTCH-2077: -- Hey folks, updated tika to 1.10. If there was

[jira] [Created] (NUTCH-2088) Add Optional Execution to Interactive Selenium Handlers

2015-08-28 Thread Michael Joyce (JIRA)
Michael Joyce created NUTCH-2088: Summary: Add Optional Execution to Interactive Selenium Handlers Key: NUTCH-2088 URL: https://issues.apache.org/jira/browse/NUTCH-2088 Project: Nutch Issue

[jira] [Commented] (NUTCH-2082) Upgrade to Apache Tika 1.10

2015-08-19 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703153#comment-14703153 ] Michael Joyce commented on NUTCH-2082: -- FYI, this is a duplicate of NUTCH-2077 I

[jira] [Commented] (NUTCH-2049) Upgrade Trunk to Hadoop 2.4 stable

2015-08-18 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701707#comment-14701707 ] Michael Joyce commented on NUTCH-2049: -- Great stuff Lewis. Builds and runs cleanly

[jira] [Commented] (NUTCH-2049) Upgrade Trunk to Hadoop 2.4 stable

2015-08-12 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14694210#comment-14694210 ] Michael Joyce commented on NUTCH-2049: -- Hey [~lewismc], Tried your patch here. Seems

[jira] [Commented] (NUTCH-2062) Add Plugin for interacting with Selenium WebDriver

2015-07-29 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646423#comment-14646423 ] Michael Joyce commented on NUTCH-2062: -- Hi folks, Is there something I need to do to

[jira] [Comment Edited] (NUTCH-2062) Add Plugin for interacting with Selenium WebDriver

2015-07-29 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646488#comment-14646488 ] Michael Joyce edited comment on NUTCH-2062 at 7/29/15 5:50 PM:

[jira] [Commented] (NUTCH-2062) Add Plugin for interacting with Selenium WebDriver

2015-07-29 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646488#comment-14646488 ] Michael Joyce commented on NUTCH-2062: -- Cheers Chris, responded on the PR. Also,

[jira] [Updated] (NUTCH-2048) parse-tika: fix dependencies in plugin.xml

2015-07-27 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce updated NUTCH-2048: - Attachment: NUTCH-2048_Joyce_20150727.patch Updated the patch to set the sync attribute on

[jira] [Commented] (NUTCH-1936) GSoC 2015 - Move Nutch to Hadoop 2.X

2015-07-24 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641210#comment-14641210 ] Michael Joyce commented on NUTCH-1936: -- Ah this is absolutely awesome Lewis. Great

[jira] [Commented] (NUTCH-2048) parse-tika: fix dependencies in plugin.xml

2015-07-23 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639462#comment-14639462 ] Michael Joyce commented on NUTCH-2048: -- Alright, hopefully this one is a bit more on

[jira] [Updated] (NUTCH-2048) parse-tika: fix dependencies in plugin.xml

2015-07-23 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce updated NUTCH-2048: - Attachment: NUTCH-2048_Joyce_20150723_2.patch Patch #2 up. Explanation to follow shortly

[jira] [Commented] (NUTCH-2048) parse-tika: fix dependencies in plugin.xml

2015-07-23 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639396#comment-14639396 ] Michael Joyce commented on NUTCH-2048: -- Ah I clearly didn't pay enough attention to

[jira] [Updated] (NUTCH-2048) parse-tika: fix dependencies in plugin.xml

2015-07-23 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce updated NUTCH-2048: - Attachment: NUTCH-2048_Joyce_20150723.patch Quick patch up for this. parse-tika: fix

[jira] [Updated] (NUTCH-2063) Add -mimeStats flag to FileDumper tool

2015-07-22 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce updated NUTCH-2063: - Labels: memex (was: ) Add -mimeStats flag to FileDumper tool

[jira] [Updated] (NUTCH-2004) ParseChecker does not handle redirects

2015-07-22 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce updated NUTCH-2004: - Labels: memex (was: ) ParseChecker does not handle redirects

[jira] [Commented] (NUTCH-2062) Add Plugin for interacting with Selenium WebDriver

2015-07-22 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636958#comment-14636958 ] Michael Joyce commented on NUTCH-2062: -- Cheers [~lewismc], let me see what I can do

[jira] [Commented] (NUTCH-2062) Add Plugin for interacting with Selenium WebDriver

2015-07-21 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635389#comment-14635389 ] Michael Joyce commented on NUTCH-2062: -- Hi folks, Just wanted to elaborate a bit on

[jira] [Commented] (NUTCH-2063) Add -mimeStats flag to FileDumper tool

2015-07-21 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635706#comment-14635706 ] Michael Joyce commented on NUTCH-2063: -- Hey [~lewismc], threw a patch up for this.

[jira] [Updated] (NUTCH-2063) Add -mimeStats flag to FileDumper tool

2015-07-21 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce updated NUTCH-2063: - Attachment: nutch-2063-joyce-21July2015.patch Add -mimeStats flag to FileDumper tool
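The snippets don't show the output format of -mimeStats. Here is a minimal sketch of the underlying tally, assuming one MIME type string per dumped record. It uses a sorted map for stable output and groups missing types under a hypothetical "unknown" key. A running total is kept as well, in the spirit of NUTCH-2115's "total counts" in dump stats; how the real tool reports totals isn't shown.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical tally; the actual -mimeStats output is not shown in these snippets.
final class MimeStats {
  private final Map<String, Integer> counts = new TreeMap<>();
  private int total = 0;

  /** Records one dumped document; null or empty types are grouped as "unknown". */
  void record(String mimeType) {
    String key = (mimeType == null || mimeType.isEmpty()) ? "unknown" : mimeType;
    counts.merge(key, 1, Integer::sum);
    total++;
  }

  /** Read-only view of the per-type counts, sorted by MIME type. */
  Map<String, Integer> counts() {
    return Collections.unmodifiableMap(counts);
  }

  int total() {
    return total;
  }

  public static void main(String[] args) {
    MimeStats stats = new MimeStats();
    for (String t : new String[] {"text/html", "image/png", "text/html", null}) {
      stats.record(t);
    }
    // prints {image/png=1, text/html=2, unknown=1} total=4
    System.out.println(stats.counts() + " total=" + stats.total());
  }
}
```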

[jira] [Created] (NUTCH-2062) Add Plugin for interacting with Selenium WebDriver

2015-07-20 Thread Michael Joyce (JIRA)
Michael Joyce created NUTCH-2062: Summary: Add Plugin for interacting with Selenium WebDriver Key: NUTCH-2062 URL: https://issues.apache.org/jira/browse/NUTCH-2062 Project: Nutch Issue Type:

[jira] [Commented] (NUTCH-2062) Add Plugin for interacting with Selenium WebDriver

2015-07-20 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633731#comment-14633731 ] Michael Joyce commented on NUTCH-2062: -- Hi folks, I have a work-in progress locally

[jira] [Commented] (NUTCH-1504) Pluggable url partitioner

2015-06-24 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14599958#comment-14599958 ] Michael Joyce commented on NUTCH-1504: -- This is great stuff [~lewismc], we definitely

[jira] [Commented] (NUTCH-2045) index-basic incorrect assignment of next fetch time (page.getFetchTime()) as page fetch time

2015-06-22 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596832#comment-14596832 ] Michael Joyce commented on NUTCH-2045: -- +1 this is great index-basic incorrect

[jira] [Created] (NUTCH-2004) ParseChecker does not handle redirects

2015-04-29 Thread Michael Joyce (JIRA)
Michael Joyce created NUTCH-2004: Summary: ParseChecker does not handle redirects Key: NUTCH-2004 URL: https://issues.apache.org/jira/browse/NUTCH-2004 Project: Nutch Issue Type: Improvement

[jira] [Commented] (NUTCH-2004) ParseChecker does not handle redirects

2015-04-29 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520028#comment-14520028 ] Michael Joyce commented on NUTCH-2004: -- Hi folks, will try to get a patch thrown up

[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk

2015-04-20 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503746#comment-14503746 ] Michael Joyce commented on NUTCH-1934: -- Hey [~lewismc], Patch applied clean to

[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk

2015-04-20 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503727#comment-14503727 ] Michael Joyce commented on NUTCH-1934: -- One sec, Lewis, and I'll take a quick scope.

[jira] [Commented] (NUTCH-1987) Make bin/crawl indexer agnostic

2015-04-20 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503446#comment-14503446 ] Michael Joyce commented on NUTCH-1987: -- Hi folks, PR has been updated with the

[jira] [Commented] (NUTCH-1987) Make bin/crawl indexer agnostic

2015-04-18 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14501674#comment-14501674 ] Michael Joyce commented on NUTCH-1987: -- Hey Chris, Will do. I'll try to take a poke

[jira] [Updated] (NUTCH-1986) Clarify Elastic Search Indexer Plugin Settings

2015-04-16 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce updated NUTCH-1986: - Labels: memex (was: ) Clarify Elastic Search Indexer Plugin Settings

[jira] [Commented] (NUTCH-1911) Imeprove DomainStatistics tool command line parsing

2015-04-16 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498689#comment-14498689 ] Michael Joyce commented on NUTCH-1911: -- Hey folks, Here's what the output from this

[jira] [Updated] (NUTCH-1988) Make nested output directory dump optional

2015-04-16 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce updated NUTCH-1988: - Labels: memex (was: ) Make nested output directory dump optional

[jira] [Commented] (NUTCH-1906) Typo in CrawlDbReader command line help

2015-04-16 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498573#comment-14498573 ] Michael Joyce commented on NUTCH-1906: -- Hi folks, I'll throw a patch up shortly for

[jira] [Updated] (NUTCH-1987) Make bin/crawl indexer agnostic

2015-04-16 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce updated NUTCH-1987: - Labels: memex (was: ) Make bin/crawl indexer agnostic
