[jira] [Commented] (NUTCH-2122) Implement Javadoc package.html for webui packages

2016-05-19 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292454#comment-15292454 ] Lewis John McGibbney commented on NUTCH-2122: - I agree :) > Implement Javadoc package.h

Re: Breaking Change Note in CHANGES.txt

2016-05-19 Thread Lewis John Mcgibbney
Thanks Markus, I appreciate the response. I'll push the release candidate now. On Tue, May 17, 2016 at 9:46 AM, Lewis John Mcgibbney < lewis.mcgibb...@gmail.com> wrote: > Hi Folks, > What is going on with the note in CHANGES.txt? [0] I've pasted it below > for convenience. > Did

[jira] [Commented] (NUTCH-1858) Migrate Nutch documentation from Moin Moin to Confluence

2016-05-19 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291595#comment-15291595 ] Lewis John McGibbney commented on NUTCH-1858: - AFAIK a script or two exist to do

[jira] [Commented] (NUTCH-1858) Migrate Nutch documentation from Moin Moin to Confluence

2016-05-19 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291400#comment-15291400 ] Lewis John McGibbney commented on NUTCH-1858: - I honestly do not know. This is a huge amount

[jira] [Commented] (NUTCH-2122) Implement Javadoc package.html for service packages

2016-05-19 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291395#comment-15291395 ] Lewis John McGibbney commented on NUTCH-2122: - Hi Furkan, please change the 'service

[jira] [Commented] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-05-17 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287572#comment-15287572 ] Lewis John McGibbney commented on NUTCH-: - [~kamaci] can you please check this issue out

[jira] [Updated] (NUTCH-2112) Missing org.restlet.jee when building with gora-solr

2016-05-17 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2112: Fix Version/s: 2.4 > Missing org.restlet.jee when building with gora-s

[jira] [Reopened] (NUTCH-2248) CSS parser plugin

2016-05-17 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reopened NUTCH-2248: - > CSS parser plugin > - > > Key

Proposal to Use Jira Release Notes within CHANGES.txt

2016-05-17 Thread Lewis John Mcgibbney
Hi Folks, I want to poll dev@ to see if we could make more accurate release notes by using the Jira ones https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=10680=1228 What do you think? Pros: makes explicit categories for improvements, bug fixes, etc. Cons: takes away the

Breaking Change Note in CHANGES.txt

2016-05-17 Thread Lewis John Mcgibbney
Hi Folks, What is going on with the note in CHANGES.txt? [0] I've pasted it below for convenience. Did I miss some convo on dev@ which states there were breaking changes which were being committed to master? Thanks to whoever can fill me in on this one. I am not going to progress with the Nutch 1.12

[jira] [Updated] (NUTCH-2251) Make CommonCrawlFormatJackson instance reusable by properly handling object state

2016-05-17 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2251: Fix Version/s: (was: 1.10) 1.13 > M

[jira] [Resolved] (NUTCH-2260) JAVA_HOME and hbase-common dependency absent from hbase Docker image

2016-05-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2260. - Resolution: Fixed > JAVA_HOME and hbase-common dependency absent from hb

[jira] [Resolved] (NUTCH-2259) Nutch 2.x HBase Docker requires a logs folder to run exception free

2016-05-14 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2259. - Resolution: Fixed > Nutch 2.x HBase Docker requires a logs folder to

[jira] [Commented] (NUTCH-2259) Nutch 2.x HBase Docker requires a logs folder to run exception free

2016-05-14 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283643#comment-15283643 ] Lewis John McGibbney commented on NUTCH-2259: - PR is available at https://github.com/apache

[jira] [Created] (NUTCH-2259) Nutch 2.x HBase Docker requires a logs folder to run exception free

2016-05-14 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2259: --- Summary: Nutch 2.x HBase Docker requires a logs folder to run exception free Key: NUTCH-2259 URL: https://issues.apache.org/jira/browse/NUTCH-2259

[jira] [Commented] (NUTCH-2258) Provide Javadoc for ScriptInput/OutputFormat's

2016-05-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281054#comment-15281054 ] Lewis John McGibbney commented on NUTCH-2258: - I accidentally created this issue for NUTCH

[jira] [Created] (NUTCH-2258) Provide Javadoc for ScriptInput/OutputFormat's

2016-05-11 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2258: --- Summary: Provide Javadoc for ScriptInput/OutputFormat's Key: NUTCH-2258 URL: https://issues.apache.org/jira/browse/NUTCH-2258 Project: Nutch

[DISCUSS] Release Nutch 1.12

2016-05-09 Thread Lewis John Mcgibbney
Hi Folks, Any objections to me cutting an RC for 1.12 this week? Thanks Lewis -- *Lewis*

[jira] [Resolved] (NUTCH-2252) Allow phantomjs as a browser for selenium options

2016-05-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2252. - Resolution: Fixed Thanks [~kwhitehall] and folks. > Allow phanto

[jira] [Commented] (NUTCH-1824) protocol-http using proxy not working with https sites

2016-05-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276727#comment-15276727 ] Lewis John McGibbney commented on NUTCH-1824: - Hi [~xjtujiyong] I'll try and scope later

Re: GSoC 2016: You are a mentor for Furkan KAMACI

2016-04-22 Thread Lewis John Mcgibbney
Congratulations Furkan On Friday, April 22, 2016, Google Summer of Code < summerofcode-nore...@google.com> wrote: > [image: Google Summer of Code] > > Congratulations, you will be spending the next few months working with > Furkan KAMACI on Security Layer for NutchServer (NUTCH-1756) >

[jira] [Updated] (NUTCH-2252) Allow phantomjs as a browser for selenium options

2016-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2252: Fix Version/s: 1.12 > Allow phantomjs as a browser for selenium opti

[jira] [Updated] (NUTCH-2252) Allow phantomjs as a browser for selenium options

2016-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2252: Affects Version/s: 1.12 > Allow phantomjs as a browser for selenium opti

[jira] [Assigned] (NUTCH-2252) Allow phantomjs as a browser for selenium options

2016-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2252: --- Assignee: Lewis John McGibbney > Allow phantomjs as a browser for selen

[jira] [Updated] (NUTCH-2188) While crawling with solr url (kerberos enabled) Error: org.apache.solr.common.SolrException: Unauthorized

2016-04-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2188: Fix Version/s: (was: 1.9) 1.12 > While crawling with s

[jira] [Updated] (NUTCH-2217) Crawl pages with specified language

2016-04-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2217: Fix Version/s: (was: 1.11) 2.5 > Crawl pages with specif

[jira] [Resolved] (NUTCH-2238) Indexer for Elasticsearch 2.x

2016-04-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2238. - Resolution: Fixed Thank you [~ptorrestr] > Indexer for Elasticsearch

[jira] [Updated] (NUTCH-1741) Support of Sitemaps in Nutch 2.x

2016-04-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1741: Fix Version/s: (was: 2.3.2) 2.4 > Support of Sitem

[jira] [Updated] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-04-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-: Fix Version/s: (was: 2.3.2) 2.4 > re-fetch deletes

[jira] [Updated] (NUTCH-2238) Indexer for Elasticsearch 2.x

2016-04-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2238: Assignee: Pablo Torres > Indexer for Elasticsearch

[jira] [Updated] (NUTCH-2238) Indexer for Elasticsearch 2.x

2016-04-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2238: Fix Version/s: (was: 2.3.2) > Indexer for Elasticsearch

[jira] [Commented] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-04-07 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230393#comment-15230393 ] Lewis John McGibbney commented on NUTCH-: - I've committed and closed the MemStore

[jira] [Commented] (NUTCH-2191) Add protocol-htmlunit

2016-03-27 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213527#comment-15213527 ] Lewis John McGibbney commented on NUTCH-2191: - Thanks [~karanjeets] good job > Add proto

[jira] [Commented] (NUTCH-2191) Add protocol-htmlunit

2016-03-27 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213390#comment-15213390 ] Lewis John McGibbney commented on NUTCH-2191: - [~karanjeets] if you can please step through

[jira] [Commented] (NUTCH-2089) Move Nutch to compile on JDK 8

2016-03-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212656#comment-15212656 ] Lewis John McGibbney commented on NUTCH-2089: - Compiler warnings, etc. Feel free to analyze

[jira] [Commented] (NUTCH-2005) Implement HTrace'ing in Nutch

2016-03-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210241#comment-15210241 ] Lewis John McGibbney commented on NUTCH-2005: - [~mefaraz...@gmail.com], please check out

[jira] [Updated] (NUTCH-2005) Implement HTrace'ing in Nutch

2016-03-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2005: Description: Recent developments within the tracing community have brought projects

[jira] [Updated] (NUTCH-2005) Implement HTrace'ing in Nutch

2016-03-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2005: Fix Version/s: 2.4 > Implement HTrace'ing in Nu

[jira] [Updated] (NUTCH-2005) Implement HTrace'ing in Nutch

2016-03-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2005: Description: Recent developments within the tracing community have brought projects

[jira] [Updated] (NUTCH-1756) Security layer for NutchServer

2016-03-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1756: Description: It will be beneficial to have a security layer for NutchServer once we

[jira] [Updated] (NUTCH-1756) Security layer for NutchServer

2016-03-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1756: Labels: gsoc2016 (was: ) > Security layer for NutchSer

[jira] [Commented] (NUTCH-2005) Implement HTrace'ing in Nutch

2016-03-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209411#comment-15209411 ] Lewis John McGibbney commented on NUTCH-2005: - [~mefaraz...@gmail.com] are you still

[jira] [Commented] (NUTCH-1492) Support gora-dynamodb in Nutch 2.x

2016-03-19 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198869#comment-15198869 ] Lewis John McGibbney commented on NUTCH-1492: - [~renato2099] what about this shit? > Supp

[jira] [Updated] (NUTCH-2185) protocol-soda-consumer plugin

2016-03-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2185: Description: I'm finishing off a Nutch protocol implementation for interacting

[jira] [Commented] (NUTCH-2202) Integration of Anthelion (Focused Crawling Module) into Nutch

2016-03-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15187238#comment-15187238 ] Lewis John McGibbney commented on NUTCH-2202: - [~robertmeusel] please don't look into it yet

[jira] [Assigned] (NUTCH-2202) Integration of Anthelion (Focused Crawling Module) into Nutch

2016-03-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2202: --- Assignee: Lewis John McGibbney > Integration of Anthelion (Focused Crawl

[jira] [Updated] (NUTCH-2202) Integration of Anthelion (Focused Crawling Module) into Nutch

2016-03-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2202: Description: We have recently released anthelion, which is a focused crawler plugin

[jira] [Commented] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-03-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178568#comment-15178568 ] Lewis John McGibbney commented on NUTCH-: - Nice [~abenjell], In Nutch we use [MemStore

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2016-03-02 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176122#comment-15176122 ] Lewis John McGibbney commented on NUTCH-2184: - sh*t, I didn't push up my assertions. I'll get

[jira] [Comment Edited] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-03-02 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176048#comment-15176048 ] Lewis John McGibbney edited comment on NUTCH- at 3/2/16 5:34 PM

[jira] [Commented] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-03-02 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176048#comment-15176048 ] Lewis John McGibbney commented on NUTCH-: - I don't think there is any workaround

[jira] [Commented] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-02-28 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171487#comment-15171487 ] Lewis John McGibbney commented on NUTCH-: - Hi, I can replicate this on hbase-0.98.8

[jira] [Updated] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-02-28 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-: Fix Version/s: 2.3.2 > re-fetch deletes all metadata except _csh_ and _

[jira] [Updated] (NUTCH-1741) Support of Sitemaps in Nutch 2.x

2016-02-28 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1741: Fix Version/s: (was: 2.4) 2.3.2 > Support of Sitem

[jira] [Commented] (NUTCH-2234) Upgrade to elasticsearch 2.1.1

2016-02-26 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169521#comment-15169521 ] Lewis John McGibbney commented on NUTCH-2234: - Out of curiosity, what versions of httpcore

[jira] [Commented] (NUTCH-2235) Classpath discrepancy with protocol-selenium in deploy mode

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168253#comment-15168253 ] Lewis John McGibbney commented on NUTCH-2235: - The source of this issue is ordering of Nutch

[jira] [Comment Edited] (NUTCH-2235) Classpath discrepancy with protocol-selenium in deploy mode

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168253#comment-15168253 ] Lewis John McGibbney edited comment on NUTCH-2235 at 2/26/16 1:38 AM

[jira] [Commented] (NUTCH-2235) Classpath discrepancy with protocol-selenium in deploy mode

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168156#comment-15168156 ] Lewis John McGibbney commented on NUTCH-2235: - Looks like the issue is with httpcore instead

[jira] [Commented] (NUTCH-2235) Classpath discrepancy with protocol-selenium in deploy mode

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168097#comment-15168097 ] Lewis John McGibbney commented on NUTCH-2235: - {code} jar tf apache-nutch-1.12-SNAPSHOT.job

[jira] [Commented] (NUTCH-2235) Classpath discrepancy with protocol-selenium in deploy mode

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168095#comment-15168095 ] Lewis John McGibbney commented on NUTCH-2235: - This issue is commonly associated

[jira] [Created] (NUTCH-2235) Classpath discrepancy with protocol-selenium in deploy mode

2016-02-25 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2235: --- Summary: Classpath discrepancy with protocol-selenium in deploy mode Key: NUTCH-2235 URL: https://issues.apache.org/jira/browse/NUTCH-2235 Project

[jira] [Commented] (NUTCH-1712) Use MultipleInputs in Injector to make it a single mapreduce job

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167963#comment-15167963 ] Lewis John McGibbney commented on NUTCH-1712: - Is the Nutch codebase now acting off of Git

[jira] [Comment Edited] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167309#comment-15167309 ] Lewis John McGibbney edited comment on NUTCH- at 2/25/16 3:21 PM

[jira] [Comment Edited] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167309#comment-15167309 ] Lewis John McGibbney edited comment on NUTCH- at 2/25/16 3:20 PM

[jira] [Commented] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167309#comment-15167309 ] Lewis John McGibbney commented on NUTCH-: - We need to step through crawl steps and find

[jira] [Updated] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-02-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-: Description: This problem happens the second time I crawl a page {code} bin

[jira] [Updated] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-02-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-: Summary: re-fetch deletes all metadata except _csh_ and _rs_ (was: fetch deletes

[jira] [Assigned] (NUTCH-2222) fetch deletes all metadata except _csh_ and _rs_

2016-02-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-: --- Assignee: Lewis John McGibbney > fetch deletes all metadata except _c

[jira] [Commented] (NUTCH-2218) Switch CrawlCompletion arg parsing to Commons CLI

2016-02-18 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152751#comment-15152751 ] Lewis John McGibbney commented on NUTCH-2218: - Nice Mike thanks > Switch CrawlCompletion

[jira] [Assigned] (NUTCH-2035) Regex filter using case sensitive rules.

2016-02-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2035: --- Assignee: Lewis John McGibbney > Regex filter using case sensitive ru

[jira] [Updated] (NUTCH-2033) parse-tika skips valid documents.

2016-02-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2033: Fix Version/s: 1.12 > parse-tika skips valid docume

[jira] [Assigned] (NUTCH-2033) parse-tika skips valid documents.

2016-02-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2033: --- Assignee: Lewis John McGibbney > parse-tika skips valid docume

[jira] [Updated] (NUTCH-2035) Regex filter using case sensitive rules.

2016-02-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2035: Fix Version/s: 1.12 > Regex filter using case sensitive ru

[jira] [Assigned] (NUTCH-2034) CrawlDB filtered documents counter.

2016-02-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2034: --- Assignee: Lewis John McGibbney > CrawlDB filtered documents coun

[jira] [Updated] (NUTCH-2032) Plugin to index the raw content of a readable document.

2016-02-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2032: Fix Version/s: 1.12 > Plugin to index the raw content of a readable docum

[jira] [Assigned] (NUTCH-2032) Plugin to index the raw content of a readable document.

2016-02-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2032: --- Assignee: Lewis John McGibbney > Plugin to index the raw cont

[jira] [Updated] (NUTCH-2034) CrawlDB filtered documents counter.

2016-02-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2034: Fix Version/s: 1.12 > CrawlDB filtered documents coun

[jira] [Updated] (NUTCH-2046) The crawl script should be able to skip an initial injection.

2016-02-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2046: Fix Version/s: 1.12 > The crawl script should be able to skip an initial inject

[jira] [Assigned] (NUTCH-2046) The crawl script should be able to skip an initial injection.

2016-02-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2046: --- Assignee: Lewis John McGibbney > The crawl script should be able to s

[jira] [Updated] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2016-02-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2144: Fix Version/s: 1.12 > Plugin to override db.ignore.external to exempt interest

[jira] [Commented] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2016-02-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141213#comment-15141213 ] Lewis John McGibbney commented on NUTCH-2144: - Hi [~thammegowda], limitations I see

[jira] [Updated] (NUTCH-2005) Implement HTrace'ing in Nutch

2016-02-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2005: Labels: gsoc2016 (was: ) > Implement HTrace'ing in Nu

[jira] [Commented] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2016-02-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141296#comment-15141296 ] Lewis John McGibbney commented on NUTCH-2144: - bq. [~chrismattmann] I am not sure if Tika can

[jira] [Assigned] (NUTCH-1314) Impose a limit on the length of outlink target urls

2016-02-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-1314: --- Assignee: Lewis John McGibbney > Impose a limit on the length of outl

[jira] [Commented] (NUTCH-1314) Impose a limit on the length of outlink target urls

2016-02-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137375#comment-15137375 ] Lewis John McGibbney commented on NUTCH-1314: - Committed @ revisions 1729218 and 1729219 in 2

Fwd: private Digest 5 Feb 2016 18:05:43 -0000 Issue 354

2016-02-05 Thread Lewis John Mcgibbney
h 1271) [REMINDER] ApacheCon NA 2016 Travel Assistance Applications now open! 1271 by: lewis john mcgibbney Administrivia: - To post to the list, e-mail: priv...@nutch.apache.org To unsubscribe, e-mail: private-digest-un

[jira] [Commented] (NUTCH-1314) Impose a limit on the length of outlink target urls

2016-02-02 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129575#comment-15129575 ] Lewis John McGibbney commented on NUTCH-1314: - Yep, if someone can consolidate the patches

[jira] [Updated] (NUTCH-1741) Support of Sitemaps in Nutch 2.x

2016-01-26 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1741: Attachment: NUTCH-1741v7.patch Managed to update this at the weekend and forgot

[jira] [Created] (NUTCH-2208) Fix 4 skipped tests in TestGenerator

2016-01-26 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2208: --- Summary: Fix 4 skipped tests in TestGenerator Key: NUTCH-2208 URL: https://issues.apache.org/jira/browse/NUTCH-2208 Project: Nutch Issue Type

[jira] [Updated] (NUTCH-2208) Fix 4 skipped tests in TestGenerator

2016-01-26 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2208: Attachment: TEST-org.apache.nutch.crawl.TestGenerator.txt Attached is full test log

[jira] [Resolved] (NUTCH-1741) Support of Sitemaps in Nutch 2.x

2016-01-26 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-1741. - Resolution: Fixed Committed revision 1726853 in 2.X Thank you to everyone

[jira] [Commented] (NUTCH-2206) Provide example scoring.similarity.stopword.file

2016-01-26 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117800#comment-15117800 ] Lewis John McGibbney commented on NUTCH-2206: - We should most likely also provide the nutch

[jira] [Commented] (NUTCH-2206) Provide example scoring.similarity.stopword.file

2016-01-26 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118286#comment-15118286 ] Lewis John McGibbney commented on NUTCH-2206: - +1 [~sujenshah], thanks > Provide exam

Re: need suggestion for GSoC 2016

2016-01-26 Thread Lewis John Mcgibbney
Hi Ammar, I've given you write permissions for the wiki. Feel free to create a page for your proposed work at the URL below https://wiki.apache.org/nutch/GoogleSummerOfCode#A2016 On Fri, Jan 22, 2016 at 4:49 PM, Lewis John Mcgibbney < lewis.mcgibb...@gmail.com> wrote: > Hi Ammar,

[jira] [Updated] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2016-01-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2184: Attachment: NUTCH-2184v2.patch Updated patch for trunk. [~markus17], working

[jira] [Created] (NUTCH-2206) Provide example scoring.similarity.stopword.file

2016-01-25 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2206: --- Summary: Provide example scoring.similarity.stopword.file Key: NUTCH-2206 URL: https://issues.apache.org/jira/browse/NUTCH-2206 Project: Nutch

[jira] [Commented] (NUTCH-2206) Provide example scoring.similarity.stopword.file

2016-01-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116491#comment-15116491 ] Lewis John McGibbney commented on NUTCH-2206: - CC [~sujenshah] > Provide exam

[jira] [Created] (NUTCH-2207) Remove class duplication and smarten-up scoring-similarity plugin

2016-01-25 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2207: --- Summary: Remove class duplication and smarten-up scoring-similarity plugin Key: NUTCH-2207 URL: https://issues.apache.org/jira/browse/NUTCH-2207

[jira] [Commented] (NUTCH-2171) Upgrade Nutch Trunk to Java 1.8

2016-01-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113380#comment-15113380 ] Lewis John McGibbney commented on NUTCH-2171: - Hey [~jorgelbg] feel free to assign

Re: need suggestion for GSoC 2016

2016-01-22 Thread Lewis John Mcgibbney
pache.org/msg19783.html) and > doesn't have any reply so far. > I would appreciate your suggestion. > > Warmest regards > Ammar Shadiq > > On Tue, Nov 3, 2015 at 3:28 AM, Lewis John Mcgibbney < > lewis.mcgibb...@gmail.com> wrote: > >> Hi Ammar, >
