[jira] [Commented] (NUTCH-2274) InteractiveSelenium Plugin's DefaultHandler Returns Null

2016-06-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328012#comment-15328012 ] Lewis John McGibbney commented on NUTCH-2274: - Thanks for registering [~bmzhao] are you

[jira] [Assigned] (NUTCH-2273) Selenium and InteractiveSelenium Do Not Support HTTPS

2016-06-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2273: --- Assignee: Lewis John McGibbney > Selenium and InteractiveSelenium Do Not

[jira] [Assigned] (NUTCH-2274) InteractiveSelenium Plugin's DefaultHandler Returns Null

2016-06-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2274: --- Assignee: Lewis John McGibbney > InteractiveSelenium Plugin's DefaultHandler

[jira] [Commented] (NUTCH-2271) Solr indexer Failed

2016-06-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325106#comment-15325106 ] Lewis John McGibbney commented on NUTCH-2271: - Please take a look at the nutch-default.xml

[jira] [Resolved] (NUTCH-2271) Solr indexer Failed

2016-06-05 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2271. - Resolution: Not A Bug Nutch 1.12 supports Solr 5.4.1 not 6. Also Nutch 1.12 is

[jira] [Commented] (NUTCH-2271) Solr indexer Failed

2016-06-01 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310680#comment-15310680 ] Lewis John McGibbney commented on NUTCH-2271: - No. Please check build.xml what which version

[jira] [Updated] (NUTCH-2265) Write A Test Package for Scoring Similarity

2016-05-28 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2265: Fix Version/s: (was: 1.12) 1.13 > Write A Test Package for

[jira] [Commented] (NUTCH-2234) Upgrade to elasticsearch 2.1.1

2016-05-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298842#comment-15298842 ] Lewis John McGibbney commented on NUTCH-2234: - bq. I can update the patch or open a PR on

[jira] [Updated] (NUTCH-2089) Move Nutch 2.x to compile on JDK 8

2016-05-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2089: Fix Version/s: (was: 2.5) 2.4 > Move Nutch 2.x to compile on

[jira] [Updated] (NUTCH-2089) Move Nutch 2.x to compile on JDK 8

2016-05-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2089: Summary: Move Nutch 2.x to compile on JDK 8 (was: Move Nutch to compile on JDK 8)

[jira] [Resolved] (NUTCH-2266) Fix dead link in build.xml for javadoc

2016-05-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2266. - Resolution: Fixed Thanks [~kamaci] :) > Fix dead link in build.xml for javadoc >

[jira] [Updated] (NUTCH-2266) Fix dead link in build.xml for javadoc

2016-05-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2266: Fix Version/s: (was: 2.5) 2.4 > Fix dead link in build.xml

[jira] [Comment Edited] (NUTCH-2089) Move Nutch to compile on JDK 8

2016-05-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296981#comment-15296981 ] Lewis John McGibbney edited comment on NUTCH-2089 at 5/23/16 7:59 PM: --

[jira] [Commented] (NUTCH-2089) Move Nutch to compile on JDK 8

2016-05-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296981#comment-15296981 ] Lewis John McGibbney commented on NUTCH-2089: - What about javadoc? > Move Nutch to compile on

[jira] [Commented] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-05-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15295673#comment-15295673 ] Lewis John McGibbney commented on NUTCH-2222: - Thanks Furkan. Do you have a unit test which

[jira] [Updated] (NUTCH-2263) Support for mingram and maxgram at Unigram Cosine Similarity Model

2016-05-19 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2263: Fix Version/s: (was: 2.4.1) 1.12 > Support for mingram and

[jira] [Resolved] (NUTCH-2263) Support for mingram and maxgram at Unigram Cosine Similarity Model

2016-05-19 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2263. - Resolution: Fixed Assignee: Furkan KAMACI Thank you [~kamaci] nice patch >

[jira] [Updated] (NUTCH-2122) Implement Javadoc package-info.html for webui packages

2016-05-19 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2122: Summary: Implement Javadoc package-info.html for webui packages (was: Implement

[jira] [Commented] (NUTCH-2122) Implement Javadoc package.html for webui packages

2016-05-19 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292454#comment-15292454 ] Lewis John McGibbney commented on NUTCH-2122: - I agree :) > Implement Javadoc package.html

[jira] [Commented] (NUTCH-1858) Migrate Nutch documentation from Moin Moin to Confluence

2016-05-19 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291595#comment-15291595 ] Lewis John McGibbney commented on NUTCH-1858: - AFAIK a script or two exist to do the

[jira] [Commented] (NUTCH-1858) Migrate Nutch documentation from Moin Moin to Confluence

2016-05-19 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291400#comment-15291400 ] Lewis John McGibbney commented on NUTCH-1858: - I honestly do not know. This is a huge amount of

[jira] [Commented] (NUTCH-2122) Implement Javadoc package.html for service packages

2016-05-19 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291395#comment-15291395 ] Lewis John McGibbney commented on NUTCH-2122: - Hi Furkan, please change the 'service' to

[jira] [Commented] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-05-17 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287572#comment-15287572 ] Lewis John McGibbney commented on NUTCH-2222: - [~kamaci] can you please check this issue out

[jira] [Updated] (NUTCH-2112) Missing org.restlet.jee when building with gora-solr

2016-05-17 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2112: Fix Version/s: 2.4 > Missing org.restlet.jee when building with gora-solr >

[jira] [Reopened] (NUTCH-2248) CSS parser plugin

2016-05-17 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reopened NUTCH-2248: - > CSS parser plugin > - > > Key: NUTCH-2248 >

[jira] [Updated] (NUTCH-2251) Make CommonCrawlFormatJackson instance reusable by properly handling object state

2016-05-17 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2251: Fix Version/s: (was: 1.10) 1.13 > Make

[jira] [Resolved] (NUTCH-2260) JAVA_HOME and hbase-common dependency absent from hbase Docker image

2016-05-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2260. - Resolution: Fixed > JAVA_HOME and hbase-common dependency absent from hbase

[jira] [Resolved] (NUTCH-2259) Nutch 2.x HBase Docker requires a logs folder to run exception free

2016-05-14 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2259. - Resolution: Fixed > Nutch 2.x HBase Docker requires a logs folder to run

[jira] [Commented] (NUTCH-2259) Nutch 2.x HBase Docker requires a logs folder to run exception free

2016-05-14 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283643#comment-15283643 ] Lewis John McGibbney commented on NUTCH-2259: - PR is available at

[jira] [Created] (NUTCH-2259) Nutch 2.x HBase Docker requires a logs folder to run exception free

2016-05-14 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2259: --- Summary: Nutch 2.x HBase Docker requires a logs folder to run exception free Key: NUTCH-2259 URL: https://issues.apache.org/jira/browse/NUTCH-2259

[jira] [Commented] (NUTCH-2258) Provide Javadoc for ScriptInput/OutputFormat's

2016-05-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281054#comment-15281054 ] Lewis John McGibbney commented on NUTCH-2258: - I accidentally created this issue for NUTCH

[jira] [Created] (NUTCH-2258) Provide Javadoc for ScriptInput/OutputFormat's

2016-05-11 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2258: --- Summary: Provide Javadoc for ScriptInput/OutputFormat's Key: NUTCH-2258 URL: https://issues.apache.org/jira/browse/NUTCH-2258 Project: Nutch

[jira] [Resolved] (NUTCH-2252) Allow phantomjs as a browser for selenium options

2016-05-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2252. - Resolution: Fixed Thanks [~kwhitehall] and folks. > Allow phantomjs as a

[jira] [Commented] (NUTCH-1824) protocol-http using proxy not working with https sites

2016-05-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276727#comment-15276727 ] Lewis John McGibbney commented on NUTCH-1824: - Hi [~xjtujiyong] I'll try and scope later and

[jira] [Updated] (NUTCH-2252) Allow phantomjs as a browser for selenium options

2016-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2252: Fix Version/s: 1.12 > Allow phantomjs as a browser for selenium options >

[jira] [Updated] (NUTCH-2252) Allow phantomjs as a browser for selenium options

2016-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2252: Affects Version/s: 1.12 > Allow phantomjs as a browser for selenium options >

[jira] [Assigned] (NUTCH-2252) Allow phantomjs as a browser for selenium options

2016-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2252: --- Assignee: Lewis John McGibbney > Allow phantomjs as a browser for selenium

[jira] [Updated] (NUTCH-2188) While crawling with solr url (kerberos enabled) Error: org.apache.solr.common.SolrException: Unauthorized

2016-04-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2188: Fix Version/s: (was: 1.9) 1.12 > While crawling with solr

[jira] [Updated] (NUTCH-2217) Crawl pages with specified language

2016-04-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2217: Fix Version/s: (was: 1.11) 2.5 > Crawl pages with specified

[jira] [Resolved] (NUTCH-2238) Indexer for Elasticsearch 2.x

2016-04-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2238. - Resolution: Fixed Thank you [~ptorrestr] > Indexer for Elasticsearch 2.x >

[jira] [Updated] (NUTCH-1741) Support of Sitemaps in Nutch 2.x

2016-04-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1741: Fix Version/s: (was: 2.3.2) 2.4 > Support of Sitemaps in

[jira] [Updated] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-04-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2222: Fix Version/s: (was: 2.3.2) 2.4 > re-fetch deletes all

[jira] [Updated] (NUTCH-2238) Indexer for Elasticsearch 2.x

2016-04-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2238: Assignee: Pablo Torres > Indexer for Elasticsearch 2.x >

[jira] [Updated] (NUTCH-2238) Indexer for Elasticsearch 2.x

2016-04-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2238: Fix Version/s: (was: 2.3.2) > Indexer for Elasticsearch 2.x >

[jira] [Commented] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-04-07 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230393#comment-15230393 ] Lewis John McGibbney commented on NUTCH-2222: - I've committed and closed the MemStore

[jira] [Commented] (NUTCH-2191) Add protocol-htmlunit

2016-03-27 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213527#comment-15213527 ] Lewis John McGibbney commented on NUTCH-2191: - Thanks [~karanjeets] good job > Add

[jira] [Commented] (NUTCH-2191) Add protocol-htmlunit

2016-03-27 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213390#comment-15213390 ] Lewis John McGibbney commented on NUTCH-2191: - [~karanjeets] if you can please step through

[jira] [Commented] (NUTCH-2089) Move Nutch to compile on JDK 8

2016-03-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212656#comment-15212656 ] Lewis John McGibbney commented on NUTCH-2089: - Compiler warnings, etc. Feel free to analyze

[jira] [Commented] (NUTCH-2005) Implement HTrace'ing in Nutch

2016-03-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210241#comment-15210241 ] Lewis John McGibbney commented on NUTCH-2005: - [~mefaraz...@gmail.com], please check out the

[jira] [Updated] (NUTCH-2005) Implement HTrace'ing in Nutch

2016-03-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2005: Description: Recent developments within the tracing community have brought projects

[jira] [Updated] (NUTCH-2005) Implement HTrace'ing in Nutch

2016-03-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2005: Fix Version/s: 2.4 > Implement HTrace'ing in Nutch > -

[jira] [Updated] (NUTCH-2005) Implement HTrace'ing in Nutch

2016-03-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2005: Description: Recent developments within the tracing community have brought projects

[jira] [Updated] (NUTCH-1756) Security layer for NutchServer

2016-03-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1756: Description: It will be beneficial to have a security layer for NutchServer once we

[jira] [Updated] (NUTCH-1756) Security layer for NutchServer

2016-03-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1756: Labels: gsoc2016 (was: ) > Security layer for NutchServer >

[jira] [Commented] (NUTCH-2005) Implement HTrace'ing in Nutch

2016-03-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209411#comment-15209411 ] Lewis John McGibbney commented on NUTCH-2005: - [~mefaraz...@gmail.com] are you still

[jira] [Commented] (NUTCH-1492) Support gora-dynamodb in Nutch 2.x

2016-03-19 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198869#comment-15198869 ] Lewis John McGibbney commented on NUTCH-1492: - [~renato2099] what about this shit? > Support

[jira] [Updated] (NUTCH-2185) protocol-soda-consumer plugin

2016-03-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2185: Description: I'm finishing off a Nutch protocol implementation for interacting with

[jira] [Commented] (NUTCH-2202) Integration of Anthelion (Focused Crawling Module) into Nutch

2016-03-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15187238#comment-15187238 ] Lewis John McGibbney commented on NUTCH-2202: - [~robertmeusel] please don't look into it yet.

[jira] [Assigned] (NUTCH-2202) Integration of Anthelion (Focused Crawling Module) into Nutch

2016-03-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2202: --- Assignee: Lewis John McGibbney > Integration of Anthelion (Focused Crawling

[jira] [Updated] (NUTCH-2202) Integration of Anthelion (Focused Crawling Module) into Nutch

2016-03-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2202: Description: We have recently released anthelion, which is a focused crawler plugin

[jira] [Commented] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-03-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178568#comment-15178568 ] Lewis John McGibbney commented on NUTCH-2222: - Nice [~abenjell], In Nutch we use

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2016-03-02 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176122#comment-15176122 ] Lewis John McGibbney commented on NUTCH-2184: - sh*t, I didn't push up my assertions. I'll get

[jira] [Comment Edited] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-03-02 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176048#comment-15176048 ] Lewis John McGibbney edited comment on NUTCH-2222 at 3/2/16 5:34 PM: -

[jira] [Commented] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-03-02 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176048#comment-15176048 ] Lewis John McGibbney commented on NUTCH-2222: - I don't think there is any workaround no. The

[jira] [Commented] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-02-28 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171487#comment-15171487 ] Lewis John McGibbney commented on NUTCH-2222: - Hi, I can replicate this on

[jira] [Updated] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-02-28 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2222: Fix Version/s: 2.3.2 > re-fetch deletes all metadata except _csh_ and _rs_ >

[jira] [Updated] (NUTCH-1741) Support of Sitemaps in Nutch 2.x

2016-02-28 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1741: Fix Version/s: (was: 2.4) 2.3.2 > Support of Sitemaps in

[jira] [Commented] (NUTCH-2234) Upgrade to elasticsearch 2.1.1

2016-02-26 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169521#comment-15169521 ] Lewis John McGibbney commented on NUTCH-2234: - Out of curiosity. What versions of httpcore and

[jira] [Commented] (NUTCH-2235) Classpath discrepancy with protocol-selenium in deploy mode

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168253#comment-15168253 ] Lewis John McGibbney commented on NUTCH-2235: - The source of this issue is ordering of Nutch

[jira] [Comment Edited] (NUTCH-2235) Classpath discrepancy with protocol-selenium in deploy mode

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168253#comment-15168253 ] Lewis John McGibbney edited comment on NUTCH-2235 at 2/26/16 1:38 AM: --

[jira] [Commented] (NUTCH-2235) Classpath discrepancy with protocol-selenium in deploy mode

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168156#comment-15168156 ] Lewis John McGibbney commented on NUTCH-2235: - Looks like the issue is with httpcore instead

[jira] [Commented] (NUTCH-2235) Classpath discrepancy with protocol-selenium in deploy mode

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168097#comment-15168097 ] Lewis John McGibbney commented on NUTCH-2235: - {code} jar tf apache-nutch-1.12-SNAPSHOT.job |

[jira] [Commented] (NUTCH-2235) Classpath discrepancy with protocol-selenium in deploy mode

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168095#comment-15168095 ] Lewis John McGibbney commented on NUTCH-2235: - This issue is commonly associated with

[jira] [Created] (NUTCH-2235) Classpath discrepancy with protocol-selenium in deploy mode

2016-02-25 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2235: --- Summary: Classpath discrepancy with protocol-selenium in deploy mode Key: NUTCH-2235 URL: https://issues.apache.org/jira/browse/NUTCH-2235 Project:

[jira] [Commented] (NUTCH-1712) Use MultipleInputs in Injector to make it a single mapreduce job

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167963#comment-15167963 ] Lewis John McGibbney commented on NUTCH-1712: - Is the Nutch codebase now acting off of Git? If

[jira] [Comment Edited] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167309#comment-15167309 ] Lewis John McGibbney edited comment on NUTCH-2222 at 2/25/16 3:21 PM: --

[jira] [Comment Edited] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167309#comment-15167309 ] Lewis John McGibbney edited comment on NUTCH-2222 at 2/25/16 3:20 PM: --

[jira] [Commented] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167309#comment-15167309 ] Lewis John McGibbney commented on NUTCH-2222: - We need to step through crawl steps and find

[jira] [Updated] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-02-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2222: Description: This problem happens the second time I crawl a page {code}

[jira] [Updated] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-02-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2222: Summary: re-fetch deletes all metadata except _csh_ and _rs_ (was: fetch deletes

[jira] [Assigned] (NUTCH-2222) fetch deletes all metadata except _csh_ and _rs_

2016-02-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2222: --- Assignee: Lewis John McGibbney > fetch deletes all metadata except _csh_

[jira] [Commented] (NUTCH-2218) Switch CrawlCompletion arg parsing to Commons CLI

2016-02-18 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152751#comment-15152751 ] Lewis John McGibbney commented on NUTCH-2218: - Nice Mike thanks > Switch CrawlCompletion arg

[jira] [Assigned] (NUTCH-2035) Regex filter using case sensitive rules.

2016-02-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2035: --- Assignee: Lewis John McGibbney > Regex filter using case sensitive rules. >

[jira] [Updated] (NUTCH-2033) parse-tika skips valid documents.

2016-02-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2033: Fix Version/s: 1.12 > parse-tika skips valid documents. >

[jira] [Assigned] (NUTCH-2033) parse-tika skips valid documents.

2016-02-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2033: --- Assignee: Lewis John McGibbney > parse-tika skips valid documents. >

[jira] [Updated] (NUTCH-2035) Regex filter using case sensitive rules.

2016-02-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2035: Fix Version/s: 1.12 > Regex filter using case sensitive rules. >

[jira] [Assigned] (NUTCH-2034) CrawlDB filtered documents counter.

2016-02-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2034: --- Assignee: Lewis John McGibbney > CrawlDB filtered documents counter. >

[jira] [Updated] (NUTCH-2032) Plugin to index the raw content of a readable document.

2016-02-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2032: Fix Version/s: 1.12 > Plugin to index the raw content of a readable document. >

[jira] [Assigned] (NUTCH-2032) Plugin to index the raw content of a readable document.

2016-02-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2032: --- Assignee: Lewis John McGibbney > Plugin to index the raw content of a

[jira] [Updated] (NUTCH-2034) CrawlDB filtered documents counter.

2016-02-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2034: Fix Version/s: 1.12 > CrawlDB filtered documents counter. >

[jira] [Updated] (NUTCH-2046) The crawl script should be able to skip an initial injection.

2016-02-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2046: Fix Version/s: 1.12 > The crawl script should be able to skip an initial injection.

[jira] [Assigned] (NUTCH-2046) The crawl script should be able to skip an initial injection.

2016-02-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2046: --- Assignee: Lewis John McGibbney > The crawl script should be able to skip an

[jira] [Updated] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2016-02-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2144: Fix Version/s: 1.12 > Plugin to override db.ignore.external to exempt interesting

[jira] [Commented] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2016-02-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141213#comment-15141213 ] Lewis John McGibbney commented on NUTCH-2144: - Hi [~thammegowda], limitations I see are as

[jira] [Updated] (NUTCH-2005) Implement HTrace'ing in Nutch

2016-02-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2005: Labels: gsoc2016 (was: ) > Implement HTrace'ing in Nutch >

[jira] [Commented] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2016-02-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141296#comment-15141296 ] Lewis John McGibbney commented on NUTCH-2144: - bq. [~chrismattmann] I am not sure if Tika can

[jira] [Assigned] (NUTCH-1314) Impose a limit on the length of outlink target urls

2016-02-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-1314: --- Assignee: Lewis John McGibbney > Impose a limit on the length of outlink

[jira] [Commented] (NUTCH-1314) Impose a limit on the length of outlink target urls

2016-02-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137375#comment-15137375 ] Lewis John McGibbney commented on NUTCH-1314: - Committed @ revisions 1729218 and 1729219 in

[jira] [Commented] (NUTCH-1314) Impose a limit on the length of outlink target urls

2016-02-02 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129575#comment-15129575 ] Lewis John McGibbney commented on NUTCH-1314: - Yep, if someone can consolidate the patches

[jira] [Updated] (NUTCH-1741) Support of Sitemaps in Nutch 2.x

2016-01-26 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1741: Attachment: NUTCH-1741v7.patch Managed to update this at the weekend and forgot to
