[jira] [Updated] (NUTCH-2492) Add more configuration parameters to crawl script

2018-01-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2492: Fix Version/s: 1.15 > Add more configuration parameters to crawl scr

[jira] [Updated] (NUTCH-2490) Sitemap processing: Sitemap index files not working

2018-01-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2490: Fix Version/s: 1.15 > Sitemap processing: Sitemap index files not work

[jira] [Resolved] (NUTCH-2490) Sitemap processing: Sitemap index files not working

2018-01-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2490. - Resolution: Fixed Thank you [~mfeltscher] > Sitemap processing: Sitemap in

[jira] [Resolved] (NUTCH-2491) Integrate sitemap processing and HostDB into crawl script

2018-01-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2491. - Resolution: Fixed Thank you [~mfeltscher] > Integrate sitemap process

[jira] [Updated] (NUTCH-2491) Integrate sitemap processing and HostDB into crawl script

2018-01-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2491: Fix Version/s: 1.15 > Integrate sitemap processing and HostDB into crawl scr

[jira] [Resolved] (NUTCH-2454) REST API fix for usage of hostdb in generator

2018-01-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2454. - Resolution: Fixed Thank you [~semyon.semyo...@mail.com] > REST API fix for us

[jira] [Resolved] (NUTCH-2486) Compiler Warning: Unchecked / unsafe operations in MimeTypeIndexingFilter

2017-12-19 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2486. - Resolution: Fixed > Compiler Warning: Unchecked / unsafe operati

[jira] [Updated] (NUTCH-2486) Compiler Warning: Unchecked / unsafe operations in MimeTypeIndexingFilter

2017-12-19 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2486: Fix Version/s: 1.14 > Compiler Warning: Unchecked / unsafe operati

[jira] [Resolved] (NUTCH-2358) HostInjectorJob doesn't work

2017-12-17 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2358. - Resolution: Fixed > HostInjectorJob doesn't w

[jira] [Commented] (NUTCH-2358) HostInjectorJob doesn't work

2017-12-17 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294236#comment-16294236 ] Lewis John McGibbney commented on NUTCH-2358: - Thank you [~cloudysunny14] patch applied

[jira] [Updated] (NUTCH-2358) HostInjectorJob doesn't work

2017-12-17 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2358: Fix Version/s: 2.4 > HostInjectorJob doesn't w

[jira] [Resolved] (NUTCH-2484) Extend indexer-elastic-rest to support languages

2017-12-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2484. - Resolution: Fixed > Extend indexer-elastic-rest to support langua

[jira] [Created] (NUTCH-2484) Extend indexer-elastic-rest to support languages

2017-12-16 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2484: --- Summary: Extend indexer-elastic-rest to support languages Key: NUTCH-2484 URL: https://issues.apache.org/jira/browse/NUTCH-2484 Project: Nutch

[jira] [Commented] (NUTCH-2157) Parent Issue for Addressing Miredot REST API Warnings

2017-12-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292830#comment-16292830 ] Lewis John McGibbney commented on NUTCH-2157: - There are still many warnings. http

[jira] [Updated] (NUTCH-2157) Parent Issue for Addressing Miredot REST API Warnings

2017-12-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2157: Fix Version/s: (was: 1.14) 1.15 > Parent Is

[jira] [Resolved] (NUTCH-2181) Add Webpage for 3rd Party Connectors/Libraries to Apache Nutch

2017-12-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2181. - Resolution: Won't Fix Fix Version/s: 1.14 These are never kept up-to-date

[jira] [Updated] (NUTCH-2185) protocol-soda-consumer plugin

2017-12-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2185: Fix Version/s: (was: 1.15) 1.14 > protocol-soda-consu

[jira] [Resolved] (NUTCH-2185) protocol-soda-consumer plugin

2017-12-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2185. - Resolution: Won't Fix This was a very limited use case and is not worth

[jira] [Resolved] (NUTCH-2473) Elasticsearch REST Indexer broken due to wrong depenency

2017-12-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2473. - Resolution: Fixed > Elasticsearch REST Indexer broken due to wrong depene

[jira] [Updated] (NUTCH-2473) Elasticsearch REST Indexer broken due to wrong depenency

2017-12-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2473: Fix Version/s: 1.14 > Elasticsearch REST Indexer broken due to wrong depene

[jira] [Assigned] (NUTCH-2414) Allow LanguageIndexingFilter to actually filter documents by language.

2017-12-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2414: --- Assignee: Lewis John McGibbney > Allow LanguageIndexingFilter to actua

[jira] [Updated] (NUTCH-2414) Allow LanguageIndexingFilter to actually filter documents by language.

2017-12-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2414: Fix Version/s: 1.14 > Allow LanguageIndexingFilter to actually filter docume

[jira] [Resolved] (NUTCH-2414) Allow LanguageIndexingFilter to actually filter documents by language.

2017-12-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2414. - Resolution: Fixed > Allow LanguageIndexingFilter to actually filter docume

[jira] [Resolved] (NUTCH-2438) Upgrade Nutch 2.X to Gora 0.8

2017-12-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2438. - Resolution: Fixed > Upgrade Nutch 2.X to Gora

[jira] [Assigned] (NUTCH-2438) Upgrade Nutch 2.X to Gora 0.8

2017-12-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2438: --- Assignee: Lewis John McGibbney > Upgrade Nutch 2.X to Gora

[jira] [Resolved] (NUTCH-2437) gora mongodb mapping file error

2017-10-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2437. - Resolution: Fixed > gora mongodb mapping file er

[jira] [Updated] (NUTCH-2374) Upgrade Nutch 2.X to Gora 0.7

2017-10-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2374: Issue Type: Improvement (was: Bug) > Upgrade Nutch 2.X to Gora

[jira] [Resolved] (NUTCH-2374) Upgrade Nutch 2.X to Gora 0.7

2017-10-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2374. - Resolution: Fixed > Upgrade Nutch 2.X to Gora

[jira] [Resolved] (NUTCH-2436) Remove empty comment, and redundant semicolon from CommandRunner

2017-09-28 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2436. - Resolution: Fixed Thank you [~kpm1985] > Remove empty comment, and redund

[jira] [Updated] (NUTCH-2436) Remove empty comment, and redundant semicolon from CommandRunner

2017-09-28 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2436: Fix Version/s: 1.14 > Remove empty comment, and redundant semicolon f

[jira] [Resolved] (NUTCH-2235) Classpath discrepancy with protocol-selenium in deploy mode

2017-09-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2235. - Resolution: Fixed NUTCH-2378 > Classpath discrepancy with protocol-selen

Request for Review

2017-09-06 Thread lewis john mcgibbney
Hi user@ and dev@, As part of the Nutch Google Summer of Code effort this year, Omkar Reddy and I have been working persistently throughout the summer months on the Hadoop MapReduce API upgrade e.g. NUTCH-2375 Upgrade the code base from org.apache.hadoop.mapred to org.apache.hadoop.mapreduce [0].

[jira] [Resolved] (NUTCH-2399) indexer-elastic does not index multi-value fields (only the first value is indexed)

2017-08-21 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2399. - Resolution: Fixed > indexer-elastic does not index multi-value fields (o

[jira] [Updated] (NUTCH-2399) indexer-elastic does not index multi-value fields (only the first value is indexed)

2017-08-21 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2399: Fix Version/s: 1.14 > indexer-elastic does not index multi-value fields (o

[jira] [Resolved] (NUTCH-2400) Solr 6.6.0 compatibility

2017-08-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2400. - Resolution: Fixed > Solr 6.6.0 compatibil

[jira] [Resolved] (NUTCH-2405) jsoup-extractor structure correction, typo fixed

2017-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2405. - Resolution: Fixed Thank you [~kaidul] > jsoup-extractor structure correct

[jira] [Updated] (NUTCH-2406) Sum up constants, make minor changes

2017-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2406: Fix Version/s: 1.14 > Sum up constants, make minor chan

[jira] [Assigned] (NUTCH-2406) Sum up constants, make minor changes

2017-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2406: --- Assignee: kenneth mcfarland > Sum up constants, make minor chan

[jira] [Resolved] (NUTCH-2406) Sum up constants, make minor changes

2017-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2406. - Resolution: Fixed > Sum up constants, make minor chan

[jira] [Resolved] (NUTCH-2404) Failed Jenkin Build #1588 error in unit test resolved

2017-07-31 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2404. - Resolution: Fixed Thank you [~kaidul] > Failed Jenkin Build #1588 error in u

[jira] [Resolved] (NUTCH-2389) Precise data parsing using Jsoup CSS selectors

2017-07-30 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2389. - Resolution: Fixed Thank you [~kaidul] > Precise data parsing using Jsoup

[jira] [Commented] (NUTCH-1129) Any23 Nutch plugin

2017-07-27 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102898#comment-16102898 ] Lewis John McGibbney commented on NUTCH-1129: - We need some sort of reasonable response here

[jira] [Assigned] (NUTCH-2403) Nutch Selenium: Wrong documentation about PhantomJS

2017-07-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2403: --- Assignee: Moreno Feltscher > Nutch Selenium: Wrong documentation ab

[jira] [Updated] (NUTCH-2403) Nutch Selenium: Wrong documentation about PhantomJS

2017-07-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2403: Affects Version/s: 1.13 > Nutch Selenium: Wrong documentation about Phanto

[jira] [Updated] (NUTCH-2403) Nutch Selenium: Wrong documentation about PhantomJS

2017-07-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2403: Fix Version/s: 1.14 > Nutch Selenium: Wrong documentation about Phanto

[jira] [Resolved] (NUTCH-2403) Nutch Selenium: Wrong documentation about PhantomJS

2017-07-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2403. - Resolution: Fixed > Nutch Selenium: Wrong documentation about Phanto

[jira] [Updated] (NUTCH-2403) Nutch Selenium: Wrong documentation about PhantomJS

2017-07-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2403: Component/s: plugin documentation > Nutch Selenium: Wr

[jira] [Updated] (NUTCH-2400) Solr 6.6.0 compatibility

2017-07-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2400: Attachment: managed-schema This is the managed-schema generated from schema.xml

[jira] [Created] (NUTCH-2400) Solr 6.6.0 compatibility

2017-07-12 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2400: --- Summary: Solr 6.6.0 compatibility Key: NUTCH-2400 URL: https://issues.apache.org/jira/browse/NUTCH-2400 Project: Nutch Issue Type: Improvement

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2017-07-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074046#comment-16074046 ] Lewis John McGibbney commented on NUTCH-1465: - [~markus17] can we also update the version

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2017-07-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16072959#comment-16072959 ] Lewis John McGibbney commented on NUTCH-1465: - [~markus17] when attempting to process

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2017-06-30 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070468#comment-16070468 ] Lewis John McGibbney commented on NUTCH-1465: - Fantastic [~markus17] is this working well

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2017-06-29 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16068597#comment-16068597 ] Lewis John McGibbney commented on NUTCH-2184: - Hi [~markus17] I need to finish the bloody MR

Establishment of Static Source Code Analysis

2017-06-15 Thread lewis john mcgibbney
Hi Folks, I don't know if anyone else noticed... some of our Russian compatriots have set up a static auto bot to notify us of source code issues... An example is as follows https://issues.apache.org/jira/browse/NUTCH-2394 I think this is great to be honest... with some peer review I think we

[ANNOUNCEMENT] Welcome Blackice as new Nutch PMC and Committer

2017-06-14 Thread lewis john mcgibbney
Hi Folks, The Nutch PMC recently VOTE'd in Blackice to formally join our Nutch Project Management Committee and as a Project Committer. Please join me in offering a friendly welcome... not that he needs it. He's been here for quite a while :) @Blackice, feel free to say a bit about yourself if you

[jira] [Commented] (NUTCH-2389) Precise data parsing using Jsoup CSS selectors

2017-06-01 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034162#comment-16034162 ] Lewis John McGibbney commented on NUTCH-2389: - [~kaidul], I think that the plugin should

[jira] [Resolved] (NUTCH-2388) bin/crawl indexing only webpages containing batchID instead of all in 2.x

2017-05-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2388. - Resolution: Fixed > bin/crawl indexing only webpages containing batchID inst

[jira] [Commented] (NUTCH-2382) indexer-hbase Nutch 1.x branch

2017-05-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020207#comment-16020207 ] Lewis John McGibbney commented on NUTCH-2382: - I am +1 for committing this to master. I've

[jira] [Resolved] (NUTCH-2373) Indexer for Hbase

2017-05-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2373. - Resolution: Fixed > Indexer for Hbase > - > >

[jira] [Resolved] (NUTCH-2353) Create seed file with metadata using the REST API

2017-05-19 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2353. - Resolution: Fixed Nice work [~jorgelbg] > Create seed file with metadata us

Fwd: GSoC 2017: You are a mentor for Omkar Reddy Gojala

2017-05-08 Thread lewis john mcgibbney
Great Omkar... email below is confirmation of your acceptance into GSoC again this year. Looking forward to this project. Best Lewis -- Forwarded message -- From: Google Summer of Code Date: Thu, May 4, 2017 at 9:06 AM Subject: GSoC 2017: You are

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2017-04-21 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979078#comment-15979078 ] Lewis John McGibbney commented on NUTCH-1465: - I'm going to take this on. We want full sitemap

[jira] [Assigned] (NUTCH-1465) Support sitemaps in Nutch

2017-04-21 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-1465: --- Assignee: Lewis John McGibbney (was: Tejas Patil) > Support sitem

[jira] [Updated] (NUTCH-2370) Saving mapping of dumped file to URL

2017-04-20 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2370: Component/s: dumpers > Saving mapping of dumped file to

[jira] [Updated] (NUTCH-2370) Saving mapping of dumped file to URL

2017-04-20 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2370: Affects Version/s: 1.14 > Saving mapping of dumped file to

[jira] [Updated] (NUTCH-2370) Saving mapping of dumped file to URL

2017-04-20 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2370: Fix Version/s: 1.14 > Saving mapping of dumped file to

[jira] [Updated] (NUTCH-2374) Upgrade Nutch 2.X to Gora 0.7

2017-04-16 Thread lewis john mcgibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lewis john mcgibbney updated NUTCH-2374: Hi, do you have a patch, and can you supply a pull request? I can try to fix the test

[jira] [Commented] (NUTCH-2373) Indexer for Hbase

2017-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970464#comment-15970464 ] Lewis John McGibbney commented on NUTCH-2373: - As I said before... there is already

[jira] [Assigned] (NUTCH-2373) Indexer for Hbase

2017-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2373: --- Assignee: Kaidul Islam > Indexer for Hb

[jira] [Commented] (NUTCH-2373) Indexer for Hbase

2017-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970450#comment-15970450 ] Lewis John McGibbney commented on NUTCH-2373: - OK sorry I see where you're coming from now. Yes

[jira] [Created] (NUTCH-2374) Upgrade Nutch 2.X to Gora 0.7

2017-04-16 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2374: --- Summary: Upgrade Nutch 2.X to Gora 0.7 Key: NUTCH-2374 URL: https://issues.apache.org/jira/browse/NUTCH-2374 Project: Nutch Issue Type: Bug

[jira] [Commented] (NUTCH-2373) Indexer for Hbase

2017-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970441#comment-15970441 ] Lewis John McGibbney commented on NUTCH-2373: - Hi [~kaidul] Nutch 2.X already permits

[jira] [Resolved] (NUTCH-2333) Indexer for RabbitMQ

2017-04-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2333. - Resolution: Fixed > Indexer for Rabbi

[jira] [Updated] (NUTCH-2372) Javadocs build failing.

2017-04-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2372: Fix Version/s: (was: 2.4) > Javadocs build fail

[jira] [Resolved] (NUTCH-2372) Javadocs build failing.

2017-04-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2372. - Resolution: Fixed > Javadocs build fail

[jira] [Commented] (NUTCH-2292) Mavenize the build for nutch-core and nutch-plugins

2017-04-07 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961009#comment-15961009 ] Lewis John McGibbney commented on NUTCH-2292: - [~bvachon] bq. Can this work be done in the 2

[jira] [Assigned] (NUTCH-2296) Elasticsearch Indexing Over Rest

2017-04-05 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2296: --- Assignee: Brian Zhao > Elasticsearch Indexing Over R

[jira] [Resolved] (NUTCH-2296) Elasticsearch Indexing Over Rest

2017-04-05 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2296. - Resolution: Fixed Thank you [~bmzhao] > Elasticsearch Indexing Over R

[jira] [Updated] (NUTCH-2296) Elasticsearch Indexing Over Rest

2017-04-05 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2296: Fix Version/s: 1.14 > Elasticsearch Indexing Over R

[ANNOUNCE] Apache Nutch 1.13 Release

2017-04-02 Thread lewis john mcgibbney
Hello Folks, The Apache Nutch [0] Project Management Committee are pleased to announce the immediate release of Apache Nutch v1.13. We advise all current users and developers of the 1.X series to upgrade to this release. Nutch is a well-matured, production-ready Web crawler. Nutch 1.x enables

[RESULT] WAS Re: [VOTE] Release Apache Nutch 1.13 RC#1

2017-04-02 Thread lewis john mcgibbney
Hi Folks, Thank you to everyone who was able to review the RC and VOTE, greatly appreciated. 72 hours have come and gone, please see below for RESULTs. [9] +1 Release this package as Apache Nutch 1.13. Lewis John McGibbney * Julien Nioche * Kevin Ratnasekera Chris A. Mattmann * Furkan KAMACI * Matei

[VOTE] Release Apache Nutch 1.13 RC#1

2017-03-28 Thread lewis john mcgibbney
Hi Folks, A first candidate for the Nutch 1.13 release is available at: https://dist.apache.org/repos/dist/dev/nutch/1.13/ The release candidate is a zip and tar.gz archive of the binary and sources in: https://github.com/apache/nutch/tree/release-1.13 The SHA1 checksum of the archive is

[jira] [Commented] (NUTCH-2369) Create a new GraphGenerator Tool for writing Nutch Records as a Full Web Graph

2017-03-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929249#comment-15929249 ] Lewis John McGibbney commented on NUTCH-2369: - [~omkar20895] would you rather work on branch 2

[jira] [Commented] (NUTCH-2366) Deprecated Job constructor in hostdb/ReadHostDb.java

2017-03-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929248#comment-15929248 ] Lewis John McGibbney commented on NUTCH-2366: - Is this your first patch to Nutch [~omkar20895

[jira] [Created] (NUTCH-2369) Create a new GraphGenerator Tool for writing Nutch Records as Full Web Graphs

2017-03-15 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2369: --- Summary: Create a new GraphGenerator Tool for writing Nutch Records as Full Web Graphs Key: NUTCH-2369 URL: https://issues.apache.org/jira/browse/NUTCH-2369

[jira] [Updated] (NUTCH-2369) Create a new GraphGenerator Tool for writing Nutch Records as a Full Web Graph

2017-03-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2369: Summary: Create a new GraphGenerator Tool for writing Nutch Records as a Full Web

[DISCUSS] Release Nutch 1.X and 2.X

2017-03-10 Thread lewis john mcgibbney
Hi dev@, A lot of dev work has gone into both branches... I am keen to have a fresh release for us. I am suggesting, for open discussion, a joint release of 1.X and 2.X. 2.X continues to see interest so I think that this is a valid promotion for the Nutch project in general. What do you all think?

[jira] [Commented] (NUTCH-2292) Mavenize the build for nutch-core and nutch-plugins

2017-03-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902553#comment-15902553 ] Lewis John McGibbney commented on NUTCH-2292: - Another benefit of pf4j

[jira] [Comment Edited] (NUTCH-2292) Mavenize the build for nutch-core and nutch-plugins

2017-03-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902541#comment-15902541 ] Lewis John McGibbney edited comment on NUTCH-2292 at 3/9/17 6:03 AM

[jira] [Commented] (NUTCH-2292) Mavenize the build for nutch-core and nutch-plugins

2017-03-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902541#comment-15902541 ] Lewis John McGibbney commented on NUTCH-2292: - OK, thanks for quick feedback, I'll scope

[jira] [Commented] (NUTCH-2292) Mavenize the build for nutch-core and nutch-plugins

2017-03-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902527#comment-15902527 ] Lewis John McGibbney commented on NUTCH-2292: - [~thammegowda] I just pushed to the remote

[jira] [Commented] (NUTCH-2292) Mavenize the build for nutch-core and nutch-plugins

2017-03-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902095#comment-15902095 ] Lewis John McGibbney commented on NUTCH-2292: - [~thammegowda] the outlink extractor issue

[jira] [Commented] (NUTCH-2292) Mavenize the build for nutch-core and nutch-plugins

2017-03-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901903#comment-15901903 ] Lewis John McGibbney commented on NUTCH-2292: - [~thammegowda] do you see the NPE

[jira] [Comment Edited] (NUTCH-2292) Mavenize the build for nutch-core and nutch-plugins

2017-03-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15900885#comment-15900885 ] Lewis John McGibbney edited comment on NUTCH-2292 at 3/8/17 8:14 AM

[jira] [Updated] (NUTCH-2292) Mavenize the build for nutch-core and nutch-plugins

2017-03-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2292: Fix Version/s: 1.13 > Mavenize the build for nutch-core and nutch-plug

[jira] [Assigned] (NUTCH-2292) Mavenize the build for nutch-core and nutch-plugins

2017-03-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2292: --- Assignee: Thamme Gowda > Mavenize the build for nutch-core and nutch-plug

[jira] [Commented] (NUTCH-2292) Mavenize the build for nutch-core and nutch-plugins

2017-03-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15900885#comment-15900885 ] Lewis John McGibbney commented on NUTCH-2292: - [~thammegowda] I've fixed all tests in nutch

[jira] [Commented] (NUTCH-2292) Mavenize the build for nutch-core and nutch-plugins

2017-03-07 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15900814#comment-15900814 ] Lewis John McGibbney commented on NUTCH-2292: - Hi [~markus17] and [~thammegowda] I've just

[jira] [Created] (NUTCH-2362) Upgrade MaxMind GeoIP version in index-geoip

2017-02-26 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2362: --- Summary: Upgrade MaxMind GeoIP version in index-geoip Key: NUTCH-2362 URL: https://issues.apache.org/jira/browse/NUTCH-2362 Project: Nutch

Fwd: Google Summer of Code 2017 is coming

2017-02-03 Thread lewis john mcgibbney
Hi Folks, Please see above. If anyone is interested in participating in or mentoring a GSoC project then please respond to this thread. Usually, from there you can open a Jira ticket in whichever project you are interested in and we take it from there. Have a great weekend. Lewis --
