[jira] [Commented] (NUTCH-2235) Classpath discrepancy with protocol-selenium in deploy mode

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168253#comment-15168253 ] Lewis John McGibbney commented on NUTCH-2235: - The source of this issue is ordering of Nutch

[jira] [Comment Edited] (NUTCH-2235) Classpath discrepancy with protocol-selenium in deploy mode

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168253#comment-15168253 ] Lewis John McGibbney edited comment on NUTCH-2235 at 2/26/16 1:38 AM: --

[jira] [Commented] (NUTCH-2235) Classpath discrepancy with protocol-selenium in deploy mode

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168156#comment-15168156 ] Lewis John McGibbney commented on NUTCH-2235: - Looks like the issue is with httpcore instead

[jira] [Commented] (NUTCH-2235) Classpath discrepancy with protocol-selenium in deploy mode

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168097#comment-15168097 ] Lewis John McGibbney commented on NUTCH-2235: - {code} jar tf apache-nutch-1.12-SNAPSHOT.job |

[jira] [Commented] (NUTCH-2235) Classpath discrepancy with protocol-selenium in deploy mode

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168095#comment-15168095 ] Lewis John McGibbney commented on NUTCH-2235: - This issue is commonly associated with

[jira] [Created] (NUTCH-2235) Classpath discrepancy with protocol-selenium in deploy mode

2016-02-25 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2235: --- Summary: Classpath discrepancy with protocol-selenium in deploy mode Key: NUTCH-2235 URL: https://issues.apache.org/jira/browse/NUTCH-2235 Project:

[jira] [Commented] (NUTCH-1712) Use MultipleInputs in Injector to make it a single mapreduce job

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167963#comment-15167963 ] Lewis John McGibbney commented on NUTCH-1712: - Is the Nutch codebase now acting off of Git? If

[jira] [Resolved] (NUTCH-1712) Use MultipleInputs in Injector to make it a single mapreduce job

2016-02-25 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1712. Resolution: Fixed Fix Version/s: 1.12 Committed to trunk (f5e430e). > Use

[jira] [Commented] (NUTCH-1712) Use MultipleInputs in Injector to make it a single mapreduce job

2016-02-25 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167935#comment-15167935 ] ASF GitHub Bot commented on NUTCH-1712: --- Github user asfgit closed the pull request at:

[GitHub] nutch pull request: NUTCH-1712 Injector to use MultipleInputs (new...

2016-02-25 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/nutch/pull/86 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[jira] [Commented] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2016-02-25 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167626#comment-15167626 ] ASF GitHub Bot commented on NUTCH-2144: --- GitHub user thammegowda reopened a pull request:

[jira] [Commented] (NUTCH-1712) Use MultipleInputs in Injector to make it a single mapreduce job

2016-02-25 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167628#comment-15167628 ] ASF GitHub Bot commented on NUTCH-1712: --- GitHub user sebastian-nagel reopened a pull request:

[jira] [Commented] (NUTCH-2213) CommonCrawlDataDumper saves gzipped body in extracted form

2016-02-25 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167627#comment-15167627 ] ASF GitHub Bot commented on NUTCH-2213: --- GitHub user jnioche reopened a pull request:

[GitHub] nutch pull request: NUTCH-1712 Injector to use MultipleInputs (new...

2016-02-25 Thread sebastian-nagel
GitHub user sebastian-nagel reopened a pull request: https://github.com/apache/nutch/pull/86 NUTCH-1712 Injector to use MultipleInputs (new MR API) Tested inject in combination with other CrawlDb tools (readdb, updatedb, mergedb): everything seems to work smoothly, although output

[GitHub] nutch pull request: NUTCH-2213 : do not store the headers verbatim...

2016-02-25 Thread jnioche
GitHub user jnioche reopened a pull request: https://github.com/apache/nutch/pull/88 NUTCH-2213 : do not store the headers verbatim if the response was compressed See discussion on [https://issues.apache.org/jira/browse/NUTCH-2213]. You can merge this pull request into a Git

[GitHub] nutch pull request: NUTCH-2144 : override db.ignore.external to ex...

2016-02-25 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/nutch/pull/89

[jira] [Commented] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2016-02-25 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167600#comment-15167600 ] ASF GitHub Bot commented on NUTCH-2144: --- Github user asfgit closed the pull request at:

[jira] [Commented] (NUTCH-2213) CommonCrawlDataDumper saves gzipped body in extracted form

2016-02-25 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167602#comment-15167602 ] ASF GitHub Bot commented on NUTCH-2213: --- Github user asfgit closed the pull request at:

[GitHub] nutch pull request: NUTCH-1712 Injector to use MultipleInputs (new...

2016-02-25 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/nutch/pull/86

[jira] [Commented] (NUTCH-2231) Jexl support in generator job

2016-02-25 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167547#comment-15167547 ] Hudson commented on NUTCH-2231: --- SUCCESS: Integrated in Nutch-trunk #3356 (See

Jenkins build is back to normal : Nutch-trunk #3356

2016-02-25 Thread Apache Jenkins Server
See

[jira] [Commented] (NUTCH-2231) Jexl support in generator job

2016-02-25 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167409#comment-15167409 ] Markus Jelsma commented on NUTCH-2231: -- Proper null check. Committed to trunk revision 1732332. >

[jira] [Reopened] (NUTCH-2231) Jexl support in generator job

2016-02-25 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reopened NUTCH-2231: -- If no expression is set, an error is logged, which it shouldn't be. > Jexl support in generator job >

[jira] [Comment Edited] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167309#comment-15167309 ] Lewis John McGibbney edited comment on NUTCH-2222 at 2/25/16 3:21 PM: --

[jira] [Comment Edited] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167309#comment-15167309 ] Lewis John McGibbney edited comment on NUTCH-2222 at 2/25/16 3:20 PM: --

[jira] [Commented] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-02-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167309#comment-15167309 ] Lewis John McGibbney commented on NUTCH-2222: - We need to step through crawl steps and find