[jira] [Commented] (NUTCH-2111) Set temporary file location for selenium tmp files

2015-09-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904070#comment-14904070 ] Lewis John McGibbney commented on NUTCH-2111: - Hi [~kwhitehall] bq. The patch

Tutorial : Index the web with AWS CloudSearch

2015-09-23 Thread Julien Nioche
Hi everyone, Just to let you know that we've just published a new tutorial on how to use Nutch (and StormCrawler) to crawl and index documents into AWS CloudSearch. This is related to the recent addition of NUTCH-1517 in the trunk codebase. The t

Re: Tutorial : Index the web with AWS CloudSearch

2015-09-23 Thread Sebastian Nagel
Great! Reads well, straightforward, and I didn't find any missing detail! Thanks, Julien! 2015-09-23 11:26 GMT+02:00 Julien Nioche : > Hi everyone, > > Just to let you know that we've just published a new tutorial on how to use > Nutch (and StormCrawler) to crawl and index documents into AWS Cl

Webcast : Apache Nutch on EMR

2015-09-23 Thread Julien Nioche
Hi again, I have uploaded a webcast explaining how to run Nutch on AWS Elastic Map Reduce https://www.youtube.com/watch?v=v9zjcTjjjyU Please excuse the sound quality, hesitations and stuttering. I hope you find it useful nonetheless. Julien -- *Open Source Solutions for Text Engineering* h

[Nutch Wiki] Update of "CommonCrawlDataDumper" by JorgeLuis

2015-09-23 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "CommonCrawlDataDumper" page has been changed by JorgeLuis: https://wiki.apache.org/nutch/CommonCrawlDataDumper?action=diff&rev1=3&rev2=4 Comment: Adding information about NUTCH-2102

RE: Webcast : Apache Nutch on EMR

2015-09-23 Thread Markus Jelsma
Very cool! This is probably going to be useful. -Original message- From: Julien Nioche Sent: Wednesday 23rd September 2015 16:35 To: u...@nutch.apache.org; dev@nutch.apache.org Subject: Webcast : Apache Nutch on EMR Hi again, I have uploaded a webcast explaining how to run Nutch on A

Re: Webcast : Apache Nutch on EMR

2015-09-23 Thread Mattmann, Chris A (3980)
Thanks Julien, great work ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-527 Email: chris.a.matt

[GitHub] nutch pull request: fix for NUTCH-2111 contributed by kwhitehall

2015-09-23 Thread kwhitehall
GitHub user kwhitehall opened a pull request: https://github.com/apache/nutch/pull/64 fix for NUTCH-2111 contributed by kwhitehall Further investigation showed that changing the temporary path does not get rid of the tmp files that eat up space. Further, if a selenium grid is utili

[jira] [Commented] (NUTCH-2111) Set temporary file location for selenium tmp files

2015-09-23 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904727#comment-14904727 ] ASF GitHub Bot commented on NUTCH-2111: --- GitHub user kwhitehall opened a pull reques

[jira] [Assigned] (NUTCH-2111) Delete temporary files location for selenium tmp files after driver quits

2015-09-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2111: --- Assignee: Lewis John McGibbney > Delete temporary files location for selenium

[jira] [Updated] (NUTCH-2111) Delete temporary files location for selenium tmp files after driver quits

2015-09-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2111: Summary: Delete temporary files location for selenium tmp files after driver quits

[jira] [Updated] (NUTCH-2111) Delete temporary files location for selenium tmp files after driver quits

2015-09-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2111: Assignee: Kim Whitehall (was: Lewis John McGibbney) > Delete temporary files locati

[jira] [Resolved] (NUTCH-2111) Delete temporary files location for selenium tmp files after driver quits

2015-09-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2111. - Resolution: Fixed Committed @revision 1704896 in trunk > Delete temporary files l

[jira] [Commented] (NUTCH-2111) Delete temporary files location for selenium tmp files after driver quits

2015-09-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904912#comment-14904912 ] Hudson commented on NUTCH-2111: --- SUCCESS: Integrated in Nutch-trunk #3280 (See [https://bui

[GitHub] nutch pull request: fix for NUTCH-2111 contributed by kwhitehall

2015-09-23 Thread kwhitehall
Github user kwhitehall closed the pull request at: https://github.com/apache/nutch/pull/64 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is en

[jira] [Commented] (NUTCH-2111) Delete temporary files location for selenium tmp files after driver quits

2015-09-23 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905032#comment-14905032 ] ASF GitHub Bot commented on NUTCH-2111: --- Github user kwhitehall closed the pull requ

[jira] [Created] (NUTCH-2115) Add total counts to dump stats

2015-09-23 Thread Michael Joyce (JIRA)
Michael Joyce created NUTCH-2115: Summary: Add total counts to dump stats Key: NUTCH-2115 URL: https://issues.apache.org/jira/browse/NUTCH-2115 Project: Nutch Issue Type: Improvement

[GitHub] nutch pull request: NUTCH-2115 - Add total counts to mimetype stat...

2015-09-23 Thread MJJoyce
GitHub user MJJoyce opened a pull request: https://github.com/apache/nutch/pull/65 NUTCH-2115 - Add total counts to mimetype stats You can merge this pull request into a Git repository by running: $ git pull https://github.com/MJJoyce/nutch NUTCH-2115 Alternatively you can re

[jira] [Commented] (NUTCH-2115) Add total counts to dump stats

2015-09-23 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905107#comment-14905107 ] ASF GitHub Bot commented on NUTCH-2115: --- GitHub user MJJoyce opened a pull request:

[jira] [Created] (NUTCH-2116) NutchServer and NutchApp should contain shutdown hooks

2015-09-23 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2116: --- Summary: NutchServer and NutchApp should contain shutdown hooks Key: NUTCH-2116 URL: https://issues.apache.org/jira/browse/NUTCH-2116 Project: Nutch

[jira] [Commented] (NUTCH-2115) Add total counts to dump stats

2015-09-23 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905134#comment-14905134 ] ASF GitHub Bot commented on NUTCH-2115: --- Github user asfgit closed the pull request

[jira] [Resolved] (NUTCH-2115) Add total counts to dump stats

2015-09-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2115. - Resolution: Fixed Assignee: Michael Joyce Nice patch Mike Committed revision

Nutch datasets : How to ??

2015-09-23 Thread Charan Shampur
Hello team, I am new to working with Nutch. My task is to extract the different image MIME subtypes and image URLs by crawling a list of URLs. My approach is as follows: a) for image URLs: after crawling with Nutch, use the nutchpy sequence reader to read from the segm

[GitHub] nutch pull request: NUTCH-2115 - Add total counts to mimetype stat...

2015-09-23 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/nutch/pull/65

[jira] [Commented] (NUTCH-2115) Add total counts to dump stats

2015-09-23 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905156#comment-14905156 ] Michael Joyce commented on NUTCH-2115: -- Cheers [~lewismc], thanks for the quick merge

[jira] [Commented] (NUTCH-2115) Add total counts to dump stats

2015-09-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905236#comment-14905236 ] Hudson commented on NUTCH-2115: --- SUCCESS: Integrated in Nutch-trunk #3281 (See [https://bui

[jira] [Created] (NUTCH-2117) NutchServer CLI Option for CMD_PORT is incorrect and should be CMD_HOST

2015-09-23 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2117: --- Summary: NutchServer CLI Option for CMD_PORT is incorrect and should be CMD_HOST Key: NUTCH-2117 URL: https://issues.apache.org/jira/browse/NUTCH-2117 P

[GitHub] nutch pull request: Update NutchServer.java

2015-09-23 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/nutch/pull/63

[jira] [Resolved] (NUTCH-2117) NutchServer CLI Option for CMD_PORT is incorrect and should be CMD_HOST

2015-09-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2117. - Resolution: Fixed Committed @revision 1704972 in trunk. > NutchServer CLI Option

[jira] [Commented] (NUTCH-2117) NutchServer CLI Option for CMD_PORT is incorrect and should be CMD_HOST

2015-09-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905599#comment-14905599 ] Hudson commented on NUTCH-2117: --- FAILURE: Integrated in Nutch-trunk #3282 (See [https://bui

Build failed in Jenkins: Nutch-trunk #3282

2015-09-23 Thread Apache Jenkins Server
See Changes: [lewismc] NUTCH-2117 NutchServer CLI Option for CMD_PORT is incorrect and should be CMD_HOST this closes #63 -- [...truncated 15038 lines...] clean-lib: resolve-default: [ivy:resolve]

Re: [VOTE] Release Apache Nutch 2.3.1

2015-09-23 Thread Lewis John Mcgibbney
Hi Folks, It turns out the formatting for the original email below was terrible. Sorry about that. I've hopefully corrected formatting now. Please VOTE away! On Tue, Sep 22, 2015 at 6:45 PM, Lewis John Mcgibbney < lewis.mcgibb...@gmail.com> wrote: > Hi user@ & dev@, > > This thread is a VOTE for

Jenkins build is back to normal : Nutch-trunk #3283

2015-09-23 Thread Apache Jenkins Server
See