[jira] [Commented] (NUTCH-1852) Runtime error on Hadoop 2.4.0 caused by hadoop-core

2014-09-24 Thread Talat UYARER (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147479#comment-14147479 ] Talat UYARER commented on NUTCH-1852: - Hi [~dobromyslov], Now we do not support Hadoo

[jira] [Closed] (NUTCH-1852) Runtime error on Hadoop 2.4.0 caused by hadoop-core

2014-09-24 Thread Talat UYARER (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Talat UYARER closed NUTCH-1852. --- Resolution: Invalid > Runtime error on Hadoop 2.4.0 caused by hadoop-core > --

[jira] [Created] (NUTCH-1855) Upgrade Hadoop dependencies to Hadoop 2

2014-09-24 Thread Talat UYARER (JIRA)
Talat UYARER created NUTCH-1855: --- Summary: Upgrade Hadoop dependencies to Hadoop 2 Key: NUTCH-1855 URL: https://issues.apache.org/jira/browse/NUTCH-1855 Project: Nutch Issue Type: Improvement

[jira] [Commented] (NUTCH-1844) testresources/testcrawl not referenced anywhere in code

2014-09-24 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147462#comment-14147462 ] Hudson commented on NUTCH-1844: --- FAILURE: Integrated in Nutch-trunk #2795 (See [https://bui

Build failed in Jenkins: Nutch-trunk #2795

2014-09-24 Thread Apache Jenkins Server
See Changes: [mattmann] Updated CHANGES.txt for NUTCH-1844. [mattmann] fix for NUTCH-1844. -- [...truncated 4947 lines...] clean-lib: resolve-default: [ivy:resolve] :: loading settings :: file = <

[jira] [Resolved] (NUTCH-1844) testresources/testcrawl not referenced anywhere in code

2014-09-24 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-1844. -- Resolution: Fixed - fixed in r1627455 > testresources/testcrawl not referenced anywhere

[jira] [Commented] (NUTCH-1660) Index filter for Page's latitude and longitude

2014-09-24 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147281#comment-14147281 ] Chris A. Mattmann commented on NUTCH-1660: -- Guys, see http://github.com/chrismatt

[jira] [Assigned] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher

2014-09-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-1854: --- Assignee: Lewis John McGibbney > ./bin/crawl fails with a parsing fetcher > -

[jira] [Created] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher

2014-09-24 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1854: --- Summary: ./bin/crawl fails with a parsing fetcher Key: NUTCH-1854 URL: https://issues.apache.org/jira/browse/NUTCH-1854 Project: Nutch Issue Ty

[Nutch Wiki] Update of "ContributorsGroup" by LewisJohnMcgibbney

2014-09-24 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "ContributorsGroup" page has been changed by LewisJohnMcgibbney: https://wiki.apache.org/nutch/ContributorsGroup?action=diff&rev1=14&rev2=15 * riverma * JorgeLuis * ArthurCi

[jira] [Updated] (NUTCH-978) A Plugin for extracting certain element of a web page on html page parsing.

2014-09-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-978: --- Fix Version/s: 1.10 > A Plugin for extracting certain element of a web page on html pag

[jira] [Updated] (NUTCH-1843) Upgrade to Gora 0.5

2014-09-24 Thread Talat UYARER (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Talat UYARER updated NUTCH-1843: Attachment: NUTCH-1843.patch Hi [~lewismc], I tested with HBase. There does not seem to be any problem. I h

Re: Making life easier with Oozie?

2014-09-24 Thread Mattmann, Chris A (3980)
Hi Edoardo, Any contribution would be welcome and I am happy to figure out how to shepherd things in. Thank you for considering contributing to Nutch and I for one would appreciate anything you are interested in giving. I would keep both the crawl script and your Oozie based version - so it would

Re: Making life easier with Oozie?

2014-09-24 Thread Edoardo Causarano
Hi Sebastian, take org.apache.nutch.crawl.Generator for example. The way the class is written, the main method will pass the command line options to the actual run method and if crawldb and segments paths are missing it will bail out. Other command line parameters are parsed in the run method to p

Re: Making life easier with Oozie?

2014-09-24 Thread Sebastian Nagel
Hi Edoardo, > To make things easy I've used the JavaMain action to execute the classes > that the nutch scripts invokes, parametrized as necessary. Ok. That means that each step (inject, generate, fetch, etc.) runs in its own JVM. Right? > One thing that I noticed is that I found configuring the

RE: DOCUMENTATION - Nutch and Hidden Services

2014-09-24 Thread Markus Jelsma
Hi - this is really awesome! Is there also a way to use different exit nodes for different fetchers or queues, or can you instruct it to regularly change exit nodes? Markus -Original message- From: Lewis John Mcgibbney Sent: Wednesday 24th September 2014 4:57 To: u...@nutch.apache.org; dev@

Making life easier with Oozie?

2014-09-24 Thread Edoardo Causarano
Hi all, I've been busy lately with a Nutch 1.x setup and I've managed to replicate the crawl script as an Oozie workflow (with HUE for a pretty web UI). To make things easy I've used the JavaMain action to execute the classes that the Nutch scripts invoke, parametrized as necessary. One thing tha