Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

2011-08-10 Thread Julien Nioche
Hi Tom, I have been using Nutch 1.x for the last 9 months or so and it works well for large scale crawls up to around a billion pages. However, the inherent lack of random access in HDFS really starts to become a burden on our hadoop cluster when going through the whole

Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

2011-08-10 Thread Markus Jelsma
build for the stable branches). The real issue behind all this is what we should do with Nutch 2.0. What follows is only my opinion and I would love to hear what others have to say on this subject. Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage to Gora, the latter

Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

2011-08-10 Thread lewis john mcgibbney
to adoption for devs. This being said, Gora is a fundamental component for Nutch 2.0 and once you get to grips with the config and the flexibility which it offers you are then presented with an excellent setup for Nutch 2.0. I understand people's concerns and why they would wish to hardwire

Nutch 2.0 DOAP

2011-08-10 Thread lewis john mcgibbney
Hi, Just for information purposes, I committed our DOAP which can now be found under trunk svn. I have been informed by site-dev@ that the system they use does not support more than one doap file, however I thought it best to keep it in svn for the time being. If at some point in the future Nutch

Re: Nutch 2.0 DOAP

2011-08-10 Thread Julien Nioche
file, however I thought it best to keep it in svn for the time being. If at some point in the future Nutch 2.0 becomes the de facto Nutch release then no-one will need to recreate one. Thanks -- *Lewis* -- * *Open Source Solutions for Text Engineering http://digitalpebble.blogspot.com

Re: Nutch 2.0 Documentation

2011-08-09 Thread Markus Jelsma
the project. I would really like to push to get this going as per [1] as I have been trying to get various documentation updated over the last while. This would be a reasonable milestone which would pave the way for a fully documented Nutch 2.0 (and branch 1.4) ;0) Would it be possible for me

Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

2011-08-09 Thread Julien Nioche
and it doesn't bother anybody that it fails all the time (and that there isn't a nightly build for the stable branches). The real issue behind all this is what we should do with Nutch 2.0. What follows is only my opinion and I would love to hear what others have to say on this subject. Since we (actually

Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

2011-08-09 Thread Kirby Bohling
to it but it does not seem to be used much and there is virtually nothing happening on it in terms of development. More worryingly, the people who initially contributed to it are not very active on the project (such is life, new jobs, different projects, etc...) anymore. As for Nutch 2.0, it hasn't made any

[jira] [Commented] (NUTCH-849) different versions of the same library in nutch-2.0-dev.job and local\lib directory

2011-08-09 Thread Lewis John McGibbney (JIRA)
from svn and after compiling checked all jar files in runtime/deploy/nutch-2.0-dev.job and /runtime/local/lib. All jar files in both libraries are identical and versions are consistent therefore I propose we close this issue as fixed. Perhaps someone committed a change and didn't realise
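The check described above (comparing the jars bundled in runtime/deploy/nutch-2.0-dev.job with those in runtime/local/lib) can be scripted. Below is a minimal Python sketch, not part of Nutch, assuming that the .job file is a zip archive carrying its dependency jars under lib/ and that jar file names follow the usual artifact-version.jar pattern; all helper names are hypothetical.

```python
import re
import zipfile
from pathlib import Path

# Assumption: the version starts at the first "-<digit>" in the file name,
# e.g. "hadoop-core-0.20.2.jar" -> ("hadoop-core", "0.20.2").
JAR_RE = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def split_jar_name(filename):
    """Split a jar file name into (artifact, version); version is None if absent."""
    base = filename.rsplit("/", 1)[-1]
    m = JAR_RE.match(base)
    if m:
        return m.group("name"), m.group("version")
    return (base[:-4] if base.endswith(".jar") else base), None


def index_versions(filenames):
    """Map each artifact name to the set of versions seen in a list of jar names."""
    index = {}
    for f in filenames:
        if f.endswith(".jar"):
            name, version = split_jar_name(f)
            index.setdefault(name, set()).add(version)
    return index


def find_version_mismatches(job_jars, local_jars):
    """Return {artifact: (job_versions, local_versions)} for artifacts whose versions differ."""
    job, local = index_versions(job_jars), index_versions(local_jars)
    mismatches = {}
    for name in job.keys() & local.keys():
        if job[name] != local[name]:
            mismatches[name] = (sorted(job[name], key=str), sorted(local[name], key=str))
    return mismatches


def jars_in_job(job_path):
    """List the jars under lib/ inside a .job archive (assumed to be a zip file)."""
    with zipfile.ZipFile(job_path) as z:
        return [n for n in z.namelist() if n.startswith("lib/") and n.endswith(".jar")]


def jars_in_dir(lib_dir):
    """List the jar file names directly inside a directory."""
    return [p.name for p in Path(lib_dir).glob("*.jar")]
```

For example, `find_version_mismatches(jars_in_job("runtime/deploy/nutch-2.0-dev.job"), jars_in_dir("runtime/local/lib"))` should return an empty dict when both sides agree. Reading versions from file names is only a heuristic; Ivy's own report is the authoritative source.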

[jira] [Commented] (NUTCH-849) different versions of the same library in nutch-2.0-dev.job and local\lib directory

2011-08-09 Thread Markus Jelsma (JIRA)
. ant report only throws a lot of {code} [ivy:resolve] unknown resolver maven2 {code} messages. different versions of the same library in nutch-2.0-dev.job and local\lib directory Key: NUTCH

RE: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

2011-08-09 Thread Tom Davidson
...@gmail.com] Sent: Tuesday, August 09, 2011 8:31 AM To: dev@nutch.apache.org Cc: gora-...@incubator.apache.org Subject: Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk] Julien, On Tue, Aug 9, 2011 at 10:10 AM, Julien Nioche lists.digitalpeb

Nutch 2.0 Documentation

2011-08-04 Thread lewis john mcgibbney
the last while. This would be a reasonable milestone which would pave the way for a fully documented Nutch 2.0 (and branch 1.4) ;0) Would it be possible for me to invoke a small conversation on this topic to gather thoughts as it seems this issue has been forgotten about again. Thank you [1] https

Re: Nutch 2.0 roadmap

2011-07-04 Thread Julien Nioche
Hi Lewis, Currently the slightly (in places) dated roadmap can be found here [1], I was wondering if we could give this an overhaul/update as it would give a more robust overview of where trunk is going. Most of the points you make are still in development, however some have been achieved and

Nutch 2.0 roadmap

2011-07-01 Thread lewis john mcgibbney
to release this year moving forward it is essential that this is seen to. N.B. I moved the old Nutch 2.0 road map to the legacy and archive section of the wiki in an attempt to disambiguate data and future intentions. Thanks [1] http://wiki.apache.org/nutch/Nutch2Roadmap -- *Lewis*

Building Nutch 2.0 from the trunk

2011-06-22 Thread Nutch User - 1
Could someone give me step-by-step instructions on how to build Nutch 2.0 from the trunk and run it? I tried to follow this (http://techvineyard.blogspot.com/2010/12/build-nutch-20.html), but failed to do so as described here (http://lucene.472066.n3.nabble.com/TestFetcher-hangs-td3091057.html).

Re: Does Nutch 2.0 in good enough shape to test?

2010-12-18 Thread Alexis
I've spent some time working on this as well. I've just put together a blog entry addressing the issues I ran into. See http://techvineyard.blogspot.com/2010/12/build-nutch-20.html This is a great howto for Nutch 2.0. Feel free to link to it from the Wiki, this could be useful to others

Re: Does Nutch 2.0 in good enough shape to test?

2010-12-17 Thread Andrzej Bialecki
(switching to devs) On 12/17/10 10:18 AM, Alexis wrote: Hi, I've spent some time working on this as well. I've just put together a blog entry addressing the issues I ran into. See http://techvineyard.blogspot.com/2010/12/build-nutch-20.html In a nutchsell, I changed three pieces in Gora and

RE: Does Nutch 2.0 in good enough shape to test?

2010-12-17 Thread brad
: 1.0 TOTAL urls: 2894 status 0 (null):2894 avg score: 1.0 -----Original Message----- From: Andrzej Bialecki [mailto:a...@getopt.org] Sent: Thursday, December 16, 2010 11:36 PM To: u...@nutch.apache.org Subject: Re: Does Nutch 2.0 in good enough shape to test? On 12/17/10

Re: Nutch 2.0 Help

2010-09-08 Thread Julien Nioche
Hi guys, I've summarized the steps to follow for having GORA+Hbase with Nutch 2.0 on http://wiki.apache.org/nutch/GORA_HBase Feel free to amend and improve as you see fit. Please bear in mind that Nutch 2.0 is at a very early stage and is far from being bug-proof, see in particular [1]. HTH

Re: Nutch 2.0 Help

2010-09-08 Thread Enis Soztutar
an issue to track this down. Cheers, Enis On Wed, Sep 8, 2010 at 1:53 PM, Julien Nioche lists.digitalpeb...@gmail.com wrote: Hi guys, I've summarized the steps to follow for having GORA+Hbase with Nutch 2.0 on http://wiki.apache.org/nutch/GORA_HBase Feel free to amend and improve as you see

nutch 2.0 (trunk)

2010-09-07 Thread Faruk Berksöz
or nothing? environments : ubuntu 10.04 JVM : 1.6.0_20 nutch 2.0 (trunk) Mysql/HBase (0.20.6) / Hadoop(0.20.2) pseudo-distributed Best regards, Faruk Berksöz

Re: nutch 2.0 (trunk)

2010-09-07 Thread Andrzej Bialecki
. Should I file this in nutch-jira or github/gora or nothing? environments : ubuntu 10.04 JVM : 1.6.0_20 nutch 2.0 (trunk) Mysql/HBase (0.20.6) / Hadoop(0.20.2) pseudo-distributed Yes, please create a JIRA issue. Thanks! -- Best regards, Andrzej Bialecki

Re: nutch 2.0 (trunk)

2010-09-07 Thread Julien Nioche
:408) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216) The type of the column 'content' is BLOB. It may be important for the next developments of Gora. Should I file this in nutch-jira or github/gora or nothing? environments : ubuntu 10.04 JVM : 1.6.0_20 nutch

Nutch 2.0 Help

2010-09-02 Thread David Stuart
Hey All, I have set up the latest version of Nutch from trunk and am running into a few issues with hbase and injecting urls. when I run the command runtime/local/bin/nutch inject runtime/local/seed/ I get InjectorJob: java.lang.RuntimeException: Could not create datastore at

Re: Nutch 2.0 Help

2010-09-02 Thread Julien Nioche
Hi David, I haven't used the Hbase backend with GORA for quite some time but from what I can remember you'll need the following things : * conf/hbase-site.xml = this should correspond to your local configuration * conf/gora-hbase-mapping.xml = see below * conf/gora.properties = don't think there
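Given the "Could not create datastore" error in David's message, a quick pre-flight check for the three configuration files listed above may save a round trip. This is a hypothetical Python helper, not part of Nutch; it only checks that the files exist, not that their contents are correct, and a missing file is only one possible cause of that error.

```python
from pathlib import Path

# File names as listed in the message; their required contents depend on the
# local HBase setup and the Gora version in use, which this sketch does not check.
REQUIRED_CONF_FILES = (
    "hbase-site.xml",          # should correspond to the local HBase configuration
    "gora-hbase-mapping.xml",  # Gora's mapping of stored fields to HBase columns
    "gora.properties",         # Gora settings file
)


def missing_conf_files(conf_dir):
    """Return the required files absent from conf_dir, in the order listed above."""
    conf = Path(conf_dir)
    return [name for name in REQUIRED_CONF_FILES if not (conf / name).is_file()]
```

Point it at the conf directory Nutch is run from (for a local build, presumably runtime/local/conf); an empty list means all three files are at least present.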

[jira] Commented: (NUTCH-874) Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora

2010-08-09 Thread Julien Nioche (JIRA)
instead of maintaining ours. WDYT? Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora -- Key: NUTCH-874 URL: https://issues.apache.org/jira/browse/NUTCH-874

[jira] Created: (NUTCH-875) Port Webgraph to Nutch 2.0

2010-08-09 Thread Julien Nioche (JIRA)
Port Webgraph to Nutch 2.0 -- Key: NUTCH-875 URL: https://issues.apache.org/jira/browse/NUTCH-875 Project: Nutch Issue Type: New Feature Components: linkdb Affects Versions: 2.1 Reporter

[jira] Commented: (NUTCH-874) Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora

2010-08-09 Thread Chris A. Mattmann (JIRA)
that, I think we're good! Cheers, Chris Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora -- Key: NUTCH-874 URL: https://issues.apache.org/jira/browse/NUTCH-874

[jira] Created: (NUTCH-874) Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora

2010-08-08 Thread Chris A. Mattmann (JIRA)
Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora -- Key: NUTCH-874 URL: https://issues.apache.org/jira/browse/NUTCH-874 Project: Nutch Issue

[jira] Commented: (NUTCH-849) different versions of the same library in nutch-2.0-dev.job and local\lib directory

2010-07-14 Thread Julien Nioche (JIRA)
the dependencies managed by Ivy. This will create a file build/org.apache.nutch-Nutch-test.html with all the details different versions of the same library in nutch-2.0-dev.job and local\lib directory

[jira] Created: (NUTCH-849) different versions of the same library in nutch-2.0-dev.job and local\lib directory

2010-07-12 Thread Pham Tuan Minh (JIRA)
different versions of the same library in nutch-2.0-dev.job and local\lib directory Key: NUTCH-849 URL: https://issues.apache.org/jira/browse/NUTCH-849 Project

Re: Nutch 2.0 : Design issue

2010-07-02 Thread Julien Nioche
segments to the webtable. The drawbacks being that there would be a dual storage GORA / HDFS and we'd need to keep the legacy Nutch Writable objects. The fetcher code is already ported in nutchbase not to use the plain files. I doubt there would be many users who want to jump to Nutch 2.0

[jira] Created: (NUTCH-841) Nutch 2.0 webapp

2010-07-02 Thread Chris A. Mattmann (JIRA)
Nutch 2.0 webapp Key: NUTCH-841 URL: https://issues.apache.org/jira/browse/NUTCH-841 Project: Nutch Issue Type: Improvement Components: web gui Environment: Nutch 2.0 Reporter: Chris

Re: Update svn nutchbase - Nutch 2.0

2010-06-30 Thread Julien Nioche
are left with an Apache Nutchbase branch that needs to incrementally be merged into the Nutch 2.0 trunk, which I agree with Andrzej, and Julien, is the most important part. So, either way works fine with me, so long as we are left with an Apache Nutchbase branch that can be merged incrementally

Where is nutch 2.0

2010-06-29 Thread Raghavendra Neelekani
Hi, can you please tell me where I can download Nutch 2.0? -- Raghavendra Keshava Neelekani

Re: Where is nutch 2.0

2010-06-29 Thread Andrzej Bialecki
On 2010-06-29 11:17, Raghavendra Neelekani wrote: Hi, can you please tell me where I can download Nutch 2.0? Nutch 2.0 is in the planning and early development phase, so it can't be downloaded yet. We hope to produce a working Nutch 2.0 some time in Q4 2010. -- Best regards, Andrzej

Re: Nutch 2.0

2010-06-29 Thread Doğacan Güney
Hi, On Tue, Jun 29, 2010 at 11:49, Julien Nioche lists.digitalpeb...@gmail.com wrote: Thanks Chris, I already shared my thoughts on this yesterday, but I still fail to see the advantage of keeping the details of the recent github nutchbase commits (some of them being just upgrades to the

Re: Nutch 2.0

2010-06-29 Thread Mattmann, Chris A (388J)
wholesale, either way, we are left with an Apache Nutchbase branch that needs to incrementally be merged into the Nutch 2.0 trunk, which I agree with Andrzej, and Julien, is the most important part. So, either way works fine with me, so long as we are left with an Apache Nutchbase branch that can be merged

Update svn nutchbase - Nutch 2.0

2010-06-29 Thread Julien Nioche
branch or we blow away the Apache Nutchbase branch and then import the Github Nutchbase branch wholesale, either way, we are left with an Apache Nutchbase branch that needs to incrementally be merged into the Nutch 2.0 trunk, which I agree with Andrzej, and Julien, is the most important part

Re: Nutch 2.0

2010-06-28 Thread Andrzej Bialecki
On 2010-06-28 07:49, Sami Siren wrote: One aspect that has not been discussed yet is the legal aspect. According to http://incubator.apache.org/ip-clearance/index.html there is a formal process for integrating external development efforts that have happened outside of Apache. Should we be

Re: Nutch 2.0

2010-06-28 Thread Sami Siren
On 06/28/2010 10:10 AM, Andrzej Bialecki wrote: On 2010-06-28 07:49, Sami Siren wrote: One aspect that has not been discussed yet is the legal aspect. According to http://incubator.apache.org/ip-clearance/index.html there is a formal process for integrating external development efforts that

Re: Nutch 2.0

2010-06-28 Thread Doğacan Güney
issues there (h) Nutch 2.0 trunk is fixed, and brought up to speed and old code is removed. All unit tests should pass regression where it makes sense. (i) Nutch documentation is brought up to date on wiki and checked into SVN (j) We roll a 2.0 release +1 I'd be happy to do

Re: Nutch 2.0

2010-06-28 Thread Mattmann, Chris A (388J)
it up to snuff. (e) roll the version # in nutch trunk to 2.0-dev (f) all issues in JIRA should be updated to reflect 2.0-dev fixes where it makes sense (g) a 2.1 version is added to mark anything that we don't want in 2.0 and we file post 2.0 issues there (h) Nutch 2.0 trunk

Re: Nutch 2.0

2010-06-28 Thread Mattmann, Chris A (388J)
is added to mark anything that we don't want in 2.0 and we file post 2.0 issues there (h) Nutch 2.0 trunk is fixed, and brought up to speed and old code is removed. All unit tests should pass regression where it makes sense. (i) Nutch documentation is brought up to date on wiki and checked

Re: Nutch 2.0

2010-06-28 Thread Mattmann, Chris A (388J)
Okey dokey guys, (c), (e) and (g) are done. Julien, Doğacan, your turn on (a) and (d) and then we can all work on (e) and (f)... Cheers, Chris On 6/28/10 12:55 PM, Doğacan Güney doga...@gmail.com wrote: On Mon, Jun 28, 2010 at 20:23, Andrzej Bialecki a...@getopt.org wrote: On 2010-06-28
