[Nutch-dev] Nutch - New Features (?)

2006-01-27 Thread Fuad Efendi
Since we have such a strange plugin structure (DI? IoC?), and many utility classes with a single UNIX shell script to run everything... 1. Separate concerns. Clearly. - Crawl - Parse - Generate URL List - Crawl - ... (Interfaces of WebDB should be clearer, so we can use databases, etc,...) 1a.

[Nutch-dev] Re: older Nutch list archives (@sf.net)?

2006-01-27 Thread Gordon Mohr (archive.org)
Access works now, thanks! (Search at SF.net seems flaky, though -- simple searches that bring expected results at mail-archive.com give nothing or abbreviated results at SF.net.) Also, I misinterpreted the mail-archive.com robots.txt... it is crawlable, though neither G nor Y go very deep. It's

[Nutch-dev] Re: need volunteer to develop search for apache.org

2006-01-27 Thread John X
Hi, Stefan, On Thu, Jan 26, 2006 at 10:17:52PM +0100, Stefan Groschupf wrote: > John, > if you need any kind of support let me know. Especially I can help > out with UI related stuff, however I also can help with all other > issues. Really appreciated. With all the support from the community,

[Nutch-dev] Re: older Nutch list archives (@sf.net)?

2006-01-27 Thread Doug Cutting
Gordon Mohr (archive.org) wrote: Doug Cutting wrote: The Sourceforge archives are still there, just hard to find, e.g.: http://sourceforge.net/mailarchive/forum.php?forum=nutch-developers When I visit that URL, I get: # Permission Denied # # Access to this page is restricted (either to pro

[Nutch-dev] [jira] Updated: (NUTCH-189) Injection infinite loop

2006-01-27 Thread Bryan Pendleton (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-189?page=all ] Bryan Pendleton updated NUTCH-189: -- Attachment: textinputformat.patch.txt > Injection infinite loop > --- > > Key: NUTCH-189 > URL: http://issues.apache.or

[Nutch-dev] [jira] Commented: (NUTCH-189) Injection infinite loop

2006-01-27 Thread Bryan Pendleton (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-189?page=comments#action_12364270 ] Bryan Pendleton commented on NUTCH-189: --- I think this is caused by a similar issue I've been running into in my code, though I'm not testing crawling, so I can't be sure.

[Nutch-dev] Re: older Nutch list archives (@sf.net)?

2006-01-27 Thread Gordon Mohr (archive.org)
Doug Cutting wrote: The Sourceforge archives are still there, just hard to find, e.g.: http://sourceforge.net/mailarchive/forum.php?forum=nutch-developers When I visit that URL, I get: # Permission Denied # # Access to this page is restricted (either to project members or to # project adminis

[Nutch-dev] Re: older Nutch list archives (@sf.net)?

2006-01-27 Thread Doug Cutting
The Sourceforge archives are still there, just hard to find, e.g.: http://sourceforge.net/mailarchive/forum.php?forum=nutch-developers These lists are also archived at mail-archive.com: http://www.mail-archive.com/nutch-developers%40lists.sourceforge.net/ Doug Gordon Mohr (archive.org) wrote:

[Nutch-dev] Re: svn commit: r372810 - /lucene/nutch/trunk/bin/nutch

2006-01-27 Thread Rod Taylor
On Fri, 2006-01-27 at 13:37 -0800, Doug Cutting wrote: > Andrzej Bialecki wrote: > > #!/usr/bin/env bash > > +1 > > This works on Solaris, Linux & cygwin. Does it work on FreeBSD? Yes. It will fail on some older and obscure systems but I don't imagine those will have a JVM anyway. -- Rod Tayl
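
The portability point in this thread can be illustrated with a minimal sketch. It assumes only that bash is installed somewhere on PATH (for example /usr/local/bin/bash, as reported for FreeBSD elsewhere in the thread); `find_bash` is a hypothetical helper for illustration and is not part of bin/nutch.

```shell
#!/usr/bin/env bash
# Sketch only: env searches PATH for bash, so the same first line can work
# whether bash is installed as /bin/bash or /usr/local/bin/bash.

# find_bash is a hypothetical helper, not part of bin/nutch.
find_bash() {
  command -v bash   # prints the path of the first bash found on PATH
}

bash_path=$(find_bash) || { echo "bash not found on PATH" >&2; exit 1; }
echo "env would run: $bash_path"
```

The trade-off is that the script now depends on the caller's PATH rather than on a fixed install location.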

[Nutch-dev] older Nutch list archives (@sf.net)?

2006-01-27 Thread Gordon Mohr (archive.org)
The Nutch mailing lists used to be hosted at Sourceforge; however recently trying to access an archived message via a link that used to work got me a "Permission Denied" message. Via the Sourceforge project page, no lists are shown: http://sourceforge.net/mail/?group_id=59548 Are these archives

[Nutch-dev] Re: svn commit: r372810 - /lucene/nutch/trunk/bin/nutch

2006-01-27 Thread Doug Cutting
Andrzej Bialecki wrote: #!/usr/bin/env bash +1 This works on Solaris, Linux & cygwin. Does it work on FreeBSD? Doug

[Nutch-dev] Re: svn commit: r372810 - /lucene/nutch/trunk/bin/nutch

2006-01-27 Thread Andrzej Bialecki
Rod Taylor wrote: The problem is that for 90% of the maintainers of this script, /bin/sh is bash, so it is hard to ensure that the use of bash features does not creep into it. Is installing bash on FreeBSD onerous? It is the It is installed but the path is /usr/local/bin/bash like any

[Nutch-dev] Re: A Nutch config editor...

2006-01-27 Thread Dominik Friedrich
I've found a way to make remote calls from the client to a jobtracker and namenode without patching any nutch code. The new package includes a small wrapper that starts a namenode and a jobtracker and makes an XML-RPC interface to those two services available. See http://home.halifax.rwth-aachen.

[Nutch-dev] [jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-27 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12364242 ] Doug Cutting commented on NUTCH-139: I was confused about which was the latest version. (I deleted the older versions. Is there a way to simply mark them obsolete?) So,

[Nutch-dev] Re: need volunteer to develop search for apache.org

2006-01-27 Thread Doug Cutting
Sami Siren wrote: Will the resources (scripts, modifications, documentation etc) of this setup be publicly available? I mean, could this installation be something like an RI for implementing Nutch-based search for a public web site? Yes. Anything that is not too site-specific should probably even

[Nutch-dev] Re: svn commit: r372810 - /lucene/nutch/trunk/bin/nutch

2006-01-27 Thread Rod Taylor
On Fri, 2006-01-27 at 10:34 -0800, Doug Cutting wrote: > Rod Taylor wrote: > > Please don't do that. > > > > bash-2.05b$ ls /bin/bash > > ls: /bin/bash: No such file or directory > > > > bash-2.05b$ uname -a > > FreeBSD home 6.0-RELEASE FreeBSD 6.0-RELEASE

[Nutch-dev] [jira] Updated: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-27 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=all ] Doug Cutting updated NUTCH-139: --- Attachment: (was: NUTCH-139.Mattmann.patch.txt) > Standard metadata property names in the ParseData metadata >

[Nutch-dev] [jira] Updated: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-27 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=all ] Doug Cutting updated NUTCH-139: --- Attachment: (was: NUTCH-139.jc.review.patch.txt) > Standard metadata property names in the ParseData metadata > ---

[Nutch-dev] Re: svn commit: r372810 - /lucene/nutch/trunk/bin/nutch

2006-01-27 Thread Doug Cutting
Rod Taylor wrote: Please don't do that. bash-2.05b$ ls /bin/bash ls: /bin/bash: No such file or directory bash-2.05b$ uname -a FreeBSD home 6.0-RELEASE FreeBSD 6.0-RELEASE #13: Sat Nov 5 00:19:49 EST 2005 [EMAIL PROTECTED]:/usr/obj/usr/src/

[Nutch-dev] Re: [Nutch-cvs] svn commit: r372810 - /lucene/nutch/trunk/bin/nutch

2006-01-27 Thread Doug Cutting
Andrzej Bialecki wrote: Right, Solaris /bin/sh doesn't allow that... Hmm. Does this IFS setting/unsetting work for you? I mean, I just tried it on Linux, using the real Bash. I put the nutch distrib in a path containing spaces, and I'm not able to run anything... I initially added it to make
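
One way to handle install paths containing spaces without setting or unsetting IFS is to quote the directory when looping over its jars. The following is a minimal sketch in POSIX-style sh, not the code from bin/nutch; `build_classpath` and the demo directory name are hypothetical, and it assumes `mktemp -d` is available.

```shell
#!/bin/sh
# build_classpath is a hypothetical helper: it prints a colon-separated
# classpath of the .jar files in a directory whose path may contain spaces.
build_classpath() {
  lib_dir=$1
  classpath=
  # Quoting "$lib_dir" keeps a path with spaces as one word, so IFS is untouched.
  for jar in "$lib_dir"/*.jar; do
    [ -f "$jar" ] || continue   # skip the unexpanded pattern when no jar matches
    classpath=${classpath:+$classpath:}$jar
  done
  printf '%s\n' "$classpath"
}

# Demo with a made-up directory name that contains a space.
demo_dir=$(mktemp -d)/"nutch home"
mkdir -p "$demo_dir"
: > "$demo_dir/a.jar"
: > "$demo_dir/b.jar"
build_classpath "$demo_dir"
```

Because nothing here modifies IFS, there is nothing to restore afterwards, which sidesteps the unset problem reported on Solaris. Whether this sketch runs on that particular /bin/sh is untested.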

[Nutch-dev] Re: need volunteer to develop search for apache.org

2006-01-27 Thread Sami Siren
Will the resources (scripts, modifications, documentation etc) of this setup be publicly available? I mean, could this installation be something like an RI for implementing Nutch-based search for a public web site? I see this as a great opportunity to produce some generally usable stuff that would

[Nutch-dev] [jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-27 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12364218 ] Jerome Charron commented on NUTCH-139: -- > I think we're near agreement here. I really hope ... ;-) > We should add an add() method to Metadata, and change set() to repl

[Nutch-dev] Re: [jira] Commented: (NUTCH-185) XMLParser is configurable plugin. It use XPath and namespaces to do the mapping between the XML elements and Lucene fields.

2006-01-27 Thread Rida Benjelloun
Hi Philippe, Thanks for your comments. I have already added multiple values for a field in Lucene. I will try it with the Nutch plugin. Best regards. On 1/26/06, Philippe EUGENE (JIRA) <[EMAIL PROTECTED]> wrote: > >[ > http://issues.apache.org/jira/browse/NUTCH-185?page=comments#action_12364087]

[Nutch-dev] Re: need volunteer to develop search for apache.org

2006-01-27 Thread Rida Benjelloun
Hi Doug, I would be interested in this development. I have a lot of experience with Lucene. Best regards On 1/27/06, Fuad Efendi <[EMAIL PROTECTED]> wrote: > > Hope to join! > +1 > > > -Original Message- > From: Doug Cutting [mailto:[EMAIL PROTECTED] > Sent: Wednesday, January 25, 2006 4:

[Nutch-dev] Re: svn commit: r372810 - /lucene/nutch/trunk/bin/nutch

2006-01-27 Thread Rod Taylor
Please don't do that. bash-2.05b$ ls /bin/bash ls: /bin/bash: No such file or directory bash-2.05b$ uname -a FreeBSD home 6.0-RELEASE FreeBSD 6.0-RELEASE #13: Sat Nov 5 00:19:49 EST 2005 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/HOME amd64

[Nutch-dev] Re: [Nutch-cvs] svn commit: r372810 - /lucene/nutch/trunk/bin/nutch

2006-01-27 Thread Andrzej Bialecki
Doug Cutting wrote: Andrzej Bialecki wrote: Namely? I didn't notice any ... I think it's better to avoid bash-isms, if we easily can. Not all the world looks like Linux. ;-) IFS, at least. I tried running this on Solaris, where /bin/sh is not bash, and it didn't work. It complained about un

[Nutch-dev] Re: [Nutch-cvs] svn commit: r372810 - /lucene/nutch/trunk/bin/nutch

2006-01-27 Thread Doug Cutting
Andrzej Bialecki wrote: Namely? I didn't notice any ... I think it's better to avoid bash-isms, if we easily can. Not all the world looks like Linux. ;-) IFS, at least. I tried running this on Solaris, where /bin/sh is not bash, and it didn't work. It complained about unsetting IFS. Doug

[Nutch-dev] Re: [Nutch-cvs] svn commit: r372810 - /lucene/nutch/trunk/bin/nutch

2006-01-27 Thread Andrzej Bialecki
[EMAIL PROTECTED] wrote: Author: cutting Date: Fri Jan 27 02:45:35 2006 New Revision: 372810 URL: http://svn.apache.org/viewcvs?rev=372810&view=rev Log: Explicitly specify bash, since this script requires some bash-specific features. Namely? I didn't notice any ... I think it's better to av