[Nutch-dev] Re: tools cleanup

2005-05-17 Thread Doug Cutting
Sami Siren wrote: should we introduce a new package for these: NutchConfigurable, NutchConfigured and the upcoming action classes - I've added these in util in the mapred branch and will use them as I rewrite tools to use MapReduce. I'll commit them soon. Doug --

[Nutch-dev] Re: tools cleanup

2005-05-17 Thread Sami Siren
should we introduce a new package for these: NutchConfigurable, NutchConfigured and the upcoming action classes - org.apache.nutch.action ? -- Sami Siren Stefan Groschupf wrote: Hi, Doug, can you or someone else please commit the classes you suggested, I think most / all agree and we can start

Re: AW: [Nutch-dev] Re: tools cleanup

2005-04-13 Thread Stefan Groschupf
Stephan, I already started some tests using cli2. CLI v. 1 does not, in my opinion, support all required parameters. Can you please be more specific? I defined an interface "Tool" and created an AbstractTool class. I have started changing the existing tools to build on them. May be

[Nutch-dev] Re: tools cleanup

2005-04-13 Thread Stefan Groschupf
Hi, Doug, can you or someone else please commit the classes you suggested? I think most / all agree and we can start porting things, but if everyone now creates their own NutchConfigurable interface we will run into trouble, and people will be unhappy when they need to correct patches they submitted or pa

AW: [Nutch-dev] Re: tools cleanup

2005-04-11 Thread Strittmatter, Stephan
http://jakarta.apache.org/commons/cli/ could this be the way? -- Sami Siren John X wrote: > On Wed, Mar 30, 2005 at 12:53:24PM -0800, Doug Cutting wrote: > >>2. A tool class should define no methods other than a main() and perhaps >

[Nutch-dev] Re: tools cleanup

2005-04-09 Thread Sami Siren
http://jakarta.apache.org/commons/cli/ could this be the way? -- Sami Siren John X wrote: On Wed, Mar 30, 2005 at 12:53:24PM -0800, Doug Cutting wrote: 2. A tool class should define no methods other than a main() and perhaps those required to parse the command line. All application logic should

[Nutch-dev] Re: tools cleanup

2005-04-09 Thread Sami Siren
+1 -- Sami Siren Doug Cutting wrote: I propose we clean up Nutch's tools as follows. First, some definitions: 1. An "action" is an operation on Nutch data. For example, GenerateSegmentFromDB, FetchSegment, UpdateDB, IndexSegment, MergeIndexes, SearchServer, etc. are all actions. 2. A "tool" inv

[Nutch-dev] Re: tools cleanup

2005-03-31 Thread Doug Cutting
Andrzej Bialecki wrote: This also nicely solves the non-obvious requirement that all ndfs paths must begin with a slash... I fixed that a while back. Things that don't start with a slash are currently made relative to /user/$USER. Doug
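
The path rule Doug describes can be sketched in a few lines of Java. This is only an illustration of the stated behaviour (absolute paths are kept, anything else is placed under /user/$USER); the class and method names are hypothetical and are not taken from the NDFS code, and the user name is passed in explicitly instead of being read from the environment.

```java
// Hypothetical helper, not part of Nutch or NDFS.
final class NdfsRelativePathSketch {

    // Returns the path unchanged if it starts with a slash,
    // otherwise resolves it against /user/<user>.
    static String resolve(String path, String user) {
        if (path.startsWith("/")) {
            return path;
        }
        return "/user/" + user + "/" + path;
    }
}
```

With this sketch, resolve("crawl/segments", "doug") yields /user/doug/crawl/segments, while resolve("/crawl", "doug") is returned as given.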

[Nutch-dev] Re: tools cleanup

2005-03-31 Thread Stefan Groschupf
Doug, The proposal: 1. Actions and tools should be separate classes, in separate files. Wonderful! :-) That will make a number of things (e.g. running Nutch in a container) very easy. 3. All actions must implement the following interface: Inversion of control makes a lot of sense! 5. All plugins must imp

[Nutch-dev] Re: tools cleanup

2005-03-31 Thread Andrzej Bialecki
John X wrote: On Thu, Mar 31, 2005 at 12:45:39AM +0200, Stefan Groschupf wrote: Actually, it is difficult to have tools use both ndfs and the local file system. What do people think about introducing an ndfs notation in paths, as is used in protocol handlers (à la http:// or file://)? I don't mean to wr

[Nutch-dev] Re: tools cleanup

2005-03-31 Thread Feng Zhou
I second this. But it would still be useful to keep the current NDFS config entries. This is because if these URIs become the main method of using ndfs, they could end up in a lot of scripts that users write. Then it would be inconvenient to change the namenode. Maybe we could use ndfs:///path (three s
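
One possible reading of this suggestion, sketched in Java: a path written as ndfs://host:port/path names its namenode explicitly, while ndfs:///path (empty authority) falls back to the namenode from the existing config entries, so scripts need not hard-code it. The message is cut off above, so this fallback behaviour is an assumption, as are the class name, the method name and the example host names. Only java.net.URI from the standard library is used.

```java
import java.net.URI;

// Hypothetical helper, not part of Nutch or NDFS.
final class NdfsUriSketch {

    // Returns the "host:port" of the namenode a location refers to,
    // or null if the location is not an ndfs:// URI (for example a
    // local path or a file:// URI).
    static String namenodeFor(String location, String defaultNamenode) {
        URI uri = URI.create(location);
        if (!"ndfs".equals(uri.getScheme())) {
            return null;
        }
        // ndfs:///path has no authority, so use the configured default.
        String authority = uri.getAuthority();
        return authority == null ? defaultNamenode : authority;
    }
}
```

Under these assumptions, namenodeFor("ndfs:///crawl/segments", "nn.example:9000") gives the configured default, and namenodeFor("ndfs://other:9001/crawl", "nn.example:9000") gives other:9001.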

[Nutch-dev] Re: tools cleanup

2005-03-31 Thread Doug Cutting
Doug Cutting wrote: The proposal: One more: 7. No code should call NutchConf.get() except a tool's main(). Doug
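
A rough Java sketch of how rule 7 could fit together with the action/tool split discussed in this thread: the tool's main() is the only place that touches the global configuration, and the action receives it through a setter. Everything here is a stand-in. SketchConf is not the real NutchConf, SketchConfigurable is only a guess at what a NutchConfigurable-style interface might look like (the actual interface is not reproduced in these messages), and the key "sketch.threads" is invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for NutchConf; not the real class.
final class SketchConf {
    private static final SketchConf DEFAULT = new SketchConf();
    private final Map<String, String> values = new HashMap<>();

    // Plays the role of the global accessor that only a tool's main() may call.
    static SketchConf get() {
        return DEFAULT;
    }

    void set(String key, String value) {
        values.put(key, value);
    }

    String get(String key, String fallback) {
        return values.getOrDefault(key, fallback);
    }
}

// Assumed shape of a configurable component (inversion of control).
interface SketchConfigurable {
    void setConf(SketchConf conf);
    SketchConf getConf();
}

// An action: holds the application logic and reads only the conf it was given.
final class FetchSegmentAction implements SketchConfigurable {
    private SketchConf conf;

    public void setConf(SketchConf conf) {
        this.conf = conf;
    }

    public SketchConf getConf() {
        return conf;
    }

    String run(String segment) {
        String threads = conf.get("sketch.threads", "1");
        return "fetch " + segment + " with " + threads + " thread(s)";
    }
}

// A tool: nothing but main(), which wires the configuration into the action.
final class FetchSegmentTool {
    public static void main(String[] args) {
        SketchConf conf = SketchConf.get(); // the only global lookup
        FetchSegmentAction action = new FetchSegmentAction();
        action.setConf(conf);
        System.out.println(action.run(args.length > 0 ? args[0] : "segment-0"));
    }
}
```

Because the action never reaches for a global, it can be driven from a test or a container with any configuration object, which is the benefit Stefan points to when he mentions inversion of control.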

[Nutch-dev] RE: tools cleanup

2005-03-30 Thread Chris Mattmann
Hi Doug, > 1. An "action" is an operation on Nutch data. For example, > GenerateSegmentFromDB, FetchSegment, UpdateDB, IndexSegment, > MergeIndexes, SearchServer, etc. are all actions. > > 2. A "tool" invokes an action from the command line. > > The proposal: > > 1. Actions and tools should be

[Nutch-dev] Re: tools cleanup

2005-03-30 Thread John X
On Thu, Mar 31, 2005 at 12:45:39AM +0200, Stefan Groschupf wrote: > > Actually, it is difficult to have tools use both ndfs and the local file system. > What do people think about introducing an ndfs notation in paths, as > is used in protocol handlers (à la http:// or file://)? > I don't mean to write

[Nutch-dev] Re: tools cleanup

2005-03-30 Thread John X
On Wed, Mar 30, 2005 at 12:53:24PM -0800, Doug Cutting wrote: > > 2. A tool class should define no methods other than a main() and perhaps > those required to parse the command line. All application logic should > be in the action class. > I think command line options should be processed unif

[Nutch-dev] Re: tools cleanup

2005-03-30 Thread Jérôme Charron
> > The proposal: Not really the same subject, but I have been thinking about this for a while: Isn't it time to split Nutch into modules? For instance: * A Core module * A Util module * An API module * A Plugin module with the dependencies: 1. Core depends on API (it uses APIs to call plugins) and Ut