Sami Siren wrote:
should we introduce a new package for these: NutchConfigurable,
NutchConfigured and the upcoming action classes -
I've added these in util in the mapred branch and will use them as I
rewrite tools to use MapReduce. I'll commit them soon.
Doug
--
should we introduce a new package for these: NutchConfigurable,
NutchConfigured and the upcoming action classes -
org.apache.nutch.action ?
--
Sami Siren
Stefan Groschupf wrote:
Hi,
Doug, can you or someone else please commit the classes you suggested, I
think most / all agree and we can start
Stephan,
I have already started some tests using cli2. In my opinion, CLI v. 1
does not support all required parameters.
Can you please be more specific?
I defined an interface "Tool" and created an AbstractTool class.
I have started changing the existing tools so that they extend
them.
May be
Hi,
Doug, can you or someone else please commit the classes you suggested?
I think most / all agree and we can start porting things, but if
everyone now creates their own NutchConfigurable interfaces we will run into
trouble, and people will be unhappy when they need to correct patches they
submitted or pa
apache.org
Subject: [Nutch-dev] Re: tools cleanup
http://jakarta.apache.org/commons/cli/
could this be the way?
--
Sami Siren
John X wrote:
On Wed, Mar 30, 2005 at 12:53:24PM -0800, Doug Cutting wrote:
2. A tool class should define no methods other than a main() and perhaps
those required to parse the command line. All application logic should
+1
--
Sami Siren
Doug Cutting wrote:
I propose we clean up Nutch's tools as follows.
First, some definitions:
1. An "action" is an operation on Nutch data. For example,
GenerateSegmentFromDB, FetchSegment, UpdateDB, IndexSegment,
MergeIndexes, SearchServer, etc. are all actions.
2. A "tool" inv
Andrzej Bialecki wrote:
This also nicely solves the non-obvious requirement that all ndfs paths
must begin with a slash...
I fixed that a while back. Things that don't start with a slash are
currently made relative to /user/$USER.
Doug
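
The resolution rule Doug describes (paths without a leading slash are taken relative to /user/$USER) could be sketched roughly as follows. This is only an illustration: the class NdfsPaths and its resolve() method are hypothetical names, not Nutch's actual code, and the user name is passed in explicitly rather than read from the environment.

```java
// Hypothetical helper, not Nutch's actual implementation: it only illustrates
// the rule that non-absolute paths are resolved under /user/<user>.
final class NdfsPaths {
    private NdfsPaths() {}

    /** Returns absolute paths unchanged; resolves others under /user/<user>. */
    static String resolve(String path, String user) {
        if (path.startsWith("/")) {
            return path;
        }
        return "/user/" + user + "/" + path;
    }

    public static void main(String[] args) {
        // The current OS user stands in for $USER here (an assumption).
        String user = System.getProperty("user.name");
        System.out.println(resolve("/crawl/segments", user));
        System.out.println(resolve("crawl/segments", user));
    }
}
```

With this rule, a script can pass either form and always ends up with an absolute path.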
Doug,
The proposal:
1. Actions and tools should be separate classes, in separate files.
Wonderful! :-) That will make a number of things (e.g. running Nutch in a
container) very easy.
3. All actions must implement the following interface:
Inversion of control makes a lot of sense!
5. All plugins must imp
John X wrote:
On Thu, Mar 31, 2005 at 12:45:39AM +0200, Stefan Groschupf wrote:
Actually it is difficult to have tools use both ndfs and the local file system.
What do people think about introducing an ndfs notation in paths, as
is used in protocol handlers (a la http:// or file://)?
I don't mean to wr
I second this. But it would still be useful to keep the current NDFS
config entries. This is because if these URIs become the main method
of using ndfs, they could end up in a lot of scripts users write. Then
it would be inconvenient to change the namenode. Maybe we could use
ndfs:///path (three s
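
To make the idea concrete, here is a rough sketch of how a URI with an empty authority, such as ndfs:///path, could fall back to the configured namenode, while a URI that names a host overrides it. The class NdfsUri, its method names, and the example host names are hypothetical; the default namenode is passed in as a plain string standing in for the existing NDFS config entries.

```java
import java.net.URI;

// Hypothetical sketch: picks the namenode from an ndfs:// URI, falling back
// to a configured default when the URI carries no authority part.
final class NdfsUri {
    private NdfsUri() {}

    /** Returns "host:port" from the URI, or configuredDefault if none is given. */
    static String nameNode(String uri, String configuredDefault) {
        String authority = URI.create(uri).getAuthority();
        return (authority == null || authority.isEmpty()) ? configuredDefault : authority;
    }

    /** Returns the path component of the URI. */
    static String path(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        String configured = "localhost:9000"; // stand-in for the config entry
        System.out.println(nameNode("ndfs:///crawl/db", configured));
        System.out.println(nameNode("ndfs://nn1:9000/crawl/db", configured));
        System.out.println(path("ndfs:///crawl/db"));
    }
}
```

Scripts that use the empty-authority form would keep working when the namenode changes, since only the config entry needs to be updated.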
Doug Cutting wrote:
The proposal:
One more:
7. No code should call NutchConf.get() except a tool's main().
Doug
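
As an illustration of point 7 together with the tool/action split, a minimal sketch might look like the following. All class names here (Conf, CountUrlsAction, CountUrlsTool) and the "max.urls" key are hypothetical stand-ins, not the real NutchConf API or any class from the proposal; the point is only that the configuration is created in main() and handed to the action, which never looks one up globally.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a configuration object; not the real NutchConf.
final class Conf {
    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) { values.put(key, value); }

    String get(String key, String defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }
}

// Hypothetical action: all application logic lives here, and the
// configuration is injected through the constructor.
final class CountUrlsAction {
    private final Conf conf;

    CountUrlsAction(Conf conf) { this.conf = conf; }

    /** Returns how many URLs would be processed, capped by "max.urls". */
    int run(String[] urls) {
        int limit = Integer.parseInt(conf.get("max.urls", "10"));
        return Math.min(urls.length, limit);
    }
}

// Hypothetical tool: nothing but main(), which builds the configuration
// and invokes the action.
final class CountUrlsTool {
    public static void main(String[] args) {
        Conf conf = new Conf(); // the only place a configuration is obtained
        conf.set("max.urls", "2");
        System.out.println(new CountUrlsAction(conf).run(args));
    }
}
```

Because the action receives its configuration explicitly, it can be constructed with a different Conf in a test or inside a container without touching global state.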
Hi Doug,
> 1. An "action" is an operation on Nutch data. For example,
> GenerateSegmentFromDB, FetchSegment, UpdateDB, IndexSegment,
> MergeIndexes, SearchServer, etc. are all actions.
>
> 2. A "tool" invokes an action from the command line.
>
> The proposal:
>
> 1. Actions and tools should be
On Thu, Mar 31, 2005 at 12:45:39AM +0200, Stefan Groschupf wrote:
>
> Actually it is difficult to have tools use both ndfs and the local file system.
> What do people think about introducing an ndfs notation in paths, as
> is used in protocol handlers (a la http:// or file://)?
> I don't mean to write
On Wed, Mar 30, 2005 at 12:53:24PM -0800, Doug Cutting wrote:
>
> 2. A tool class should define no methods other than a main() and perhaps
> those required to parse the command line. All application logic should
> be in the action class.
>
I think command line options should be processed unif
> > The proposal:
Not really the same subject, but I have been thinking about this for a while:
Isn't it time to split Nutch into modules? For instance:
* A Core module
* A Util module
* An API module
* A Plugin module
with the dependencies:
1. Core depends on API (it uses APIs to call plugins) and Ut