[Nutch-dev] NullPointerException parsing plugin.xml

2005-06-14 Thread Howie Wang
Hi, I just downloaded the nightly build from 6/13 and was getting NullPointerExceptions when loading the plugins. This occurred only during searches, not during parsing/indexing. I tracked it down a little, and it's occurring in PluginManifestParser.parseLibraries. The lines of interest are:
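The lines Howie quoted are cut off in this archive, so the actual cause is not shown here. As a hedged illustration only: a frequent source of a NullPointerException in manifest parsing is an optional element (for example a missing `<runtime>` section) being dereferenced unconditionally. The sketch below is not Nutch's PluginManifestParser code; it reads `<library name="..."/>` entries with the standard JDK DOM API and tolerates their absence. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ManifestLibraries {
    // Hypothetical helper, not Nutch code: collects <library name="..."/> entries
    // under <runtime>, treating a missing <runtime> or an unnamed library as "nothing to add".
    static List<String> libraryNames(Element pluginRoot) {
        List<String> names = new ArrayList<>();
        NodeList runtimes = pluginRoot.getElementsByTagName("runtime");
        if (runtimes.getLength() == 0) {
            return names; // item(0) would be null here; dereferencing it is the NPE to avoid
        }
        Element runtime = (Element) runtimes.item(0);
        NodeList libraries = runtime.getElementsByTagName("library");
        for (int i = 0; i < libraries.getLength(); i++) {
            Element library = (Element) libraries.item(i);
            String name = library.getAttribute("name"); // DOM returns "" (not null) when absent
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    // Parses a manifest string into its root element using the JDK's DOM parser.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("malformed manifest", e);
        }
    }

    public static void main(String[] args) {
        String withRuntime = "<plugin id=\"parse-html\"><runtime>"
                + "<library name=\"parse-html.jar\"><export name=\"*\"/></library>"
                + "</runtime></plugin>";
        String withoutRuntime = "<plugin id=\"query-basic\"/>";
        System.out.println(libraryNames(parse(withRuntime)));
        System.out.println(libraryNames(parse(withoutRuntime)));
    }
}
```

Returning an empty list for a plugin without a `<runtime>` section treats that case as "no libraries" rather than an error; whether that is the right behaviour for Nutch is an assumption.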

[Nutch-dev] Re: Crawling method control !!

2005-06-14 Thread Daniel D.
Gents, I have one more question; I hope someone will respond! The whole-web crawling tutorial advises using the following command sequence: *fetch*, *updatedb db*, and then *generate db segments -topN 1000*. Use of the topN parameter implies that *updatedb db* does some analysis on fetc
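As a conceptual aid only (not Nutch's generator code, and not an answer to Daniel's question about what *updatedb* computes), a `-topN 1000` style limit amounts to keeping the N highest-scoring candidate pages. A minimal Java sketch, assuming each candidate already carries a score; the `Page` type is hypothetical:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopNSelector {
    // Hypothetical stand-in for a crawl-database entry: a URL and a precomputed score.
    static final class Page {
        final String url;
        final float score;

        Page(String url, float score) {
            this.url = url;
            this.score = score;
        }
    }

    private static final Comparator<Page> BY_SCORE = Comparator.comparingDouble(p -> p.score);

    // Keeps the n highest-scoring pages with a min-heap of size n, so memory
    // stays proportional to n rather than to the number of candidates.
    static List<Page> topN(Iterable<Page> candidates, int n) {
        List<Page> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        PriorityQueue<Page> heap = new PriorityQueue<>(n, BY_SCORE);
        for (Page p : candidates) {
            if (heap.size() < n) {
                heap.add(p);
            } else if (p.score > heap.peek().score) {
                heap.poll(); // evict the current lowest score
                heap.add(p);
            }
        }
        result.addAll(heap);
        result.sort(BY_SCORE.reversed()); // highest score first
        return result;
    }

    public static void main(String[] args) {
        List<Page> candidates = new ArrayList<>();
        candidates.add(new Page("http://a.example/", 0.5f));
        candidates.add(new Page("http://b.example/", 2.0f));
        candidates.add(new Page("http://c.example/", 1.0f));
        for (Page p : topN(candidates, 2)) {
            System.out.println(p.url + " " + p.score);
        }
    }
}
```

The heap keeps selection cheap when the candidate list is much larger than N; how Nutch itself orders and limits the fetch list is not described in this snippet.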

[Nutch-dev] Re: Multi-Lingual support

2005-06-14 Thread Stefan Groschupf
> Sorry for the wasted time. You didn't waste any time. > I will update my proposal so that it uses the plugin system. Great! Thanks. Stefan

[Nutch-dev] Re: Multi-Lingual support

2005-06-14 Thread Jérôme Charron
> > I do not clearly understand the question. > If you mean whether you can call a set of extensions that implement the same > extension point in an ordered way, the answer is yes. Sorry Stefan, now that I have a better understanding of the plugin system, I realize that my question was really a dummy quest

[Nutch-dev] Re: Multi-Lingual support

2005-06-14 Thread Stefan Groschupf
> But could you please answer these quick and simple questions: > * Is there a way to specify an ordering of plugin calls (it is needed in order to perform analysis after language identification)? I do not clearly understand the question. If you mean whether you can call a set of extensions that
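Stefan's answer, that extensions of one extension point can be called in a defined order, can be sketched as follows. This is a hypothetical illustration, not the Nutch plugin API: each step exposes an `order()` value, and the runner sorts by it so that a language-identification step always runs before an analysis step that depends on its result.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OrderedExtensions {
    // Hypothetical extension-point interface; not Nutch's plugin API.
    interface DocumentStep {
        int order();                          // lower values run first
        void apply(Map<String, String> doc);  // may read fields set by earlier steps
    }

    // Runs every registered step in ascending order(), regardless of the
    // order in which the steps were registered or discovered.
    static void runInOrder(List<DocumentStep> steps, Map<String, String> doc) {
        List<DocumentStep> sorted = new ArrayList<>(steps);
        sorted.sort(Comparator.comparingInt(DocumentStep::order));
        for (DocumentStep step : sorted) {
            step.apply(doc);
        }
    }

    public static void main(String[] args) {
        DocumentStep analysis = new DocumentStep() {
            public int order() { return 20; }
            public void apply(Map<String, String> doc) {
                // Depends on "lang" having been set by an earlier step.
                doc.put("analyzer", "analyzer-" + doc.getOrDefault("lang", "default"));
            }
        };
        DocumentStep languageId = new DocumentStep() {
            public int order() { return 10; }
            public void apply(Map<String, String> doc) { doc.put("lang", "fr"); }
        };
        List<DocumentStep> registered = new ArrayList<>();
        registered.add(analysis);   // deliberately registered first
        registered.add(languageId);
        Map<String, String> doc = new HashMap<>();
        runInOrder(registered, doc);
        System.out.println(doc.get("analyzer"));
    }
}
```

Sorting on an explicit key decouples execution order from discovery order; where such a key would live in Nutch (manifest attribute or configuration) is left open here.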

[Nutch-dev] Re: Multi-Lingual support

2005-06-14 Thread Andy Liu
One option is to follow the same pattern as the parse-* plugins. Within ParserFactory's getParse() method, there is a conditional statement that chooses which parse implementation to use based on the contentType or suffix of the fetched resource. You could do the same in the NutchAnalyzer factory, ex
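Andy's suggestion can be sketched as a factory that picks an implementation from a key, the way ParserFactory picks a parser by content type or suffix, here keyed by language code with a fallback. Everything below is hypothetical: the `TextAnalyzer` interface, the class names, and the crude elision rule stand in for real per-language analyzer plugins and are not Lucene or Nutch APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

class AnalyzerFactorySketch {
    // Hypothetical analyzer abstraction standing in for per-language analyzer plugins.
    interface TextAnalyzer {
        List<String> tokenize(String text);
    }

    // Fallback: lower-case and split on anything that is not a letter.
    static class DefaultAnalyzer implements TextAnalyzer {
        public List<String> tokenize(String text) {
            List<String> tokens = new ArrayList<>();
            for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
                if (!t.isEmpty()) {
                    tokens.add(t);
                }
            }
            return tokens;
        }
    }

    // Hypothetical French variant: first drops elided articles such as "l'" or "d'".
    static class FrenchAnalyzer extends DefaultAnalyzer {
        @Override
        public List<String> tokenize(String text) {
            return super.tokenize(text.replaceAll("(?i)\\b[cdjlmnst]'", ""));
        }
    }

    private static final Map<String, Supplier<TextAnalyzer>> BY_LANGUAGE = new HashMap<>();

    static {
        BY_LANGUAGE.put("fr", FrenchAnalyzer::new);
    }

    // Same shape as the ParserFactory idea: choose an implementation from a key
    // (a language code instead of a content type) and fall back to a default.
    static TextAnalyzer forLanguage(String lang) {
        Supplier<TextAnalyzer> supplier =
                lang == null ? null : BY_LANGUAGE.get(lang.toLowerCase(Locale.ROOT));
        return supplier != null ? supplier.get() : new DefaultAnalyzer();
    }

    public static void main(String[] args) {
        System.out.println(forLanguage("fr").tokenize("L'hiver arrive"));
        System.out.println(forLanguage("de").tokenize("L'hiver arrive"));
    }
}
```

In a plugin-based version the map would presumably be filled from the plugin registry instead of a static initializer; that wiring is not shown.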

[Nutch-dev] Sort by outlinks

2005-06-14 Thread Massimo Miccoli
Dear Nutch Dev, is there a way to penalize sites with many outlinks? I mean sorting hits in the results pages by outlink count, so that sites with many outlinks rank lower. Any help? Thanks, Massimo

[Nutch-dev] Re: Sort by outlinks

2005-06-14 Thread Andy Liu
Sure. In IndexSegment's makeDocument() method, you can edit the code to deboost the document: check parse.getData().getOutlinks().length and multiply the boost by some factor. You would have to reindex for this change to take effect. On 6/14/05, Massimo Miccoli <[EMAIL PROTECTED]> wrote: >
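A hedged sketch of the factor Andy describes. The shape of the penalty (no change up to a threshold, then logarithmic decay) and the threshold itself are assumptions chosen for illustration; the post only says to multiply the boost by "some factor". The outlink count is what `parse.getData().getOutlinks().length` would supply inside `makeDocument()`.

```java
class OutlinkDeboost {
    // Hypothetical penalty: 1.0 up to `threshold` outlinks, then shrinking
    // logarithmically with the ratio outlinks / threshold.
    static float outlinkFactor(int outlinkCount, int threshold) {
        int t = Math.max(1, threshold); // guard against division by zero
        if (outlinkCount <= t) {
            return 1.0f;
        }
        return (float) (1.0 / (1.0 + Math.log((double) outlinkCount / t)));
    }

    // baseBoost: whatever boost makeDocument() already computes (assumed).
    // outlinkCount: e.g. parse.getData().getOutlinks().length in the indexer.
    static float deboost(float baseBoost, int outlinkCount, int threshold) {
        return baseBoost * outlinkFactor(outlinkCount, threshold);
    }

    public static void main(String[] args) {
        System.out.println(deboost(2.0f, 10, 50));   // at or below threshold: unchanged
        System.out.println(deboost(2.0f, 500, 50));  // ten times the threshold: reduced
    }
}
```

The resulting value would replace the boost the document is indexed with; as Andy notes, existing segments must be reindexed before the change has any effect.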

[Nutch-dev] [jira] Commented: (NUTCH-21) parser plugin for MS PowerPoint slides

2005-06-14 Thread Stephan Strittmatter (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-21?page=comments#action_12313578 ] Stephan Strittmatter commented on NUTCH-21: Could someone send me a .ppt file that produces such errors, for debugging? I tested several files but was not able to rep

[Nutch-dev] Re: Multi-Lingual support

2005-06-14 Thread Jérôme Charron
> There was an interesting sentence on the old home page about what is necessary to become a committer. Excuse me Stefan, but what do you mean? > Anyway, no problem; the best thing you can do is to read the code and the old SourceForge mail archive. > An article can be found here: > http://www.media-style

[Nutch-dev] Re: Multi-Lingual support

2005-06-14 Thread Stefan Groschupf
There was an interesting sentence on the old home page about what is necessary to become a committer. Anyway, no problem; the best thing you can do is to read the code and the old SourceForge mail archive. An article can be found here: http://www.media-style.com/index.jsp?folderPK=422&action=&; At least it

[Nutch-dev] Re: Multi-Lingual support

2005-06-14 Thread Jack Tang
Hi Jérôme, from my point of view, the Nutch plugin architecture shares some ideas with Eclipse's. Am I right? If so, maybe you can refer to this article: http://www.eclipse.org/articles/Article-Plug-in-architecture/plugin_architecture.html /Jack On 6/14/05, Jérôme Charron <[EMAIL PROTECTED]> wrot

[Nutch-dev] Re: Can Nutch index over 90G html pages ?

2005-06-14 Thread Christophe Noel
Wouldn't it simply be the number of threads you use to fetch the pages? Doug Cutting wrote: The latest code in SVN requires less RAM. If you still have problems, try setting the config option io.map.index.skip to 8 and indexer.termIndexInterval to 1024. These will both cause less RAM
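Why those two settings lower memory use can be estimated with simple arithmetic. The sketch assumes (1) Lucene holds one of every `termIndexInterval` terms in RAM, with 128 as the default, and (2) with `io.map.index.skip = k`, a map-file reader keeps one of every k + 1 index entries. The term and entry counts are made-up inputs, not measurements from this thread.

```java
class IndexMemoryEstimate {
    // Assumption: Lucene keeps every termIndexInterval-th term in RAM
    // (default interval 128), so entries held = ceil(totalTerms / interval).
    static long luceneTermsInRam(long totalTerms, int termIndexInterval) {
        return (totalTerms + termIndexInterval - 1) / termIndexInterval;
    }

    // Assumption: with io.map.index.skip = k, the reader keeps one of every
    // k + 1 index entries, so entries held = ceil(indexEntries / (k + 1)).
    static long mapIndexEntriesInRam(long indexEntries, int skip) {
        return (indexEntries + skip) / (skip + 1);
    }

    public static void main(String[] args) {
        long terms = 100_000_000L;      // illustrative, not a measured figure
        long mapEntries = 1_000_000L;   // illustrative, not a measured figure
        System.out.println("term index, interval 128:  " + luceneTermsInRam(terms, 128));
        System.out.println("term index, interval 1024: " + luceneTermsInRam(terms, 1024));
        System.out.println("map index, skip 0: " + mapIndexEntriesInRam(mapEntries, 0));
        System.out.println("map index, skip 8: " + mapIndexEntriesInRam(mapEntries, 8));
    }
}
```

Under these assumptions, raising the interval from 128 to 1024 cuts the in-memory term index about eightfold, and a skip of 8 keeps roughly one ninth of the map-file index, at the cost of slower lookups. Christophe's point about fetcher thread count is a separate factor not modelled here.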

[Nutch-dev] Re: Multi-Lingual support

2005-06-14 Thread Jérôme Charron
> This is a typical use case for Nutch plugins, and going back to configuration-file-based dynamic class loading makes less sense from my point of view. > Now that most things have been ported to plugins, using AnalyzerMap.properties isn't in the style in which the rest of Nutch is implemented. Stefa
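For contrast, the configuration-file approach Stefan argues against looks roughly like this: a properties file maps a language code to a class name, which is then instantiated by reflection. The actual contents of AnalyzerMap.properties are not shown in the thread; the key/value format and all class names below are assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class PropertiesClassLoading {
    // Hypothetical interface every mapped class must implement.
    public interface LanguageAnalyzer {
        String name();
    }

    public static class FrenchStub implements LanguageAnalyzer {
        public String name() { return "french"; }
    }

    public static class DefaultStub implements LanguageAnalyzer {
        public String name() { return "default"; }
    }

    // Reads "language=fully.qualified.ClassName" lines (assumed format).
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return p;
    }

    // Looks up the class name for a language and instantiates it by reflection.
    static LanguageAnalyzer load(Properties map, String lang) {
        String className = map.getProperty(lang, DefaultStub.class.getName());
        try {
            return Class.forName(className)
                    .asSubclass(LanguageAnalyzer.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            // A typo in the file is only detected here, at run time.
            throw new IllegalStateException("cannot load " + className + " for " + lang, e);
        }
    }

    public static void main(String[] args) {
        Properties map = parse("fr=" + FrenchStub.class.getName() + "\n");
        System.out.println(load(map, "fr").name());
        System.out.println(load(map, "de").name());
    }
}
```

One practical drawback is visible in `load()`: a misspelled class name surfaces only when that language is first requested, whereas a plugin manifest arguably keeps each mapping next to the code it names.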