Hi,
I just downloaded the nightly from 6/13 and I was getting
NullPointerExceptions loading up the plugins. This occurred
only during searches, not during parsing/indexing. I tracked
it down a little and it's occurring in PluginManifestParser.parseLibraries.
The lines of interest are:
Gents,
I have one more question. I hope someone can answer!
The whole-web crawling tutorial advises using the following command
sequence:
*fetch*
*updatedb db*
and then *generate db segments -topN 1000*
Using the topN parameter implies that *updatedb db* does some analysis on
fetc
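A -topN option generally means "keep only the N best-scoring entries". The sketch below shows that selection step in isolation, assuming each page already carries a numeric score. The Page class, its fields and the sample URLs are hypothetical stand-ins, not Nutch's WebDB classes, and this is not a claim about how generate is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of "keep only the N best-scoring pages".
// Page is an illustrative stand-in, not Nutch's WebDB page class.
class TopNSelector {
    static final class Page {
        final String url;
        final float score;

        Page(String url, float score) {
            this.url = url;
            this.score = score;
        }
    }

    // Returns at most n pages, highest score first.
    static List<Page> topN(List<Page> pages, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Min-heap ordered by score: the weakest retained page sits at the head.
        PriorityQueue<Page> heap = new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
        for (Page p : pages) {
            heap.offer(p);
            if (heap.size() > n) {
                heap.poll(); // evict the lowest-scoring page currently kept
            }
        }
        List<Page> result = new ArrayList<>(heap);
        result.sort((a, b) -> Float.compare(b.score, a.score)); // descending by score
        return result;
    }

    public static void main(String[] args) {
        List<Page> pages = List.of(
            new Page("http://a.example/", 0.5f),
            new Page("http://b.example/", 2.0f),
            new Page("http://c.example/", 1.0f),
            new Page("http://d.example/", 0.1f));
        for (Page p : topN(pages, 2)) {
            System.out.println(p.url + " " + p.score);
        }
    }
}
```

The bounded min-heap keeps memory proportional to N rather than to the number of pages, which matters when the candidate list is large.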
Sorry for the wasted time.
You didn't waste any time.
I will update my proposal so that it uses the plugin system.
Great!
Thanks.
Stefan
>
> I do not clearly understand the question.
> If you mean whether you can call a set of extensions that implement the same
> extension point in an ordered way, the answer is yes.
Sorry Stefan, now that I have a better understanding of the plugin system,
I realize that my question was really a dummy quest
But could you please answer these quick and simple questions:
* Is there a way to specify an ordering of plugin calls? (It is needed in
order to perform analysis after language identification.)
I do not clearly understand the question.
If you mean whether you can call a set of extensions that
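To make "calling several extensions of one extension point in order" concrete, here is a minimal sketch assuming each extension exposes an explicit rank. The DocumentFilter interface, its rank() method and the "lang"/"analyzer" metadata keys are invented for illustration; they are not Nutch's plugin API. The example runs a language-identification step before an analysis step even though they are registered the other way round.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: run several implementations of one extension point
// in an explicit order. DocumentFilter and rank() are invented names.
class OrderedExtensions {
    interface DocumentFilter {
        int rank();                            // lower ranks run first
        void process(Map<String, String> doc); // may read and write metadata
    }

    // Convenience factory for the example filters below.
    static DocumentFilter filter(int rank, Consumer<Map<String, String>> body) {
        return new DocumentFilter() {
            public int rank() { return rank; }
            public void process(Map<String, String> doc) { body.accept(doc); }
        };
    }

    // Sorts a copy by rank (stable, so equal ranks keep registration order),
    // then calls each filter on the same document.
    static void runInOrder(List<DocumentFilter> filters, Map<String, String> doc) {
        List<DocumentFilter> sorted = new ArrayList<>(filters);
        sorted.sort(Comparator.comparingInt(DocumentFilter::rank));
        for (DocumentFilter f : sorted) {
            f.process(doc);
        }
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        // Registered out of order on purpose: the analysis step is listed first.
        List<DocumentFilter> filters = List.of(
            filter(20, d -> d.put("analyzer", "analyzer-" + d.getOrDefault("lang", "unknown"))),
            filter(10, d -> d.put("lang", "fr")));
        runInOrder(filters, doc);
        System.out.println(doc.get("analyzer")); // prints analyzer-fr
    }
}
```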
One option is to follow the same pattern as the parse-* plugins.
Within ParserFactory's getParse() method, there is a conditional
statement that chooses which parse implementation to use based on the
contentType or suffix of the resource fetched.
You could do the same in NutchAnalyzer factory, ex
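A rough sketch of that pattern applied to analyzers: a factory that picks an implementation from a key (here a language code) the way ParserFactory picks a parser from the content type. The TextAnalyzer interface, the language keys and the fallback behaviour are assumptions for illustration, not the actual NutchAnalyzer factory code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a factory that chooses an implementation by key,
// in the spirit of ParserFactory choosing a parser by content type.
class AnalyzerFactorySketch {
    interface TextAnalyzer {
        String name();
    }

    private final Map<String, TextAnalyzer> byLanguage = new HashMap<>();
    private final TextAnalyzer fallback;

    AnalyzerFactorySketch(TextAnalyzer fallback) {
        this.fallback = fallback;
    }

    void register(String lang, TextAnalyzer analyzer) {
        byLanguage.put(lang.toLowerCase(Locale.ROOT), analyzer);
    }

    // Returns the analyzer registered for lang, or the fallback when the
    // language is unknown or was not identified (null).
    TextAnalyzer get(String lang) {
        if (lang == null) {
            return fallback;
        }
        return byLanguage.getOrDefault(lang.toLowerCase(Locale.ROOT), fallback);
    }

    public static void main(String[] args) {
        AnalyzerFactorySketch factory = new AnalyzerFactorySketch(() -> "default");
        factory.register("fr", () -> "french");
        factory.register("de", () -> "german");
        System.out.println(factory.get("FR").name()); // french
        System.out.println(factory.get("it").name()); // default
    }
}
```

The map lookup plays the same role as a chain of if/else tests on the key, but new languages can be added by registration instead of by editing the conditional.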
Dear Nutch Dev,
Is there a way to penalize sites with many outlinks? I mean sorting hits
in results pages by outlinks and lowering the rank of sites with many outlinks.
Any help?
Thanks,
Massimo
Sure. In IndexSegment's makeDocument() method, you can edit the code
to deboost the document by checking
parse.getData().getOutlinks().length and multiplying the boost by some
factor. You would have to reindex for this change to take effect.
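The deboost arithmetic could look something like the sketch below. The outlink count would come from parse.getData().getOutlinks().length as described above; the threshold, the floor (minFactor) and the threshold / outlinkCount scaling rule are invented for illustration, and the helper is not part of IndexSegment.

```java
// Hypothetical sketch of the deboost arithmetic only; the threshold,
// floor and scaling rule are assumptions, not Nutch defaults.
class OutlinkDeboost {
    // Leaves the boost unchanged up to `threshold` outlinks; above that,
    // scales it by threshold / outlinkCount, but never below minFactor.
    static float deboost(float boost, int outlinkCount, int threshold, float minFactor) {
        if (threshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        if (outlinkCount <= threshold) {
            return boost;
        }
        float factor = (float) threshold / outlinkCount;
        return boost * Math.max(factor, minFactor);
    }

    public static void main(String[] args) {
        System.out.println(deboost(2.0f, 50, 100, 0.1f));    // 2.0 (under threshold)
        System.out.println(deboost(2.0f, 200, 100, 0.1f));   // 1.0 (halved)
        System.out.println(deboost(2.0f, 10000, 100, 0.1f)); // floor applies
    }
}
```

The floor keeps link-heavy pages (directories, site maps) from being pushed out of results entirely; whether that is desirable is a tuning decision.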
On 6/14/05, Massimo Miccoli <[EMAIL PROTECTED]> wrote:
>
[ http://issues.apache.org/jira/browse/NUTCH-21?page=comments#action_12313578 ]
Stephan Strittmatter commented on NUTCH-21:
---
Could someone send me a ppt file that produces such errors, for debugging?
I tested several files but I was not able to rep
> There was an interesting sentence on the old home page about what is
> necessary to become a committer.
Excuse me Stefan, but what do you mean?
> Anyway, no problem;
> the best thing you can do is to read the code and the old SourceForge mail
> archive.
> An article can be found here:
> http://www.media-style
There was an interesting sentence on the old home page about what is
necessary to become a committer.
Anyway, no problem;
the best thing you can do is to read the code and the old SourceForge mail
archive.
An article can be found here:
http://www.media-style.com/index.jsp?folderPK=422&action=&;
At least it
Hi Jérôme,
From what I can see, the Nutch plugin architecture shares some ideas with
Eclipse's. Am I right?
If so, maybe you can refer to this article:
http://www.eclipse.org/articles/Article-Plug-in-architecture/plugin_architecture.html
/Jack
On 6/14/05, Jérôme Charron <[EMAIL PROTECTED]> wrot
Wouldn't it be simply the number of threads that you use to fetch the
pages ?
Doug Cutting wrote:
The latest code in SVN requires less RAM. If you still have problems,
try setting the config option io.map.index.skip to 8, and
indexer.termIndexInterval to 1024. These will both cause less RAM
> This is a typical use case for Nutch plugins, and going a step back to
> configuration-file-based dynamic class loading makes less sense, from my
> point of view.
> Now that most things have been ported to plugins, using
> AnalyzerMap.properties isn't in the style in which the rest of Nutch is
> implemented.
Stefa