Hi Hiran,

If you haven't already, please take a look at 
https://cwiki.apache.org/confluence/display/NUTCH/PluginCentral and see whether 
any of your questions are answered there. If the documentation needs to be 
augmented, we can do that. Please let us know if this is the case.

lewismc

On 2024/10/06 20:13:35 Hiran Chaudhuri wrote:
> I was experimenting with the protocol plugin that continually connects
> and disconnects from the server for each and every request.
> HTTP may be lightweight (or cached in the httpclient code), but other
> protocols are not.
> 
> My code was ruthless about establishing and tearing down the
> connections, but it looked very repetitive for getProtocolOutput and
> getRobotRules.
> Trying to make the functions reusable first of all led to losing full
> control over the connections. No worries, they get garbage collected -
> don't they?
> 
> Well, it seems these connections do get closed and gc'ed, but it takes
> too much time. In between, the fetcher hits problems and runs into grace
> periods of 300 000 milliseconds. The total scan becomes slow
> just because I tried to optimize the code. Which leads me to the next
> question:
> 
> What is the plugin's life cycle? Is there one plugin instance per
> server? One per URL? One per thread? Or one in total?
> This scope defines whether I can make use of local variables, or
> instance fields. Or is there some other mechanism where a plugin could
> store data that should survive across the getProtocolOutput calls? Could
> a plugin define which scope it wants to be in?
> 
> 
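
Independent of how the life cycle question is answered, the connection
handling described above can be made explicit instead of being left to the
garbage collector. What follows is only a minimal sketch, not Nutch code:
HostConnectionCache, its connector function and the release method are
hypothetical names made up for illustration. It assumes that the connection
type implements AutoCloseable and is safe to share between the threads that
use the cache. Because it relies on a ConcurrentHashMap, it behaves the same
whether there turns out to be one plugin instance per thread or one in total.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical helper, not part of Nutch: keeps one open connection per host. */
public class HostConnectionCache<C extends AutoCloseable> implements AutoCloseable {
    private final Map<String, C> connections = new ConcurrentHashMap<>();
    private final Function<String, C> connector; // opens a connection for a host

    public HostConnectionCache(Function<String, C> connector) {
        this.connector = connector;
    }

    /** Returns the cached connection for the host, opening one on first use. */
    public C get(String host) {
        return connections.computeIfAbsent(host, connector);
    }

    /** Closes and forgets the connection for one host, if there is one. */
    public void release(String host) throws Exception {
        C connection = connections.remove(host);
        if (connection != null) {
            connection.close(); // explicit close, no waiting for the GC
        }
    }

    /** Closes every cached connection, e.g. when the fetch is finished. */
    @Override
    public void close() throws Exception {
        for (String host : connections.keySet()) {
            release(host);
        }
    }

    public int size() {
        return connections.size();
    }
}
```

With something like this, both getProtocolOutput and getRobotRules could call
get(host) and share one connection per server, and the connections would be
torn down at a well-defined point. Where the cache itself should live (instance
field or static field) still depends on the plugin scope asked about above.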
