Hi Hiran,

I recently needed the documents you requested myself, and the two below were 
the most helpful. Keep in mind that, like most Nutch documentation, they are 
not fully up to date, so you need to be a bit flexible.
The most important difference for me was that the source now comes from GitHub 
rather than SVN.

https://wiki.apache.org/nutch/RunNutchInEclipse
https://florianhartl.com/nutch-plugin-tutorial.html



-----Original Message-----
From: Hiran CHAUDHURI [mailto:hiran.chaudh...@amadeus.com] 
Sent: 20 September 2017 09:50
To: user@nutch.apache.org
Subject: RE: [EXT] Re: Nutch Plugin Lifecycle broken due to lazy loading?

>> When you look at the protocol-smb hook it comes with this static 
>> hook, but as it is never executed does not help.
>
>Yes, it has to be called.

So when would Nutch call this static hook? In practice this does not happen 
before the plugin is required, but then it is too late as the 
MalformedURLException is thrown already.
And this approach cannot cover the classpath issue.
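
To illustrate the timing problem, here is a minimal Java sketch. It assumes 
nothing about Nutch itself: FooProtocolDemo and FooHandler are made-up names, 
and foo://bar/baz is only the example URL from this thread. It shows that 
java.net.URL rejects an unknown scheme at construction time unless a handler 
is already known to the JVM or is passed in explicitly, which is why a hook 
that runs only once the plugin is loaded comes too late.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

public class FooProtocolDemo {

    /** Hypothetical do-nothing handler for the made-up foo:// scheme. */
    public static class FooHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL u) throws IOException {
            // Only URL parsing matters here; real I/O is out of scope for the sketch.
            throw new IOException("foo:// connections are not implemented in this sketch");
        }
    }

    /** Returns true if java.net.URL accepts the spec using only the handlers the JVM already knows. */
    public static boolean parsesWithDefaultHandlers(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            // Thrown by the constructor itself when no handler exists for the scheme.
            return false;
        }
    }

    /** Parses the spec with an explicitly supplied handler, bypassing the JVM's handler lookup. */
    public static URL parseWithHandler(String spec, URLStreamHandler handler)
            throws MalformedURLException {
        return new URL(null, spec, handler);
    }

    public static void main(String[] args) throws Exception {
        String spec = "foo://bar/baz";
        System.out.println("default lookup accepts it: " + parsesWithDefaultHandlers(spec));
        URL url = parseWithHandler(spec, new FooHandler());
        System.out.println("protocol=" + url.getProtocol()
                + " host=" + url.getHost() + " path=" + url.getPath());
    }
}
```

Passing the handler explicitly avoids global JVM state, but it only helps code 
that constructs the URL itself. Any code calling new URL(String) still depends 
on the handler being registered before that call.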

>> - create a tutorial to add some arbitrary protocol (e.g. the  
>> foo://bar/baz url)
>> - modify the protocol-smb plugin to make use of the smbclient binary.
>>
>> I'd be willing to do the latter but would like to see a less clumsy 
>> behaviour for plugins.
>
>Great! Nutch could not exist without voluntary work. Thanks!
>
>Sorry, that integration will not be that easy. The problem has indeed been 
>known for a long time and should have been better tested, see also [1] and 
>[2] - the class org.apache.nutch.protocol.sftp.Handler (a dummy handler) has 
>been lost, you'll find it in the zip file attached to NUTCH-714.
>
>However, encapsulation and lazy instantiation I would not call "clumsy 
>behavior", it's useful for heavy-weight plugins (e.g., parse-tika which brings 
>50 MB dependencies).

Both concepts, encapsulation and lazy instantiation are great. What I call 
clumsy is that the encapsulation does not work. Look at it from a user 
perspective of the protocol-smb plugin.
It comes as a (set of) jars, together with an XML descriptor. This could be 
nicely wrapped in a zip file and thus is one artifact that can easily be 
versioned and distributed.

But as soon as I want to install it, I have to
1 - put the artifact into the plugins directory
2 - modify Nutch configuration files to allow smb:// urls and add the 
plugin to the list of loaded plugins
3 - extract jcifs.jar and place it on the system classpath
4 - run nutch with the correct system property
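
Items 3 and 4 could be checked with a small diagnostic like the sketch below. 
It rests on assumptions that are not stated above: that the system property 
in question is the standard JVM property java.protocol.handler.pkgs, that the 
handler package prefix is "jcifs", and that jcifs.smb.SmbFile is a class 
shipped in jcifs.jar. SmbSetupCheck and its methods are hypothetical names.

```java
public class SmbSetupCheck {

    /** True if the named class is visible to this class's class loader (without initialising it). */
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, SmbSetupCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** True if pkgPrefix is one of the '|'-separated entries of java.protocol.handler.pkgs. */
    public static boolean handlerPackageConfigured(String pkgPrefix) {
        String value = System.getProperty("java.protocol.handler.pkgs", "");
        for (String entry : value.split("\\|")) {
            if (entry.trim().equals(pkgPrefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed names, see the note above the code.
        System.out.println("jcifs.smb.SmbFile on classpath: "
                + isOnClasspath("jcifs.smb.SmbFile"));
        System.out.println("handler package 'jcifs' configured: "
                + handlerPackageConfigured("jcifs"));
    }
}
```

Both checks require knowing class and package names from inside the plugin, 
which is exactly the knowledge a user should not need.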

While items 1 and 2 can be understood easily and maybe one day come with a nice 
management interface, items 3 and 4 require knowledge about the internals of 
the plugin. Where did the encapsulation go? This is where I'd like to improve, 
and I have an idea how that could be established. Need to test it though.

>Thanks, looking forward how you get it solved, Sebastian

It seems I may need some support to go further. Maybe, as you help me, two 
documents could come out of it:
- Building nutch from source
- Developing a (protocol) plugin

I would need the first to test modifications to the plugin system.
Then with the second I would create a smb plugin that would suffer other 
limitations than the LGPL. ;-)

Hiran
