While running the example protocol-foo plugin (which I am writing), I got 
through the injector and generator phases, but the fetch phase fails.

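From the log below, it seems the fetcher tries to parse URLs before the 
PluginRepository has finished initializing: the MalformedURLException for 
foo://example.com is logged at 08:13:06,845, while the plugins (including 
protocol-foo) are only reported as registered at 08:13:07,508. Such behaviour 
would of course render protocol plugins useless.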

So yes, the whole construct still needs to be tested carefully.
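
To illustrate the suspicion, here is a minimal sketch outside Nutch. It 
shows that java.net.URL rejects a scheme for which no handler is known at 
the moment of parsing, and accepts the same string once a handler is 
supplied. The class UrlOrderingDemo and the handler FooHandler are 
hypothetical names made up for this example; they are not Nutch code, and 
how Nutch actually wires in handlers is not shown here.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

public class UrlOrderingDemo {

    /** Hypothetical stand-in for a handler that a protocol plugin might supply. */
    public static class FooHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL u) throws IOException {
            // Parsing is all this sketch needs; connecting is deliberately unsupported.
            throw new IOException("foo: connections are not implemented in this sketch");
        }
    }

    /** Returns true if java.net.URL can parse the spec with the handlers the JVM already knows. */
    public static boolean parsesWithDefaultHandlers(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            // For an unregistered scheme the message is "unknown protocol: <scheme>".
            return false;
        }
    }

    /** Parses the spec with an explicitly supplied handler instead of the JVM-wide lookup. */
    public static URL parseWithHandler(String spec, URLStreamHandler handler) {
        try {
            return new URL(null, spec, handler);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot parse url: " + spec, e);
        }
    }

    public static void main(String[] args) {
        String spec = "foo://example.com";
        // No handler for "foo" is known yet, mirroring the state at 08:13:06,845.
        System.out.println("default handlers: " + parsesWithDefaultHandlers(spec));
        // With a handler available, the same string parses.
        URL url = parseWithHandler(spec, new FooHandler());
        System.out.println("explicit handler: " + url.getProtocol() + " / " + url.getHost());
    }
}
```

If the fetcher really builds its URL objects before the plugin-provided 
handlers are available, the first case is what FetchItem.create would run 
into, which matches the stack trace below.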

2017-09-23 08:13:06,783 INFO  fetcher.FetchItemQueues - Using queue mode : byHost
2017-09-23 08:13:06,785 INFO  fetcher.Fetcher - Fetcher: threads: 50
2017-09-23 08:13:06,785 INFO  fetcher.Fetcher - Fetcher: time-out divisor: 2
2017-09-23 08:13:06,836 INFO  plugin.PluginRepository - Plugins: looking in: /home/hiran/dev/nutch/runtime/local/plugins
2017-09-23 08:13:06,845 WARN  fetcher.FetchItem - Cannot parse url: foo://example.com
java.net.MalformedURLException: unknown protocol: foo
        at java.net.URL.<init>(URL.java:600)
        at java.net.URL.<init>(URL.java:490)
        at java.net.URL.<init>(URL.java:439)
        at org.apache.nutch.fetcher.FetchItem.create(FetchItem.java:71)
        at org.apache.nutch.fetcher.FetchItem.create(FetchItem.java:63)
        at org.apache.nutch.fetcher.FetchItemQueues.addFetchItem(FetchItemQueues.java:87)
        at org.apache.nutch.fetcher.QueueFeeder.run(QueueFeeder.java:91)
2017-09-23 08:13:06,899 INFO  fetcher.QueueFeeder - QueueFeeder finished: total 2 records + hit by time limit :0
2017-09-23 08:13:07,508 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2017-09-23 08:13:07,508 INFO  plugin.PluginRepository - Registered Plugins:
2017-09-23 08:13:07,508 INFO  plugin.PluginRepository -         Regex URL Filter (urlfilter-regex)
2017-09-23 08:13:07,508 INFO  plugin.PluginRepository -         Html Parse Plug-in (parse-html)
2017-09-23 08:13:07,508 INFO  plugin.PluginRepository -         HTTP Framework (lib-http)
2017-09-23 08:13:07,508 INFO  plugin.PluginRepository -         the nutch core extension points (nutch-extensionpoints)
2017-09-23 08:13:07,508 INFO  plugin.PluginRepository -         Basic Indexing Filter (index-basic)
2017-09-23 08:13:07,508 INFO  plugin.PluginRepository -         Anchor Indexing Filter (index-anchor)
2017-09-23 08:13:07,508 INFO  plugin.PluginRepository -         Tika Parser Plug-in (parse-tika)
2017-09-23 08:13:07,508 INFO  plugin.PluginRepository -         Basic URL Normalizer (urlnormalizer-basic)
2017-09-23 08:13:07,508 INFO  plugin.PluginRepository -         Regex URL Filter Framework (lib-regex-filter)
2017-09-23 08:13:07,508 INFO  plugin.PluginRepository -         Regex URL Normalizer (urlnormalizer-regex)
2017-09-23 08:13:07,508 INFO  plugin.PluginRepository -         CyberNeko HTML Parser (lib-nekohtml)
2017-09-23 08:13:07,508 INFO  plugin.PluginRepository -         OPIC Scoring Plug-in (scoring-opic)
2017-09-23 08:13:07,508 INFO  plugin.PluginRepository -         Pass-through URL Normalizer (urlnormalizer-pass)
2017-09-23 08:13:07,508 INFO  plugin.PluginRepository -         Http Protocol Plug-in (protocol-http)
2017-09-23 08:13:07,508 INFO  plugin.PluginRepository -         Foo Protocol Example Plug-in (protocol-foo)
2017-09-23 08:13:07,508 INFO  plugin.PluginRepository -         SolrIndexWriter (indexer-solr)