Hi,

> It would be something like a directory listing, or no directory listing but
> content.

Have a look at the protocol-file plugin: it wraps a directory listing into an HTML page, much as the Apache web server does when a directory has no index.html.

> It is possible that a protocol-plugin cannot do much without a parser-plugin?

No. Nutch is a web crawler, and crawling file systems or file servers is only done by "emulating" web pages.

> And if I were to implement such a parser-plugin

There are already parse-html and parse-tika...

In general, if it's only about indexing a file system, it may be easier to send the documents directly to Solr (or another indexer). But often you have a mix of content providers (file system/server, web site, wiki, etc.), and usually many of them already provide a web frontend to browse (or crawl) the content.

Best,
Sebastian

On 09/27/2017 06:57 AM, Hiran CHAUDHURI wrote:
> Hi there.
>
> While I am trying to create the protocol-foo, an implementation for the example protocol with URLs
> like foo://something I see difficulty in distinguishing when to tell nutch to search for more URLs
> and when not to. It would be something like a directory listing, or no directory listing but content.
>
> It is possible that a protocol-plugin cannot do much without a parser-plugin? And if I were to
> implement such a parser-plugin, would I then have to implement the directory listing plus all the
> content parsing like Tika?
>
> Hiran
>
> Hiran Chaudhuri
> Principal Support Engineer
> Service Reliability Engineering - Custom
> Amadeus Data Processing GmbH
> Berghamer Strasse 6
> 85435 Erding
> T: +49-8122-43x3662
> hiran.chaudhuri@amadeus.com
> http://amadeus.com
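
For illustration, the idea of "emulating" a web page for a directory could look roughly like the sketch below. It is plain Java with no Nutch classes, and it is not the actual protocol-file code: the class name `DirectoryListingSketch` and the method `toHtml` are made up for this example. It renders a directory as an HTML page of links, so that an HTML parser such as parse-html could pick up the outlinks without any custom parser plugin.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Nutch: renders a directory as an HTML index page.
public class DirectoryListingSketch {

    /** Escape the characters that are special in HTML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Build an HTML page with one link per directory entry (subdirectories get a trailing slash). */
    static String toHtml(Path dir) throws IOException {
        List<String> names;
        try (Stream<Path> entries = Files.list(dir)) {
            names = entries
                    .map(p -> p.getFileName().toString() + (Files.isDirectory(p) ? "/" : ""))
                    .sorted()
                    .collect(Collectors.toList());
        }
        StringBuilder html = new StringBuilder();
        html.append("<html><head><title>Index of ")
            .append(escape(dir.toString()))
            .append("</title></head><body><ul>\n");
        for (String name : names) {
            // Simplification: names are HTML-escaped only; a real implementation
            // would also URL-encode the href value.
            String n = escape(name);
            html.append("<li><a href=\"").append(n).append("\">")
                .append(n).append("</a></li>\n");
        }
        html.append("</ul></body></html>\n");
        return html.toString();
    }

    public static void main(String[] args) throws IOException {
        // List the directory given on the command line, or the current one.
        System.out.println(toHtml(Paths.get(args.length > 0 ? args[0] : ".")));
    }
}
```

A protocol plugin following this approach would presumably return such a page for directory URLs and the raw file content for everything else, leaving the content parsing to the existing parser plugins.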