Re: document support for file system crawling

Erik Hatcher Wed, 30 Aug 2006 03:03:34 -0700


On Aug 30, 2006, at 2:42 AM, Bruno wrote:

Browsing through the message thread, I tried to find a trail addressing filesystem crawls. I want to implement an enterprise search over a networked filesystem, crawling all sorts of documents, such as html, doc, ppt and pdf.
Nutch provides plugins enabling it to read proprietary formats.
Is there support for the same functionality in solr?

No. Solr is strictly a search server that takes plain text for the fields of documents added to it. The client is responsible for parsing the text out of these types of documents. You could borrow the document parsing pieces from Lucene's contrib and Nutch and glue them together into your client that speaks to Solr, or perhaps Solr isn't the right approach for your needs? It certainly is possible to add these capabilities into Solr, but it would be awkward to have to stream binary data into XML documents such that Solr could parse them on the server side.
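A minimal sketch of that client-side approach, for illustration only: the client walks the filesystem, extracts plain text itself, and wraps it in a Solr XML <add><doc> update message. The class and method names (SolrAddBuilder, extractText, buildAddXml, crawl) are hypothetical, and the field names "id" and "text" are assumptions about your schema. The extractor here only handles .txt files and strips HTML tags naively; doc, ppt and pdf would need a real parser, such as the pieces borrowed from Lucene's contrib or Nutch mentioned above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SolrAddBuilder {

    // Escape XML special characters so extracted text is safe inside a <field> element.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Hypothetical placeholder extractor: plain text passes through unchanged;
    // HTML gets a naive tag strip (entities are NOT decoded). Binary formats
    // (doc, ppt, pdf) would need a real parser plugged in here.
    static String extractText(String fileName, String raw) {
        if (fileName.endsWith(".html") || fileName.endsWith(".htm")) {
            return raw.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
        }
        return raw;
    }

    // Build one Solr XML <add> message; "id" and "text" are assumed schema fields.
    static String buildAddXml(String id, String text) {
        return "<add><doc>"
            + "<field name=\"id\">" + escapeXml(id) + "</field>"
            + "<field name=\"text\">" + escapeXml(text) + "</field>"
            + "</doc></add>";
    }

    // Walk a directory tree and build one <add> message per file this sketch can read.
    static List<String> crawl(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<String> messages = new ArrayList<>();
        for (Path p : files) {
            String name = p.getFileName().toString();
            if (!(name.endsWith(".txt") || name.endsWith(".html") || name.endsWith(".htm"))) {
                continue; // skip formats this sketch cannot parse
            }
            String raw = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
            messages.add(buildAddXml(p.toString(), extractText(name, raw)));
        }
        return messages;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (String xml : crawl(root)) {
            // In a real client, each message would be POSTed to Solr's update handler,
            // followed by a <commit/>; here we just print it.
            System.out.println(xml);
        }
    }
}
```

The point of the design is that Solr only ever sees escaped plain text inside XML; all format-specific parsing stays in the client, which is where the reply above says it belongs.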


        Erik
