Hello,

To extract the content of the documents, you can use Apache Tika, then send that content to Solr (via SolrJ).
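
For the FTP part, here is a rough sketch of what the client code could look like, using only the JDK. It assumes the JDK's built-in ftp:// URL handler accepts credentials in the user-info part of the URL, which I believe it does, but please check it against your server. The class name FtpFetchSketch, the helper names, and the host, user, and path values are all made up for illustration; nothing here comes from SolrJ or Tika. The Tika and SolrJ steps are only indicated in comments.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class: fetches a file over FTP so that the bytes can
// then be parsed (e.g. with Tika) and indexed (e.g. with SolrJ).
public class FtpFetchSketch {

    // Builds an ftp:// URL with the credentials percent-encoded in the
    // user-info part, so characters such as '@' or ':' in a password
    // do not break the URL. Requires Java 10+ for this encode() overload.
    public static String buildFtpUrl(String user, String password, String host, String path) {
        return "ftp://" + encode(user) + ":" + encode(password) + "@" + host + path;
    }

    // URLEncoder targets HTML forms and writes a space as '+';
    // in a URL's user-info part a space has to be "%20" instead.
    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Downloads the whole file into memory. Fine for a sketch; large files
    // would be better streamed straight into the parser.
    public static byte[] fetch(String ftpUrl) throws IOException {
        URLConnection conn = URI.create(ftpUrl).toURL().openConnection();
        try (InputStream in = conn.getInputStream();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 4) {
            System.out.println("usage: FtpFetchSketch <user> <password> <host> </path/to/file>");
            return;
        }
        String url = buildFtpUrl(args[0], args[1], args[2], args[3]);
        byte[] content = fetch(url);
        System.out.println("fetched " + content.length + " bytes");
        // Next steps, not shown here:
        //  1. extract text from the bytes with Tika (e.g. AutoDetectParser)
        //  2. put the text into a SolrInputDocument and add it with SolrJ
    }
}
```

The point is that the FTP username and password live entirely in your own client code. SolrJ only ever sees the extracted text and whatever fields you decide to send, such as the original URL as a plain string.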
Regards,
Marc.

On Wed, Oct 5, 2011 at 1:16 AM, Chris Hostetter <hossman_luc...@fucit.org> wrote:

>
> : I want to index some documents with the SolrJ API, but the URLs of these
> : documents are FTP.
> : How do I set the username and password for the FTP account in SolrJ?
> :
> : In the SolrJ API there is the CommonsHttpSolrServer class, but I cannot
> : find any method for FTP configuration.
>
> It sounds like you are getting confused between using SolrJ to talk to
> *Solr* and using SolrJ to index arbitrary URLs.
>
> SolrJ doesn't do any crawling -- if you have data that you want to index
> then your client code needs to decide what that data is (and where it
> comes from) and feed that data to SolrJ as "documents" to index. The only
> URLs that SolrJ knows about are:
>  * the URL for talking to Solr
>  * "strings" that SolrJ passes to Solr as document fields that may just so
>    happen to be URLs (SolrJ doesn't know/care)
>
> -Hoss
>