
Once your client code has fetched the documents, you can use Apache Tika to
extract their content before sending it to Solr (via SolrJ).
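
For the FTP part specifically, here is a rough sketch of how the client side
might look using only the JDK. It assumes the JDK's built-in ftp: URL handler
is good enough for your server and that the credentials can go in the URL. The
class name FtpFetch, the helper names, and the host, path, and login values are
all made up for illustration; the Tika and SolrJ steps are only outlined in
comments because they need those libraries on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.URL;
import java.net.URLEncoder;

// Hypothetical helper class: fetches a file over FTP so it can be handed
// to Tika and then indexed through SolrJ.
class FtpFetch {

    // Percent-encode a user name or password so that characters such as
    // '@' or ':' are not mistaken for URL delimiters.
    static String encode(String s) throws UnsupportedEncodingException {
        // URLEncoder is meant for HTML forms and turns spaces into '+',
        // so convert those to %20 for use inside a URL.
        return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
    }

    // Build ftp://user:password@host/path from its parts.
    static String buildFtpUrl(String user, String password, String host, String path)
            throws UnsupportedEncodingException {
        return "ftp://" + encode(user) + ":" + encode(password) + "@" + host + path;
    }

    // Download the whole file into memory through the JDK's ftp: URL handler.
    static byte[] fetch(String ftpUrl) throws IOException {
        try (InputStream in = new URL(ftpUrl).openStream();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 4) {
            System.out.println("usage: FtpFetch <user> <password> <host> <path>");
            return;
        }
        String url = buildFtpUrl(args[0], args[1], args[2], args[3]);
        byte[] content = fetch(url);
        System.out.println("fetched " + content.length + " bytes");
        // Next steps, not shown because they need Tika and SolrJ:
        //  1. extract text, e.g. new Tika().parseToString(new ByteArrayInputStream(content))
        //  2. put the text and the URL (without credentials) into a SolrInputDocument via addField(...)
        //  3. send it with add(doc) and commit() on your SolrJ server object
    }
}
```

The point is that the FTP login lives entirely in your own code; SolrJ only
ever sees the extracted text and whatever field values you choose to send.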


On Wed, Oct 5, 2011 at 1:16 AM, Chris Hostetter <hossman_luc...@fucit.org> wrote:

> : I want to index some documents with the SolrJ API, but the URLs of these
> : documents are FTP.
> : How do I set the username and password for the FTP account in SolrJ?
> :
> : In the SolrJ API there is the CommonsHttpSolrServer class, but I cannot find
> : any method for FTP configuration.
> It sounds like you are getting confused between using SolrJ to talk to
> *Solr* and using SolrJ to index arbitrary URLs.
> SolrJ doesn't do any crawling -- if you have data that you want to index
> then your client code needs to decide what that data is (and where it
> comes from) and feed that data to SolrJ as "documents" to index.  The only
> URLs that SolrJ knows about are:
>  * the URL for talking to Solr
>  * "strings" that SolrJ passes to Solr as document fields that may just so
>   happen to be URLs (SolrJ doesn't know/care)
> -Hoss
