Using SolrJ with Tika

2009-09-02 Thread Angel Ice
Hi everybody. I hope it's the right place for questions; if not, sorry. I'm trying to index rich documents (PDF, MS docs etc.) in Solr/Lucene. I have seen a few examples explaining how to use Tika to solve this. But most of these examples are using curl to send documents to Solr or an HTML POST wi

Re : Using SolrJ with Tika

2009-09-02 Thread Angel Ice
nux utility like Curl and the PDF/Word/RTF/PPT/XLS etc. will be indexed. We tested this last week. Tika has already been included in Solr 1.4. Cheers Rajan On Wed, Sep 2, 2009 at 5:26 PM, Angel Ice wrote: > Hi everybody. > > I hope it's the right place for questions, if not sor
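
For readers who would rather stay in Java than shell out to curl, the same kind of request can be sketched with the JDK's own HTTP client (Java 11+). This is only an illustration under assumptions: the base URL, the document id, and the class and method names are made up for the example, and it assumes the extracting request handler is mapped at /update/extract as in the stock Solr 1.4 example configuration. The file is sent as the raw request body so that Tika can detect its type on the server side.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper, not part of Solr or SolrJ.
class ExtractPostSketch {

    // Builds the URL of the extracting handler. literal.id sets the id field
    // of the resulting document and commit=true makes it searchable at once.
    static URI extractUri(String solrBaseUrl, String docId) {
        String id = URLEncoder.encode(docId, StandardCharsets.UTF_8);
        return URI.create(solrBaseUrl + "/update/extract?literal.id=" + id + "&commit=true");
    }

    // Wraps the file in a POST request with the bytes as the request body.
    static HttpRequest buildRequest(URI uri, Path file) throws java.io.FileNotFoundException {
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofFile(file))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.out.println("usage: ExtractPostSketch <solrBaseUrl> <docId> <file>");
            return;
        }
        HttpRequest request = buildRequest(extractUri(args[0], args[1]), Path.of(args[2]));
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP status: " + response.statusCode());
    }
}
```

A typical invocation would pass something like http://localhost:8983/solr, doc1 and the path of a PDF; all three values are placeholders.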

Re : Using SolrJ with Tika

2009-09-03 Thread Angel Ice
TIKA (you can use AutoDetectParser) and then, SolrInputDocument doc = new SolrInputDocument(); doc.addField("DOC_CONTENT", CONTENT); solrServer.add(doc); solrServer.commit(); On Wed, Sep 2, 2009 at 5:26 PM, Angel Ice wrote: > Hi everybody. > > I hope it's the righ

wildcard searches

2009-10-05 Thread Angel Ice
Hi everyone, I have a little question regarding the search engine when a wildcard character is used in the query. Let's take the following example: - I have indexed the word Hésitation (with an accent on the "e") - The filters applied to the field that will handle this word, result i

Re : wildcard searches

2009-10-06 Thread Angel Ice
query-time. If you want to enable wildcard queries, preserving the original token (while processing each token in your filter) might work. Cheers Avlesh On Mon, Oct 5, 2009 at 10:39 PM, Angel Ice wrote: > Hi everyone, > > I have a little question regarding the search engine when a wildcar

Re : Re : wildcard searches

2009-10-06 Thread Angel Ice
are not analyzed, the 'h' character in the query "hésita*" does NOT get removed during query time. This means that unless the original token was preserved in the field it wouldn't find any matches. Does this help? Cheers Avlesh On Tue, Oct 6, 2009 at 2:02 PM, Angel Ice wr
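
The point made in this thread (wildcard terms skip analysis, so they only match if the index holds a token in the form the user typed) can be shown with a small plain-Java model. It is a sketch under assumptions, not Solr's implementation: the poster's actual filter chain is cut off above, so simple lowercasing plus accent stripping stands in for it, the preserved token is assumed to be the lowercased original, and every class and method name is hypothetical.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model of index-time analysis versus an unanalyzed prefix query.
class WildcardFoldingSketch {

    // Assumed stand-in for the field's filters: lowercase, then drop accents.
    static String fold(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Tokens stored for one word. With preserveOriginal, the lowercased
    // unfolded form is kept next to the folded one.
    static List<String> indexTokens(String word, boolean preserveOriginal) {
        List<String> tokens = new ArrayList<>();
        tokens.add(fold(word));
        if (preserveOriginal) {
            String lower = word.toLowerCase(Locale.ROOT);
            if (!tokens.contains(lower)) {
                tokens.add(lower);
            }
        }
        return tokens;
    }

    // A prefix query is compared against stored tokens exactly as typed.
    static boolean prefixMatches(List<String> tokens, String rawPrefix) {
        for (String token : tokens) {
            if (token.startsWith(rawPrefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String word = "H\u00e9sitation";   // Hésitation
        String query = "h\u00e9sita";      // the prefix part of hésita*
        System.out.println(prefixMatches(indexTokens(word, false), query));
        System.out.println(prefixMatches(indexTokens(word, true), query));
    }
}
```

In this model the accented prefix finds nothing when only the folded token is stored, and finds the word once the original form is kept alongside it. The alternative is to apply the same folding to the wildcard term on the client before sending the query.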