Re: stopwords file configuration

2010-11-16 Thread alendo
I'm replying to myself because I found the mistake. The Italian stopwords file that I found on the Apache site has a shell-style comment on the same line as each stopword; the stopwords tokenizer is probably basic and doesn't accept comments on the same line as a stopword. I dropped them and
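
A minimal sketch of the cleanup described above, assuming the comments start with `#` (the message only says "shell style") and that each line holds one stopword. The function name and the sample lines are hypothetical, for illustration only:

```python
def strip_inline_comments(lines, marker="#"):
    """Return the stopwords with trailing comments and blank lines removed.

    `marker` is an assumption: the message only says the comments are
    "shell style", so "#" is used as the default.
    """
    words = []
    for line in lines:
        # Keep only the text before the first comment marker.
        word = line.split(marker, 1)[0].strip()
        if word:
            words.append(word)
    return words


# Hypothetical sample lines, not taken from the real stopwords file.
sample = [
    "# header comment",
    "di   # inline comment",
    "",
    "e    # another inline comment",
    "che",
]

cleaned = strip_inline_comments(sample)
print("\n".join(cleaned))
```

The output can be written back to stopwords.txt so that every line holds only a stopword.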

stopwords file configuration

2010-11-16 Thread alendo
I'm using the Lucid Imagination installation kit for SOLR (the latest one, with SOLR 1.4). I would like to use stopwords, and I installed the Italian version of the file in LucidWorks/lucidworks/solr/conf/stopwords.txt. Moreover, the field from which I want to remove stopwords is declared in schema.xml as

Re: Posting pdf file and posting from remote

2010-02-11 Thread alendo
remember if it's in the Solr 1.4 release.) With this you can save the PDF binary in one field and save the extracted text in another field. I'm doing this now with HTML. On Tue, Feb 9, 2010 at 2:08 AM, alendo alessandra.donn...@uniroma2.it wrote: Ok, I'm going ahead (maybe :). I tried

Posting pdf file and posting from remote

2010-02-09 Thread alendo
I understand that Tika is able to index PDF content: is that true? I tried to post a PDF from my local machine and saw another document in the solr/admin schema browser, but when I search only the document id is available; the document doesn't seem to be indexed. Do I need other products to index PDF content?

Re: Posting pdf file and posting from remote

2010-02-09 Thread alendo
Ok, I'm going ahead (maybe :). I tried another curl command to send the file from remote: http://mysolr:/solr/update/extract?literal.id=8514&stream.file=files/attach-8514.pdf&stream.contentType=application/pdf and the behaviour has changed: now I get an error in the Solr log file: HTTP
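
The query parameters of an extract request such as the one above must be separated by `&`. A small sketch of building that URL in Python so the separators and escaping are handled automatically; the base URL `http://localhost:8983/solr` and the function name are assumptions for illustration, since the real host and port are not shown in the message:

```python
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, stream_file, content_type):
    """Build a /update/extract URL with properly separated parameters."""
    params = [
        ("literal.id", doc_id),
        ("stream.file", stream_file),
        ("stream.contentType", content_type),
    ]
    # urlencode joins the pairs with "&" and percent-encodes the values.
    return base_url + "/update/extract?" + urlencode(params)


# Hypothetical base URL; replace with the real Solr host and port.
url = build_extract_url(
    "http://localhost:8983/solr",
    "8514",
    "files/attach-8514.pdf",
    "application/pdf",
)
print(url)
```

When the URL is passed to curl on a shell command line, it should be quoted, because an unquoted `&` ends the command and sends it to the background.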