Re: Can Solr handle large text files?

karsten-solr Fri, 21 Oct 2011 02:29:23 -0700

Hi Peter,

Highlighting in large text files cannot be fast unless the original text is
divided into small pieces.
So take a look at
http://xtf.cdlib.org/documentation/under-the-hood/#Chunking
and at
http://www.lucidimagination.com/blog/2010/09/16/2446/


This means that you should split your files into chunks, index each chunk as
its own document, and use Result Grouping / Field Collapsing
to list only one hit per original file.
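
As a rough sketch of that idea (not a tested setup), the Python below splits
one file into overlapping chunks and builds a grouped query string. The field
name `file_id`, the `id` scheme, the chunk size, and both helper functions are
my own assumptions, not anything from your schema. `file_id` would need to be
a single-valued indexed string field, and your Solr version must support
Result Grouping. The `group.*` and `hl.*` names are standard Solr request
parameters.

```python
from urllib.parse import urlencode


def chunk_lines(file_id, lines, chunk_size=200, overlap=2):
    """Split one file's lines into small Solr documents (hypothetical helper).

    Consecutive chunks share `overlap` lines, so a hit near a chunk
    boundary still has some context lines around it.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    docs = []
    for n, start in enumerate(range(0, len(lines), step)):
        docs.append({
            "id": f"{file_id}#{n}",   # unique key per chunk (assumed scheme)
            "file_id": file_id,       # assumed field: the original file
            "body": "\n".join(lines[start:start + chunk_size]),
        })
        if start + chunk_size >= len(lines):
            break  # this chunk already reaches the end of the file
    return docs


def grouped_query(terms):
    """Build a query string that returns one group per original file."""
    params = {
        "q": "body:(%s)" % " AND ".join(terms),
        "group": "true",            # turn on Result Grouping
        "group.field": "file_id",   # collapse chunks of the same file
        "group.limit": 1,           # keep only the best chunk per file
        "hl": "true",               # highlight matches
        "hl.fl": "body",            # in the chunk text
    }
    return urlencode(params)
```

With this layout a file that contains a term five times still shows up as one
group, and the highlighter only has to process a small chunk instead of the
whole 300MB file.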

(XTF would also solve your problem "out of the box", but XTF does not use Solr.)

Best regards
  Karsten

-------- Original Message --------
> Date: Thu, 20 Oct 2011 17:59:04 -0700
> From: Peter Spam <ps...@mac.com>
> To: solr-user@lucene.apache.org
> Subject: Can Solr handle large text files?

> I have about 20k text files, some very small, but some up to 300MB, and
> would like to do text searching with highlighting.
> 
> Imagine the text is the contents of your syslog.
> 
> I would like to type in some terms, such as "error" and "mail", and have
> Solr return the syslog lines with those terms PLUS two lines of context. 
> Pretty much just like Google's highlighting.
> 
> 1) Can Solr handle this?  I had extremely long query times when I tried
> this with Solr 1.4.1 (yes I was using TermVectors, etc.).  I tried breaking
> the files into 1MB pieces, but searching would be wonky => return the wrong
> number of documents (ie. if one file had a term 5 times, and that was the
> only file that had the term, I want 1 result, not 5 results).  
> 
> 2) What sort of tokenizer would be best?  Here's what I'm using:
> 
>    <field name="body" type="text_pl" indexed="true" stored="true"
>           multiValued="false" termVectors="true" termPositions="true"
>           termOffsets="true" />
> 
>     <fieldType name="text_pl" class="solr.TextField">
>       <analyzer>
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="0" generateNumberParts="0"
>                 catenateWords="0" catenateNumbers="0"
>                 catenateAll="0" splitOnCaseChange="0"/>
>       </analyzer>
>     </fieldType>
> 
> 
> Thanks!
> Pete
