Hi Peter,

highlighting in large text files cannot be fast without dividing the original text into small pieces. Take a look at http://xtf.cdlib.org/documentation/under-the-hood/#Chunking and at http://www.lucidimagination.com/blog/2010/09/16/2446/
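The chunking idea can be sketched in a few lines of Python. This is only an illustration, not XTF's actual algorithm: the chunk size and the two-line overlap (so that your "two lines of context" never straddle a chunk boundary) are assumptions you would tune.

```python
def chunk_lines(lines, max_chars=1_000_000, overlap=2):
    """Yield runs of consecutive lines, each at most max_chars long,
    overlapping by `overlap` lines so context at a boundary is not lost.
    Each yielded chunk would be indexed as its own small Solr document."""
    chunk, size = [], 0
    for line in lines:
        if size + len(line) > max_chars and chunk:
            yield "".join(chunk)
            chunk = chunk[-overlap:]              # carry context lines over
            size = sum(len(l) for l in chunk)
        chunk.append(line)
        size += len(line)
    if chunk:
        yield "".join(chunk)

# Tiny demo with an artificially small chunk size:
lines = [f"line {i}: mail error\n" for i in range(10)]
chunks = list(chunk_lines(lines, max_chars=60, overlap=2))
```

Each chunk stays small enough for the highlighter to re-analyze quickly, which is the whole point of the XTF "under the hood" page above.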
This means you should split your files into chunks and use Result Grouping / Field Collapsing to list only one hit per original document. (XTF would also solve your problem out of the box, but XTF does not use Solr.)

Best regards,
Karsten

-------- Original Message --------
> Date: Thu, 20 Oct 2011 17:59:04 -0700
> From: Peter Spam <ps...@mac.com>
> To: solr-user@lucene.apache.org
> Subject: Can Solr handle large text files?
>
> I have about 20k text files, some very small, but some up to 300MB, and
> would like to do text searching with highlighting.
>
> Imagine the text is the contents of your syslog.
>
> I would like to type in some terms, such as "error" and "mail", and have
> Solr return the syslog lines with those terms PLUS two lines of context.
> Pretty much just like Google's highlighting.
>
> 1) Can Solr handle this? I had extremely long query times when I tried
> this with Solr 1.4.1 (yes, I was using TermVectors, etc.). I tried breaking
> the files into 1MB pieces, but searching would be wonky => it returned the
> wrong number of documents (i.e. if one file had a term 5 times, and that
> was the only file that had the term, I want 1 result, not 5 results).
>
> 2) What sort of tokenizer would be best? Here's what I'm using:
>
> <field name="body" type="text_pl" indexed="true" stored="true"
>        multiValued="false" termVectors="true" termPositions="true"
>        termOffsets="true" />
>
> <fieldType name="text_pl" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="0" generateNumberParts="0" catenateWords="0"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
>   </analyzer>
> </fieldType>
>
> Thanks!
> Pete
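P.S.: a sketch of what the grouped query could look like, assuming each chunk is indexed as its own Solr document carrying the id of the file it came from in a (hypothetical) `source_file` field. With `group=true` and `group.limit=1`, a file whose chunks match five times still shows up as one result:

```python
from urllib.parse import urlencode

# Hypothetical field names: "body" holds a chunk's text, "source_file"
# identifies the original file the chunk was cut from.
params = {
    "q": "body:error AND body:mail",
    "group": "true",               # Result Grouping / Field Collapsing
    "group.field": "source_file",  # one group per original file
    "group.limit": 1,              # return only the best-scoring chunk
    "hl": "true",                  # highlight matches
    "hl.fl": "body",
    "hl.snippets": 3,
}
query = urlencode(params)
# Append `query` to http://localhost:8983/solr/select? to run it.
print(query)
```

Because each chunk is small, highlighting stays fast, and grouping restores the one-hit-per-file view you were expecting.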