(10/05/05 22:08), Serdar Sahin wrote:
Hi,

There are similar topics currently active on the mailing list, but I did
not want to hijack them.

I have currently indexed 100,000 documents. They are Microsoft Office, PDF,
and similar documents, which I convert to TXT files before indexing. The
files are between 1 and 500 pages long. When I run a search filtered to
documents that have more than 100 pages, with highlighting activated, it
takes 0.8-3 seconds depending on the query (10 results per page). If I
retrieve documents that have 1-5 pages, it drops to 0.1 seconds.

If I disable highlighting, it drops to 0.1-0.2 seconds, even on the large
documents, which is more than fast enough. The problem mostly happens on
the first query, when nothing is cached yet. I use this configuration for
highlighting:


     $query->addHighlightField('description')->addHighlightField('plainText');
     $query->setHighlightSimplePre('<strong>');
     $query->setHighlightSimplePost('</strong>');
     $query->setHighlightHighlightMultiTerm(TRUE);
     $query->setHighlightMaxAnalyzedChars(10000);
     $query->setHighlightSnippets(2);

Do you have any suggestions to improve response time while highlighting is
active? I have read a couple of the articles you previously provided, but
they did not help.

And for the second question, I retrieve these fields:

     $query->addField('title')->addField('cat')->addField('thumbs_up')->
             addField('thumbs_down')->addField('lang')->addField('id')->
             addField('username')->addField('view_count')->addField('pages')->
             addField('no_img')->addField('date');

If I can't solve the highlighting problem on large documents, I can simply
disable it and retrieve the first x characters from the plainText (full
text) field. But is it possible to retrieve the first x characters without
using the highlighting feature? When I use this:

     $query->setHighlight(TRUE);
     $query->setHighlightAlternateField('plainText');
     $query->setHighlightMaxAnalyzedChars(0);
     $query->setHighlightMaxAlternateFieldLength(256);

It still takes 2 seconds if I retrieve 10 rows that have 200-300 pages.
Highlighting is still active here, so it might be the source of the problem.
I want to disable it completely and retrieve only the first 256 characters
of the plainText field. Is that possible? It might remove some overhead and
give better performance.

I personally prefer the highlighting solution, but I would also like to hear
a solution for this problem. For the same query, if I disable highlighting
and do not retrieve (but still search) the plainText field, it drops to
0.0094 seconds. So I think that if I can get the first 256 characters
without using highlighting, I will get better performance.

Any suggestions regarding these two problems will be highly appreciated.

Thanks,

Serdar Sahin

Hi Serdar,

There are a few things I can think of that you can try.

1. Provide another field for highlighting and use copyField
to copy plainText to the highlighting field. When using copyField,
specify the maxChars attribute to limit the length of the copy of
plainText. This should work on Solr 1.4.

2. If you can use the branch_3x version of Solr, try FastVectorHighlighter.
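
As a rough sketch of both suggestions, the schema.xml entries could look
something like the following. The field name plainText_hl, the field type
"text", and the 10000 character limit are assumptions picked for
illustration, not values taken from your setup:

     <!-- Hypothetical field: a truncated copy of plainText that is
          used only for highlighting. It has to be stored so that the
          highlighter can read its content. -->
     <field name="plainText_hl" type="text" indexed="true" stored="true"
            termVectors="true" termPositions="true" termOffsets="true"/>

     <!-- Copy at most maxChars characters of plainText into it. -->
     <copyField source="plainText" dest="plainText_hl" maxChars="10000"/>

The termVectors, termPositions and termOffsets attributes only matter for
suggestion 2: FastVectorHighlighter needs them, and it is switched on per
request with hl.useFastVectorHighlighter=true. For suggestion 1 alone they
can be left out. In either case the documents have to be reindexed, and the
highlight field in the query would then be plainText_hl instead of
plainText, for example addHighlightField('plainText_hl').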

Koji

--
http://www.rondhuit.com/en/
