Hi,
I'm indexing PDF documents for full-text search with Solr.
To get the number of the page where a result was found, I index every
page as a separate Solr document and group the results on a field called doc_id.
(See this topic:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201303.mbox/%3c1362242815.4092.140661199082425.338ed...@webmail.messagingengine.com%3E
)
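To illustrate the layout: each page ends up as its own document, roughly
like the sketch below (the field names apart from id and doc_id, and the
values, are just placeholders for this example):

<add>
  <doc>
    <field name="id">book-42-page-7</field>
    <field name="doc_id">book-42</field>
    <field name="page">7</field>
    <field name="content">extracted text of page 7 ...</field>
  </doc>
</add>

So doc_id ties all pages of one PDF together, and the per-page document
tells me on which page the match occurred.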
This works fine when I search within a single document, but when I search
the whole index for a term, the queries are very slow, especially when
group.limit is above 10. I have indexed about 150,000 pages so far, but in
the end it will be more than 1,000,000 pages.
How can I improve search performance?
I'm using this configuration:
<requestHandler name="/search/fulltext" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="indent">true</str>
    <str name="df">text</str>

    <!-- Query settings -->
    <str name="defType">edismax</str>
    <str name="qf">
      id^10.0 ean^10.0
      title^10.0 subtitle^10.0 original_title^5.0
      content^3.0
      content_en^3.0
      content_fr^3.0
      content_de^3.0
      content_it^3.0
      content_es^3.0
      keyword^5.0 text^0.5
      author^2.0 editor^1.0
      publisher^3.0 category^1.0 series^5.0 information^1.0
    </str>
    <str name="mm">100%</str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">id, title, subtitle, original_title, author,
      editor, publisher, category, series, score</str>

    <bool name="group">true</bool>
    <str name="group.field">doc_id</str>
    <int name="group.limit">20</int>

    <bool name="hl">true</bool>
    <str name="hl.fl">content_*</str>
    <str name="hl.alternateField">content</str>
    <bool name="hl.requireFieldMatch">true</bool>
    <str name="hl.simple.pre"><![CDATA[<strong>]]></str>
    <str name="hl.simple.post"><![CDATA[</strong>]]></str>
  </lst>
</requestHandler>
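For reference, the queries rely on the defaults above, so a request only
passes the search term (the core name and term here are placeholders):

http://localhost:8983/solr/collection1/search/fulltext?q=some+term

The slowdown shows up once group.limit goes above 10, e.g. when I override
it per request:

http://localhost:8983/solr/collection1/search/fulltext?q=some+term&group.limit=20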
THX for your help.
- Gesh