Hi,

I'm indexing PDF documents so that I can run full-text search on them with Solr.
To get the number of the page where a result was found, I index every page as a separate document and group the results on a field called doc_id. (See this topic: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201303.mbox/%3c1362242815.4092.140661199082425.338ed...@webmail.messagingengine.com%3E )
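
As a rough sketch of the per-page layout described above: only doc_id and content are field names from the actual setup. The page field, the "<doc_id>_<page>" id scheme, and both helper functions are assumptions made up for illustration. The grouping parameters mirror the ones in the configuration further down.

```python
from typing import Dict, List


def build_page_docs(doc_id: str, pages: List[str]) -> List[Dict[str, object]]:
    """Turn the page texts of one PDF into one Solr document per page."""
    return [
        {
            "id": f"{doc_id}_{number}",  # hypothetical unique key per page
            "doc_id": doc_id,            # field the results are grouped on
            "page": number,              # hypothetical page-number field
            "content": text,
        }
        for number, text in enumerate(pages, start=1)
    ]


def grouped_query_params(term: str, group_limit: int = 20) -> Dict[str, str]:
    """Request parameters that collapse page hits into one group per PDF."""
    return {
        "q": term,
        "group": "true",
        "group.field": "doc_id",
        "group.limit": str(group_limit),
    }


docs = build_page_docs("book-42", ["first page text", "second page text"])
params = grouped_query_params("page")
```

With this layout, every hit inside a group is a single page, so the page number can be read directly from the matching document.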

This works fine when I search within a single document, but when I search the whole database for a term, queries are very slow, especially when group.limit is above 10. I have indexed about 150,000 pages so far, but in the end there will be more than 1,000,000 pages.

How can I improve search performance?

I'm using this configuration:

<requestHandler name="/search/fulltext" class="solr.SearchHandler">
     <lst name="defaults">
           <str name="echoParams">explicit</str>

           <str name="wt">json</str>
           <str name="indent">true</str>
           <str name="df">text</str>

           <!-- Query settings -->
           <str name="defType">edismax</str>
           <str name="qf">
                  id^10.0 ean^10.0
                  title^10.0 subtitle^10.0 original_title^5.0
                  content^3.0
                  content_en^3.0
                  content_fr^3.0
                  content_de^3.0
                  content_it^3.0
                  content_es^3.0
                  keyword^5.0 text^0.5
                  author^2.0 editor^1.0
                  publisher^3.0 category^1.0 series^5.0 information^1.0
           </str>
           <str name="mm">100%</str>
           <str name="q.alt">*:*</str>
           <str name="rows">10</str>
           <str name="fl">id, title, subtitle, original_title, author, editor, publisher, category, series, score</str>
           <bool name="group">true</bool>
           <str name="group.field">doc_id</str>
           <int name="group.limit">20</int>
           <bool name="hl">true</bool>
           <str name="hl.fl">content_*</str>
           <str name="hl.alternateField">content</str>
           <bool name="hl.requireFieldMatch">true</bool>
           <str name="hl.simple.pre"><![CDATA[<strong>]]></str>
           <str name="hl.simple.post"><![CDATA[</strong>]]></str>
     </lst>
</requestHandler>

Thanks for your help.

- Gesh
