Alice,

RE grouping, try Solr 4.8’s new “collapse” qparser with the “expand”
SearchComponent.  The ref guide has the docs.  It’s usually a faster
equivalent to group=true.
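
Something like this, adapting your group query from below (a sketch, not
tested against your schema):

select?fl=item_catalog&q=default_search:paramter&defType=edismax&rows=50&fq={!collapse%20field=item_group_id}&expand=true&sort=score%20desc,default_sort%20desc

Since collapse keeps one document per group, the numFound of the collapsed
result set should already give you the group count, so you may not need an
equivalent of group.ngroups=true at all.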

Do you care to comment further on NewEgg’s apparent switch from Endeca to
Solr?  (confirm true/false and rationale)

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, May 27, 2014 at 4:17 AM, Alice.H.Yang (mis.cnsh04.Newegg) 41493 <
alice.h.y...@newegg.com> wrote:

> Hi, Toke
>
> 1.
>         I set the 3 fields with hundreds of values to use fc and the rest
> to use enum; performance improved about 2x compared with no parameter.
> When I then add facet.threads=20, performance improves about 4x compared
> with no parameter.
>         I also tried collapsing the 9 facet fields into one copyField and
> tested the performance; it improved about 2.5x compared with no parameter.
>         So it has improved a lot thanks to your advice. Thank you.
> 2.
>         Now I have another performance issue: grouping performance.
> The data set is the same as in the facet performance scenario.
> When the keyword search hits about one million documents, the QTime is
> about 600ms. (This is not the first query; the results are already cached.)
>
> Query URL:
>
> select?fl=item_catalog&q=default_search:paramter&defType=edismax&rows=50&group=true&group.field=item_group_id&group.ngroups=true&group.sort=stock4sort%20desc,final_price%20asc,is_selleritem%20asc&sort=score%20desc,default_sort%20desc
>
> It needs a QTime of about 600ms.
>
> This query has two notable parameters:
>         1. fl with one field
>         2. group=true, group.ngroups=true
>
> If I set group=false, the QTime is only 1 ms.
> But I need both group and group.ngroups. How can I improve the grouping
> performance under this requirement? Do you have any advice for me? I'm
> looking forward to your reply.
>
> Best Regards,
> Alice Yang
> +86-021-51530666*41493
> Floor 19, KaiKai Plaza, 888 Wanhandu Rd, Shanghai (200042)
>
>
> -----Original Message-----
> From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
> Sent: May 24, 2014 15:17
> To: solr-user@lucene.apache.org
> Subject: RE: (Issue) How improve solr facet performance
>
> Alice.H.Yang (mis.cnsh04.Newegg) 41493 [alice.h.y...@newegg.com] wrote:
> > 1.  I'm sorry, I made a mistake: the total number of documents is
> > 32 million, not 320 million.
> > 2.  The system has plenty of memory for the Solr index: the OS has 256G
> > in total, and I set the Solr Tomcat HEAPSIZE="-Xms25G -Xmx100G"
>
> 100G is a very high number. What special requirements dictate such a
> large heap size?
>
> > Reply:  9 fields I facet on.
>
> Solr treats each facet separately, so with facet.method=fc and 10M hits
> it will iterate 9*10M = 90M document IDs and update the counters for
> those.
>
> > Reply:  3 facet fields have one hundred unique values, other 6 facet
> fields' unique values are between 3 to 15.
>
> So very low cardinality. This is confirmed by your low response time of
> 6ms for 2925 hits.
>
> > And we tested this scenario: if a facet field has few unique values, we
> > add facet.method=enum; it improves performance a little.
>
> That is a shame: enum is normally the simple answer to a setup like yours.
> Have you tried fine-tuning your fc/enum selection, so that the 3 fields
> with hundreds of values use fc and the rest use enum? That might halve
> your response time.
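>
> With placeholder field names, the per-field overrides would look something
> like this (f.<field>.facet.method overrides the default):
>
> facet=true&facet.method=enum
> &facet.field=field0&facet.field=field1& ... &facet.field=field8
> &f.field0.facet.method=fc&f.field1.facet.method=fc&f.field2.facet.method=fc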
>
>
> Since the number of unique facet values is so low, I do not think that DocValues
> can help you here. Besides the fine-grained fc/enum-selection above, you
> could try collapsing all 9 facet-fields into a single field. The idea
> behind this is that for facet.method=fc, performing faceting on a field
> with (for example) 300 unique values takes practically the same amount of
> time as faceting on a field with 1000 unique values: Faceting on a single
> slightly larger field is much faster than faceting on 9 smaller fields.
> After faceting with facet.limit=-1 on the single super-facet-field, you
> must match the returned values back to their original fields:
>
>
> If you have the facet-fields
>
> field0: 34
> field1: 187
> field2: 78432
> field3: 3
> ...
>
> then collapse them by OR-ing each value with a field-specific mask that is
> larger than the maximum value in any field, and put it all into a single
> field:
>
> fieldAll: 0xA0000000 | 34
> fieldAll: 0xA1000000 | 187
> fieldAll: 0xA2000000 | 78432
> fieldAll: 0xA3000000 | 3
> ...
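>
> At index time, the packed values could be built along these lines (a
> SolrJ sketch; the packed() helper is just for illustration, and fieldAll
> would need to be a multiValued long field):
>
> import org.apache.solr.common.SolrInputDocument;
>
> // Pack a source value under its per-field mask: 0xA0 + the field's index
> // in the high byte. Assumes every source value fits in 24 bits.
> static long packed(int fieldIndex, long value) {
>   return ((0xA0L + fieldIndex) << 24) | value;
> }
>
> SolrInputDocument doc = new SolrInputDocument();
> doc.addField("fieldAll", packed(0, 34));    // field0: 34
> doc.addField("fieldAll", packed(1, 187));   // field1: 187
> doc.addField("fieldAll", packed(2, 78432)); // field2: 78432
> doc.addField("fieldAll", packed(3, 3));     // field3: 3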
>
> perform the facet request on fieldAll with facet.limit=-1 and split the
> resulting counts with
>
> // Split the counts for the combined field back into per-field results.
> // FacetEntry is a stand-in for however you hold (value, count) pairs.
> for (FacetEntry entry : facetResultAll) {
>   long original = entry.value & 0x00FFFFFFL;   // strip the field mask
>   switch ((int) (entry.value & 0xFF000000L)) { // mask selects the source field
>     case 0xA0000000:
>       field0.add(original, entry.count);
>       break;
>     case 0xA1000000:
>       field1.add(original, entry.count);
>       break;
>     // ... and so on for field2 through field8
>   }
> }
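>
> The facet request itself would be along the lines of (again a sketch):
>
> select?q=...&rows=0&facet=true&facet.field=fieldAll&facet.limit=-1&facet.mincount=1
>
> With facet.limit=-1 every fieldAll value comes back in one response, so a
> single pass over the counts rebuilds all 9 per-field facet results.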
>
>
> Regards,
> Toke Eskildsen, State and University Library, Denmark
>
