Re: Anti-Pattern in lucene-join jar?

Darin Amos Fri, 05 Dec 2014 09:16:19 -0800

Thanks for the information!

The reason I ask is that I am doing a POC on building a custom
Query + QueryParser + facet component. I have had some trouble finding
exactly what I am looking for OOTB, and I believe I need something custom. (It's
also a really good learning exercise.)



I work in e-commerce. When you type a search into the website, we execute the search
against the available SKUs (e.g. small red shirt, blue 34x34 jeans), then want to
roll those items up into their products (shirt, jeans) and return the top-level
document. (We are on Solr 4.3.0 and don't have parent/child support yet.)
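A minimal, JDK-only sketch of the rollup described above, assuming each matching SKU already carries its parent product id and a relevance score. `RollupDemo`, `SkuHit`, and the sample ids are hypothetical names for illustration, not Solr or Lucene classes; keeping the max child score matches the score mode mentioned later in this message.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RollupDemo {
    // Hypothetical toy type (not a Solr/Lucene class): one SKU that matched the search.
    record SkuHit(String skuId, String parentId, float score) {}

    // Collapse matching SKUs into their parent products, keeping the max child score.
    static Map<String, Float> rollup(List<SkuHit> hits) {
        Map<String, Float> parents = new LinkedHashMap<>();
        for (SkuHit hit : hits) {
            parents.merge(hit.parentId(), hit.score(), Float::max);
        }
        return parents;
    }

    public static void main(String[] args) {
        List<SkuHit> hits = List.of(
                new SkuHit("shirt-red-s", "shirt", 1.2f),
                new SkuHit("shirt-blue-m", "shirt", 0.7f),
                new SkuHit("jeans-34x34", "jeans", 0.9f));
        System.out.println(rollup(hits)); // {shirt=1.2, jeans=0.9}
    }
}
```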

I can't use grouping because it throws off pagination and opens another can of
worms. What I want can easily be done with the join query parser ({![score]join
from=blah to=blah}), but there are two gaps:
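As a rough illustration of what `{!join from=parent to=id}` does, here is a toy, JDK-only model in which each document is a plain field map. It follows the same two-phase shape as the `JoinUtil.createJoinQuery` code quoted further down this thread (run the inner query and collect the `from` values, then match them against the `to` field), but `JoinDemo` and its data are hypothetical, and scoring is ignored entirely.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

class JoinDemo {
    // Toy model of {!join from=... to=...}: each document is a plain field map.
    static List<Map<String, String>> join(List<Map<String, String>> index,
                                          Predicate<Map<String, String>> fromQuery,
                                          String fromField, String toField) {
        // Phase 1: run the inner ("from") query and collect the values of fromField.
        Set<String> joinValues = new HashSet<>();
        for (Map<String, String> doc : index) {
            if (fromQuery.test(doc) && doc.containsKey(fromField)) {
                joinValues.add(doc.get(fromField));
            }
        }
        // Phase 2: keep every document whose toField holds one of those values.
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> doc : index) {
            if (joinValues.contains(doc.get(toField))) {
                result.add(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> index = List.of(
                Map.of("id", "shirt", "name", "Shirt"),
                Map.of("id", "jeans", "name", "Jeans"),
                Map.of("id", "shirt-s", "parent", "shirt", "name", "Red Shirt", "size", "small"),
                Map.of("id", "jeans-34", "parent", "jeans", "name", "Blue Jeans", "size", "34x34"));
        // Rough stand-in for name:(*Shirt*); product docs lack "parent", so only SKUs contribute.
        Predicate<Map<String, String>> nameHasShirt =
                d -> d.getOrDefault("name", "").contains("Shirt");
        System.out.println(join(index, nameHasShirt, "parent", "id")); // only the "shirt" product
    }
}
```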

1) My customer wants facets calculated from the child dataset, not the
final parent dataset. If my search returns a couple of shirts, I don’t
want to show the “small” facet if none of the shirts have the small size in
stock (i.e. the small shirt wasn’t in the child docset). The product
documents won’t even have the “size” field populated anyway.
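The faceting gap can be sketched the same way: counting `size` over the matched child SKUs yields only sizes that actually appear in the result, while counting over the rolled-up parents yields nothing because they lack the field. A toy sketch under those assumptions; `ChildFacetDemo` and the data are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ChildFacetDemo {
    // Count the values of `field` over a docset; docs missing the field contribute nothing.
    static Map<String, Integer> facetCounts(List<Map<String, String>> docs, String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            String value = doc.get(field);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Child SKUs that matched the search (hypothetical sample data; no small SKU matched).
        List<Map<String, String>> children = List.of(
                Map.of("id", "shirt-m", "parent", "shirt", "size", "medium"),
                Map.of("id", "shirt-l", "parent", "shirt", "size", "large"));
        // The rolled-up parent has no "size" field at all.
        List<Map<String, String>> parents = List.of(Map.of("id", "shirt"));

        System.out.println(facetCounts(children, "size")); // {large=1, medium=1}
        System.out.println(facetCounts(parents, "size"));  // {}
    }
}
```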

2) We want to be able to add filters to the search string that will filter the 
child documents, not the final result set documents. Example:
q={!join from=parent to=id}name:(*Shirt*)&fq=size:small&fq=color:blue
        
        In this case, if the shirt doesn’t have small or blue in stock, it
        won’t be returned at all.
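One way to picture the intended child-level filtering, assuming the filters are evaluated against each SKU before the rollup: the main query and every child filter are ANDed, and only surviving SKUs contribute their parent. This is a hypothetical toy (`ChildFilterDemo`, `fieldIs`), not Solr's `fq` handling. Note that in this model all filters must match the same SKU, so a shirt whose small and blue variants are different SKUs would not match.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

class ChildFilterDemo {
    // AND the main query with every child filter, test each SKU, roll survivors up to parent ids.
    static Set<String> rollupWithChildFilters(List<Map<String, String>> skus,
                                              Predicate<Map<String, String>> mainQuery,
                                              List<Predicate<Map<String, String>>> childFilters) {
        Predicate<Map<String, String>> combined = mainQuery;
        for (Predicate<Map<String, String>> filter : childFilters) {
            combined = combined.and(filter);
        }
        Set<String> parents = new LinkedHashSet<>();
        for (Map<String, String> sku : skus) {
            if (combined.test(sku)) {
                parents.add(sku.get("parent"));
            }
        }
        return parents;
    }

    // Hypothetical helper: an exact-match filter such as size:small.
    static Predicate<Map<String, String>> fieldIs(String field, String value) {
        return doc -> value.equals(doc.get(field));
    }

    public static void main(String[] args) {
        List<Map<String, String>> skus = List.of(
                Map.of("parent", "shirt-a", "name", "Shirt", "size", "small", "color", "blue"),
                Map.of("parent", "shirt-b", "name", "Shirt", "size", "large", "color", "blue"));
        Set<String> hits = rollupWithChildFilters(skus,
                fieldIs("name", "Shirt"),
                List.of(fieldIs("size", "small"), fieldIs("color", "blue")));
        System.out.println(hits); // [shirt-a]
    }
}
```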


Hence, I have been working on a customization. I am looking to build my own
custom join query (I have been calling it a rollup) and base it on the scorejoin
query implementation that was pointed out to me.

A sample query for small red shirts would look like the following:

q={!rollup from=parent to=id}name:(*Shirt*)&childfq=size:small&childfq=color:red
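Since `childfq` can repeat, the parser needs every value, not just the first. Inside Solr this would presumably come from `SolrParams.getParams("childfq")`, which returns all values of a parameter; the JDK-only sketch below only shows the multi-valued split on a raw query string. `ChildFqParamsDemo.parseParams` is a hypothetical helper, and the query string is left unencoded for readability.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ChildFqParamsDemo {
    // Split a query string into name -> values, keeping every repeated value in order.
    static Map<String, List<String>> parseParams(String queryString) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        for (String pair : queryString.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // ignore pairs without a value
            }
            String name = URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8);
            String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    public static void main(String[] args) {
        String qs = "q={!rollup from=parent to=id}name:(*Shirt*)&childfq=size:small&childfq=color:red";
        Map<String, List<String>> params = parseParams(qs);
        System.out.println(params.get("q"));       // [{!rollup from=parent to=id}name:(*Shirt*)]
        System.out.println(params.get("childfq")); // [size:small, color:red]
    }
}
```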

My code would do the following:
        1) A custom query parser (extending ExtendedDismaxQParser) would wrap the
main query in a BooleanQuery and add the “childfq” queries (Occur.MUST), similar
to how the edismax boost queries work today.
        2) The custom query parser would wrap the final BooleanQuery in my custom
Rollup query (using the max child score as the score mode in my case).
        3) The custom rollup query would record the child documents' docset
and expose it through an accessor method, so that in a custom facet
component I can execute:

                // Inside the custom facet component; RollupQuery is my own class.
                Query q = rb.getQuery();
                if (q instanceof RollupQuery) {
                        RollupQuery rq = (RollupQuery) q;
                        DocSet children = rq.getChildren();

                        // Build facets from the children docset.
                } else {
                        // Build facets from rb.getResults().docSet.
                }
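The steps above can be modelled in plain Java to check the control flow. In this sketch `ToyQuery`, `RollupQuery`, and `facetDocSet` are hypothetical stand-ins (the child query is assumed to already be the main query ANDed with the childfq clauses, per step 1), and `java.util.BitSet` stands in for Solr's `DocSet`; it is not the real Lucene `Query` API.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

class RollupQueryDemo {
    // Hypothetical stand-in for a Lucene Query; not the real API.
    interface ToyQuery {
        BitSet execute(int maxDoc);
    }

    // Toy rollup: runs the child query, remembers the matching child docs, returns their parents.
    static final class RollupQuery implements ToyQuery {
        private final IntPredicate childQuery; // main query AND childfq clauses, already combined
        private final int[] parentOf;          // parentOf[doc] = parent doc id, or -1 for a parent
        private BitSet children = new BitSet();

        RollupQuery(IntPredicate childQuery, int[] parentOf) {
            this.childQuery = childQuery;
            this.parentOf = parentOf;
        }

        @Override
        public BitSet execute(int maxDoc) {
            BitSet matchedChildren = new BitSet(maxDoc);
            BitSet parents = new BitSet(maxDoc);
            for (int doc = 0; doc < maxDoc; doc++) {
                if (parentOf[doc] >= 0 && childQuery.test(doc)) {
                    matchedChildren.set(doc);
                    parents.set(parentOf[doc]);
                }
            }
            children = matchedChildren; // recorded for the facet step
            return parents;
        }

        BitSet getChildren() {
            return children;
        }
    }

    // Facet-component side: prefer the recorded child docset when the query is a rollup.
    static BitSet facetDocSet(ToyQuery q, BitSet results) {
        if (q instanceof RollupQuery rq) {
            return rq.getChildren();
        }
        return results;
    }

    public static void main(String[] args) {
        // Docs 0 and 1 are products; docs 2..4 are SKUs.
        int[] parentOf = {-1, -1, 0, 0, 1};
        RollupQuery q = new RollupQuery(doc -> doc != 3, parentOf);
        BitSet results = q.execute(parentOf.length);
        System.out.println(results);                 // {0, 1}
        System.out.println(facetDocSet(q, results)); // {2, 4}
    }
}
```

One thing the toy makes visible: the child docset only exists if `execute` actually runs for the current request. If a result were served from a cache instead, `getChildren()` would still return the empty initial set, which is worth keeping in mind given the caching discussion quoted below.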


If anyone has taken the time to read this example, I greatly appreciate it
and would welcome any feedback. I would be glad to share the final
implementation for review.

Thanks

Darin



> On Dec 5, 2014, at 4:52 AM, Mikhail Khludnev <mkhlud...@griddynamics.com> 
> wrote:
> 
> Thanks Roman! Let's expand it for the sake of completeness.
> Such an issue is not possible in Solr, because caches are associated with the
> searcher. As long as you follow this design (see Solr userCache) and don't
> update what's cached once, there is no chance of shooting yourself in the foot.
> There were a few caches inside of Lucene (the old FieldCache,
> CachingWrapperFilter, ExternalFileField, etc.), but they are properly keyed
> on segments, which prevents such leakage across different
> searchers.
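A toy illustration of the segment-keyed idea, assuming a cache whose keys are compared by object identity (as a reader or segment key would be): the same segment hits, a new segment always misses, and `WeakHashMap` lets entries go once their segment is no longer referenced. `SegmentCacheDemo`, `Segment`, and `SegmentKeyedCache` are hypothetical; this is not Lucene's `FieldCache` implementation.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

class SegmentCacheDemo {
    // Hypothetical stand-in for a segment reader; no equals/hashCode override, so identity is the key.
    static final class Segment {
        final String name;

        Segment(String name) {
            this.name = name;
        }
    }

    // Entries are keyed by segment identity, so a new segment never sees another segment's value,
    // and entries become collectable once their segment is unreferenced.
    static final class SegmentKeyedCache<V> {
        private final Map<Segment, V> entries = new WeakHashMap<>();
        int computations = 0;

        V get(Segment segment, Function<Segment, V> compute) {
            V value = entries.get(segment);
            if (value == null) {
                value = compute.apply(segment);
                computations++;
                entries.put(segment, value);
            }
            return value;
        }
    }

    public static void main(String[] args) {
        SegmentKeyedCache<String> cache = new SegmentKeyedCache<>();
        Segment s1 = new Segment("_0");
        cache.get(s1, s -> "terms of " + s.name);
        cache.get(s1, s -> "terms of " + s.name);                // hit: same segment
        cache.get(new Segment("_1"), s -> "terms of " + s.name); // miss: new segment
        System.out.println(cache.computations); // 2
    }
}
```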
> 
> On Fri, Dec 5, 2014 at 6:43 AM, Roman Chyla <roman.ch...@gmail.com> wrote:
> 
>> +1. Additionally (as follows from your observation), the query can get
>> out of sync with the index if, e.g., it was saved for later use and run
>> against a newly opened searcher.
>> 
>> Roman
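Roman's point can be sketched with a toy eager join that runs its inner query once at construction, next to a lazy one that runs it against whatever snapshot it is executed on. `StaleJoinDemo`, `Sku`, and the two snapshots are hypothetical; `v2` simply plays the role of a newly opened searcher.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StaleJoinDemo {
    // Hypothetical toy SKU: which parent it belongs to and whether it is in stock.
    record Sku(String parent, boolean inStock) {}

    // Eager: the inner query runs once, against the snapshot that exists at construction time.
    static final class EagerJoin {
        private final Set<String> parents = new HashSet<>();

        EagerJoin(List<Sku> snapshotAtParse) {
            for (Sku s : snapshotAtParse) {
                if (s.inStock()) {
                    parents.add(s.parent());
                }
            }
        }

        Set<String> run(List<Sku> currentSnapshot) {
            return parents; // ignores currentSnapshot entirely
        }
    }

    // Lazy: the inner query runs against the snapshot it is executed on.
    static Set<String> lazyJoin(List<Sku> currentSnapshot) {
        Set<String> parents = new HashSet<>();
        for (Sku s : currentSnapshot) {
            if (s.inStock()) {
                parents.add(s.parent());
            }
        }
        return parents;
    }

    public static void main(String[] args) {
        List<Sku> v1 = List.of(new Sku("shirt", true), new Sku("jeans", false));
        List<Sku> v2 = List.of(new Sku("shirt", false), new Sku("jeans", true)); // after a commit
        EagerJoin saved = new EagerJoin(v1);
        System.out.println(saved.run(v2)); // [shirt]  (stale)
        System.out.println(lazyJoin(v2));  // [jeans]
    }
}
```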
>> On 4 Dec 2014 10:51, "Darin Amos" <dari...@gmail.com> wrote:
>> 
>>> Hello All,
>>> 
>>> I have been doing a lot of research into building some custom queries and I
>>> have been looking at the Lucene Join library as a reference. I noticed
>>> something that I believe could actually have a negative side effect.
>>> 
>>> Specifically I was looking at the JoinUtil.createJoinQuery(…) method and
>>> within that method you see the following code:
>>> 
>>>        TermsWithScoreCollector termsWithScoreCollector =
>>>            TermsWithScoreCollector.create(fromField, multipleValuesPerDocument, scoreMode);
>>>        fromSearcher.search(fromQuery, termsWithScoreCollector);
>>> 
>>> As you can see, when the JoinQuery is being built, the code executes
>>> the query that it wraps, with its own collector, to collect all the scores.
>>> If I were to write a query parser using this library (which someone has
>>> done here), doesn't this reduce the benefit of the Solr query cache? The
>>> wrapped query is executed when the join query is constructed,
>>> not when it is executed.
>>> 
>>> Thanks
>>> 
>>> Darin
>>> 
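Darin's concern about construction-time execution can be shown with a counter: if the inner query runs while the join is being built, it runs on every request even when the outer result then comes from a cache, whereas deferring it to execution time lets a cache hit skip it. A hypothetical JDK-only sketch; `ParseTimeCostDemo` and its map are loose stand-ins, not Solr's query result cache.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ParseTimeCostDemo {
    static int innerExecutions = 0;

    // Result cache keyed by the request string (a loose stand-in for a query result cache).
    static final Map<String, List<String>> resultCache = new HashMap<>();

    // Hypothetical inner "from" query: counts how often it actually runs.
    static List<String> runInnerQuery(String q) {
        innerExecutions++;
        return List.of("parent-of-" + q);
    }

    // Eager style: the inner query runs while "parsing", before the cache is consulted.
    static List<String> eagerRequest(String q) {
        List<String> joinTerms = runInnerQuery(q); // cost paid on every request
        return resultCache.computeIfAbsent(q, k -> joinTerms);
    }

    // Lazy style: the inner query only runs if the cache misses.
    static List<String> lazyRequest(String q) {
        return resultCache.computeIfAbsent(q, ParseTimeCostDemo::runInnerQuery);
    }

    public static void main(String[] args) {
        eagerRequest("shirt");
        eagerRequest("shirt");
        System.out.println("eager inner runs: " + innerExecutions); // eager inner runs: 2

        innerExecutions = 0;
        resultCache.clear();
        lazyRequest("shirt");
        lazyRequest("shirt");
        System.out.println("lazy inner runs: " + innerExecutions);  // lazy inner runs: 1
    }
}
```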
>> 
> 
> 
> 
> -- 
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
> 
> <http://www.griddynamics.com>
> <mkhlud...@griddynamics.com>
