Hi everyone,

I'm trying to write a PostFilter for Solr 5.1.0, which is meant to crawl 
through grandchild documents during a search through the parents and filter out 
documents based on statistics gathered from aggregating the grandchildren 
together.  I've got the logic correct, but it performs poorly because I'm
grabbing too many documents from the index along the way.  To reduce the number
of document objects pulled from the IndexReader, I'm trying to skip grandchild
documents which are not relevant to the statistics I'm collecting.

I've implemented the following code in my DelegatingCollector.collect:

if (inStockSkusBitSet == null) {
    // Type cast from IndexSearcher to expose getDocSet.
    SolrIndexSearcher SidxS = (SolrIndexSearcher) idxS;
    inStockSkusDocSet = SidxS.getDocSet(inStockSkusQuery);
    // Type cast from DocSet to expose getBits.
    inStockSkusBitDocSet = (BitDocSet) inStockSkusDocSet;
    inStockSkusBitSet = inStockSkusBitDocSet.getBits();
}


My BitDocSet reports a size which matches a standard query for the more limited 
set of grandchildren, and the FixedBitSet (inStockSkusBitSet) also reports this 
same cardinality.  Based on that fact, it seems that the getDocSet call itself 
must be working properly, and returning the right number of documents.  
However, when I try to filter out grandchild documents using either
BitDocSet.exists or BitSet.get (passing over any grandchild document which
doesn't exist in the BitDocSet or for which the bit set returns false), I get
about a third fewer results than I'm supposed to.  It seems many documents that
should match the filter are being excluded, and documents which should not
match the filter are being included.

I've tried using it in either of these ways:

if (!inStockSkusBitSet.get(currentChildDocNumber)) continue;
if (!inStockSkusBitDocSet.exists(currentChildDocNumber)) continue;

The currentChildDocNumber is simply the docNumber which is passed to 
DelegatingCollector.collect, decremented until I hit a document that doesn't 
belong to the parent document.
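
To make that walk concrete, here is a minimal, self-contained sketch in plain
Java.  It uses java.util.BitSet as a stand-in for the FixedBitSet, and
ChildWalkSketch, countInStockChildren, firstChildDoc and docBase are
hypothetical names for this sketch, not my actual code.  The docBase parameter
is an assumption: it only matters if the doc numbers passed to collect are
relative to the current segment while the bits from getDocSet are indexed by
index-wide doc IDs, in which case an offset has to be added before the lookup.

```java
import java.util.BitSet;

class ChildWalkSketch {

    /**
     * Walks backwards from a parent doc over its block of children and counts
     * those whose bit is set in inStock.
     *
     * parentDoc and firstChildDoc are in the collector's ID space; docBase
     * (an assumption of this sketch) is the offset that maps them into the
     * ID space of the bit set.
     */
    public static int countInStockChildren(int parentDoc, int firstChildDoc,
                                           int docBase, BitSet inStock) {
        int count = 0;
        for (int child = parentDoc - 1; child >= firstChildDoc; child--) {
            // Skip children that are not in the in-stock set.
            if (!inStock.get(docBase + child)) continue;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Bit set indexed by index-wide IDs: docs 101 and 103 are in stock.
        BitSet inStock = new BitSet();
        inStock.set(101);
        inStock.set(103);

        // One block: children 0..3 and parent 4, in a segment starting at 100.
        System.out.println(countInStockChildren(4, 0, 100, inStock)); // prints 2
        // Same walk without the offset looks at bits 0..3 instead.
        System.out.println(countInStockChildren(4, 0, 0, inStock));   // prints 0
    }
}
```

With an offset of 100 the walk finds the two in-stock children; with an offset
of 0 the same bit set matches none of them, even though its cardinality is
unchanged.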

I can't seem to figure out a way to actually use the BitDocSet (or its 
derivatives) to quickly eliminate document IDs.  It seems like this is how it's 
supposed to be used.  What am I getting wrong?

Sorry if this is a newbie question.  I've never written a PostFilter before,
and frankly, the documentation out there is a little sketchy (mostly for
version 4).  So many classes have changed names, and so many of the more
well-documented techniques are now deprecated or removed, that it's tough to
follow what the current best practice actually is.  I'm using the block join
functionality heavily, so I'm trying to keep more current than that.  I would
be happy to send along the full source privately if it would help figure this
out.  If I ever manage to get this performing well enough, I plan to write up
some more elaborate instructions (updated for Solr 5) for the next person who
decides to write a PostFilter and work with block joins.

Thanks for any pointers!  I'm totally open to doing this an entirely different
way.  I read that DocValues might be a more elegant approach, but currently
that would require reindexing, so I'm trying to avoid that.

Also, I've been wondering if the query above would read from the filter cache 
or not.  The query is constructed like this:


private Term inStockTrueTerm = new Term("sku_history.is_in_stock", "T");
private Term objectTypeSkuHistoryTerm = new Term("object_type", "sku_history");
...

inStockTrueTermQuery = new TermQuery(inStockTrueTerm);
objectTypeSkuHistoryTermQuery = new TermQuery(objectTypeSkuHistoryTerm);
inStockSkusQuery = new BooleanQuery();
inStockSkusQuery.add(inStockTrueTermQuery, BooleanClause.Occur.MUST);
inStockSkusQuery.add(objectTypeSkuHistoryTermQuery, BooleanClause.Occur.MUST);

--
Steve
