[ https://issues.apache.org/jira/browse/SOLR-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yonik Seeley updated SOLR-475:
------------------------------

Attachment: UnInvertedField.java

Prototype attached. This is completely untested code, and it is still missing the Solr interface and caching. The approach is described in the comments (cut-n-pasted here). Any thoughts or comments on the approach?

I may not have time to work on this immediately (fix the bugs, add tests, hook it up to Solr, add caching of the un-inverted field, etc.), so additional contributions in this direction are welcome!

{code}
/**
 * Final form of the un-inverted field:
 * Each document points to a list of term numbers that are contained in that document.
 *
 * Term numbers are in sorted order, and are encoded as variable-length deltas from the
 * previous term number. Real term numbers start at 2 since 0 and 1 are reserved. A
 * term number of 0 signals the end of the termNumber list.
 *
 * There is a single int[maxDoc()] which either contains a pointer into a byte[] for
 * the termNumber lists, or directly contains the termNumber list if it fits in the 4
 * bytes of an integer. If the first byte in the integer is 1, the next 3 bytes
 * are a pointer into a byte[] where the termNumber list starts.
 *
 * There are actually 256 byte arrays, to compensate for the fact that the pointers
 * into the byte arrays are only 3 bytes long. The correct byte array for a document
 * is a function of its id.
 *
 * To save space and speed up faceting, any term that matches enough documents will
 * not be un-inverted... it will be skipped while building the un-inverted field structure,
 * and will use a set intersection method during faceting.
 *
 * To further save memory, the terms (the actual string values) are not all stored in
 * memory, but a TermIndex is used to convert term numbers to term values only
 * for the terms needed after faceting has completed.
 * Only every 128th term value is stored, along with its corresponding term number,
 * and this is used as an index to find the closest term and iterate until the
 * desired number is hit (very much like Lucene's own internal term index).
 */
{code}

> multi-valued faceting via un-inverted field
> -------------------------------------------
>
> Key: SOLR-475
> URL: https://issues.apache.org/jira/browse/SOLR-475
> Project: Solr
> Issue Type: New Feature
> Reporter: Yonik Seeley
> Attachments: UnInvertedField.java
>
>
> Facet multi-valued fields via a counting method (like the FieldCache method)
> on an un-inverted representation of the field. For each doc, look at its
> terms and increment a count for that term.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
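To make the layout in the comment concrete, here is a minimal Java sketch of building the int[maxDoc()] array and counting terms over it for a set of documents. It is not the attached UnInvertedField.java: the class UnInvertedSketch and its methods are hypothetical, and several details are assumptions the prototype comment does not pin down. Term ordinals start at 0 and are stored as ordinal + 2; deltas are written as Lucene-style vInts (low 7 bits first, high bit meaning "more bytes follow"); "first byte" is read as the low-order byte of the int; and the byte array for a document is picked with (doc >>> 16) & 0xff. Each document's ordinals must be sorted and distinct.

{code}
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of the un-inverted layout; not the attached UnInvertedField. */
public class UnInvertedSketch {
  private final int[] index;                        // one int per document
  private final byte[][] bytes = new byte[256][];   // 256 arrays because pointers are 3 bytes

  /** termOrdsPerDoc[doc] must be sorted ascending with no duplicates (assumption). */
  public UnInvertedSketch(int[][] termOrdsPerDoc) {
    ByteArrayOutputStream[] builders = new ByteArrayOutputStream[256];
    for (int i = 0; i < 256; i++) builders[i] = new ByteArrayOutputStream();
    index = new int[termOrdsPerDoc.length];
    for (int doc = 0; doc < termOrdsPerDoc.length; doc++) {
      byte[] enc = encode(termOrdsPerDoc[doc]);
      if (enc.length <= 4) {
        // Fits inline; unused high bytes stay 0 and act as the terminator.
        int packed = 0;
        for (int i = 0; i < enc.length; i++) packed |= (enc[i] & 0xff) << (8 * i);
        index[doc] = packed;
      } else {
        // Too long: append to one of the 256 arrays, chosen from the doc id (assumption).
        ByteArrayOutputStream out = builders[(doc >>> 16) & 0xff];
        int pos = out.size();
        if (pos >= (1 << 24)) throw new IllegalStateException("offset needs more than 3 bytes");
        out.write(enc, 0, enc.length);
        out.write(0);                               // 0 ends the term number list
        index[doc] = 1 | (pos << 8);                // low byte 1 = pointer, upper 3 bytes = offset
      }
    }
    for (int i = 0; i < 256; i++) bytes[i] = builders[i].toByteArray();
  }

  /** Encodes (ordinal + 2) as vInt deltas from the previous term number, no terminator. */
  static byte[] encode(int[] ords) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int prev = 0;
    for (int ord : ords) {
      int tnum = ord + 2;                           // real term numbers start at 2
      int delta = tnum - prev;                      // first delta >= 2, later deltas >= 1
      prev = tnum;
      while ((delta & ~0x7f) != 0) {                // low 7 bits first, high bit = "more"
        out.write((delta & 0x7f) | 0x80);
        delta >>>= 7;
      }
      out.write(delta);
    }
    return out.toByteArray();
  }

  /** Increments counts[ord] for every term of every document in docs. */
  public void count(int[] docs, int[] counts) {
    for (int doc : docs) {
      int code = index[doc];
      if ((code & 0xff) == 1) {
        byte[] arr = bytes[(doc >>> 16) & 0xff];
        countList(arr, code >>> 8, arr.length, counts);
      } else {
        byte[] inline = {(byte) code, (byte) (code >>> 8), (byte) (code >>> 16), (byte) (code >>> 24)};
        countList(inline, 0, 4, counts);
      }
    }
  }

  private static void countList(byte[] buf, int pos, int end, int[] counts) {
    int tnum = 0;
    while (pos < end && buf[pos] != 0) {            // a 0 byte at a value boundary ends the list
      int delta = 0, shift = 0, b;
      do {
        b = buf[pos++] & 0xff;
        delta |= (b & 0x7f) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      tnum += delta;
      counts[tnum - 2]++;                           // map term number back to ordinal
    }
  }
}
{code}

Because the first delta is always at least 2, a low byte of 1 can only mean "pointer" and a 0 byte can only mean "end of list" (or an empty document), which is what reserving 0 and 1 buys. Counting is then a single pass over the documents in the base set, decoding each list and incrementing one counter per term, as the issue description proposes.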
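The TermIndex idea (keep every 128th term value in memory and walk forward from the nearest one) can be sketched in Java as follows. TermIndexSketch, its TermSource interface and fromSortedSet are hypothetical names; TermSource stands in for a real term dictionary that can only seek to a term and step forward, and the in-memory TreeSet used in fromSortedSet exists only to make the sketch runnable. Term numbers here are plain 0-based numbers, ignoring the +2 offset used in the encoded lists.

{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;

/** Hypothetical sketch of a sparse term index; not the prototype's TermIndex class. */
public class TermIndexSketch {
  public static final int INTERVAL = 128;           // keep every 128th term in memory

  /** Stand-in for a term dictionary that can only seek and step forward. */
  public interface TermSource {
    Iterator<String> seek(String term);             // positioned at the first term >= term
  }

  private final TermSource source;
  private final List<String> indexed = new ArrayList<>(); // value of term number i * INTERVAL

  /** Makes one pass over all terms in sorted order, keeping every INTERVAL-th value. */
  public TermIndexSketch(TermSource source, Iterator<String> allTermsInOrder) {
    this.source = source;
    for (int tnum = 0; allTermsInOrder.hasNext(); tnum++) {
      String term = allTermsInOrder.next();
      if (tnum % INTERVAL == 0) indexed.add(term);
    }
  }

  /** Converts a 0-based term number back to its term value. */
  public String lookup(int tnum) {
    int slot = tnum / INTERVAL;                     // closest indexed term at or before tnum
    Iterator<String> it = source.seek(indexed.get(slot));
    String term = it.next();
    for (int n = slot * INTERVAL; n < tnum; n++) {  // iterate until the desired number is hit
      term = it.next();
    }
    return term;
  }

  /** Builds a TermSource over an in-memory sorted set, for demonstration only. */
  public static TermSource fromSortedSet(NavigableSet<String> terms) {
    return term -> terms.tailSet(term, true).iterator();
  }
}
{code}

Memory holds roughly one term value in 128, and each lookup costs one seek plus at most 127 forward steps. Since lookups are only needed for the terms that survive faceting (for example the top N by count), that cost is paid a handful of times per request rather than once per term.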
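The comment also says that terms matching "enough documents" are not un-inverted and are counted by set intersection instead. A minimal Java sketch of that split, using java.util.BitSet for the document sets, might look like this. The class name BigTermCounter and the cutoff expressed as a fraction of maxDoc are assumptions; the prototype does not say how "enough" is decided.

{code}
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: high-frequency terms counted by intersection, not un-inverted. */
public class BigTermCounter {
  private final Map<Integer, BitSet> bigTerms = new HashMap<>(); // ordinal -> docs with term
  private final int threshold;                                   // doc-frequency cutoff

  /** fraction is a hypothetical tuning knob, e.g. 0.1 for "more than 10% of docs". */
  public BigTermCounter(int maxDoc, double fraction) {
    threshold = (int) (maxDoc * fraction);
  }

  /** Returns true if the term is kept aside as "big" and should be skipped while un-inverting. */
  public boolean offer(int termOrd, BitSet docsWithTerm) {
    if (docsWithTerm.cardinality() <= threshold) return false;
    bigTerms.put(termOrd, docsWithTerm);
    return true;
  }

  /** Adds |base AND docsWithTerm| to counts[ord] for every big term. */
  public void count(BitSet base, int[] counts) {
    for (Map.Entry<Integer, BitSet> e : bigTerms.entrySet()) {
      BitSet both = (BitSet) e.getValue().clone();  // clone so the stored set is not modified
      both.and(base);
      counts[e.getKey()] += both.cardinality();
    }
  }
}
{code}

A term accepted by offer() would simply be left out of the per-document term lists, which keeps those lists short (so more of them fit inline in the int) and replaces many per-document increments with one bitset intersection per big term.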