[ https://issues.apache.org/jira/browse/SOLR-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650071#action_12650071 ]
Yonik Seeley commented on SOLR-475:
-----------------------------------
Some further results on a bigger index to show some practical limits.
This table (JIRA markup format) shows the performance and memory
characteristics of facet requests on a 50M document index, for different fields
and different numbers of documents matched by the base query.
|| ||f10_100_t||f100_10_t||f1000_5_t||f10000_5_t||f100000_5_t||f100000_10_t||
|field inversion time (sec)|17.2|17.9|69.4|87.8|133.6|388.0|
|inverted field size (MB)|68.1|629.6|416.9|479.0|589.9|807.4|
|1,000 docs facet time (ms)|7|20|13|13|16|17|
|10,000 docs facet time (ms)|55|428|22|23|29|28|
|100,000 docs facet time (ms)|54|421|35|36|46|56|
|1,000,000 docs facet time (ms)|55|431|149|155|249|307|
|10,000,000 docs facet time (ms)|54|434|625|625|1183|1620|
The "profile" of the faceted field is encoded in its name. For example, the
field f1000_5_t has 1000 unique values across the whole index and between 0 and
5 values per document. It took 35 ms to facet on this field when the base
query matched 100,000 documents.
Test Hardware: Commodity PC
Processor: AMD Athlon 64 X2 5000+ (2.6GHz dual core)
Hard Drive: Western Digital Caviar GP WD5000AACS 500GB 5400 to 7200 RPM SATA
3.0Gb/s
Memory: 8GB DDR2 800 SDRAM (PC2 6400)
Operating System: Linux - Ubuntu 8.04 desktop, 64 bit version (x86_64)
Java VM: Sun Java6 (1.6.0_05) 64 bit hotspot (x86_64)
> multi-valued faceting via un-inverted field
> -------------------------------------------
>
> Key: SOLR-475
> URL: https://issues.apache.org/jira/browse/SOLR-475
> Project: Solr
> Issue Type: New Feature
> Reporter: Yonik Seeley
> Attachments: facet_performance.html, UnInvertedField.java,
> UnInvertedField.java
>
>
> Facet multi-valued fields via a counting method (like the FieldCache method)
> on an un-inverted representation of the field. For each doc, look at its
> terms and increment a count for that term.
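
The counting method described in the issue summary can be illustrated with a
minimal Java sketch. This is not the attached UnInvertedField.java; the class
name, the method name, and the plain int[][] layout are assumptions made for
illustration only, and a real implementation would need a far more compact
encoding to reach the memory sizes reported above.

```java
import java.util.Arrays;

/** Illustrative sketch only: counting facets over an un-inverted field. */
class UninvertedFacetSketch {

    /**
     * Counts term occurrences for the documents matched by a base query.
     *
     * docTerms[doc] lists the term ordinals present in that document
     * (the "un-inverted" view: doc -> terms rather than term -> docs).
     * baseDocs holds the ids of the documents matched by the base query.
     * numTerms is the number of unique terms in the field.
     */
    static int[] countFacets(int[][] docTerms, int[] baseDocs, int numTerms) {
        int[] counts = new int[numTerms];
        for (int doc : baseDocs) {
            // For each doc, look at its terms and increment a count per term.
            for (int termOrd : docTerms[doc]) {
                counts[termOrd]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical 4-document index with 3 unique terms (ordinals 0..2).
        int[][] docTerms = {
            {0, 2},  // doc 0 has terms 0 and 2
            {},      // doc 1 has no value for the field
            {1, 2},  // doc 2 has terms 1 and 2
            {2},     // doc 3 has term 2
        };
        int[] baseDocs = {0, 2, 3};  // docs matched by the base query
        System.out.println(Arrays.toString(countFacets(docTerms, baseDocs, 3)));
    }
}
```

In this sketch the work is proportional to the number of matched documents
times the values per document, and does not depend on the number of unique
terms in the field.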