Does this do what you want?

http://wiki.apache.org/solr/StatsComponent

I can see that "group by" is a possible enhancement to this component.
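
For reference, a request to that component might be built as in the sketch below. The host, core name, and the field names `amount` and `region` are hypothetical; `stats`, `stats.field`, and `stats.facet` are the parameters described on the wiki page above. Note that `stats.facet` breaks the statistics down by the values of a single field, which is not the same as a multi-field "group by".

```python
from urllib.parse import urlencode

def stats_url(base_url, query, value_field, facet_field=None):
    """Build a select URL that asks StatsComponent for stats on value_field."""
    params = [
        ("q", query),
        ("rows", "0"),                 # only the statistics are needed, not documents
        ("stats", "true"),
        ("stats.field", value_field),
    ]
    if facet_field is not None:
        # Break the statistics down per distinct value of one field.
        params.append(("stats.facet", facet_field))
    return base_url + "/select?" + urlencode(params)

# Hypothetical host, core, and field names.
url = stats_url("http://localhost:8983/solr/core0", "*:*", "amount", "region")
print(url)
```

The response should then carry the sum, min, max, count, and related figures for the value field, overall and per facet value.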

Kjetil Ødegaard wrote:
Hi all,


we're currently using Solr 1.4.0 in a project for statistical data, where we
group and sum a number of "double" values. Probably not what most people use
Solr for, but it seems to be working fine for us :-)


We do have some challenges, especially with memory use, so I thought I'd
check here if anybody has done something similar.


Some details:


- The index is currently around 30 GB and growing. The data is indexed
directly from a database; each row ends up as a document. I think we have
around 100 million documents now, and the largest core holds about 40
million. The data is split into separate cores for different statistics data.


- Heap size is currently 4 GB. We're running all the cores in a
single JVM on WebSphere (WAS) 6.1. We have a couple of GB left for OS disk
cache. Initially we used a 1 GB heap, so we had to split cores into different
shards in order to avoid OutOfMemoryErrors because of the FieldCache (I
think).


- The grouping is done by a custom Solr component which takes parameters
that specify which fields to group by (like in SQL) and sums up values for
each group. This uses the FieldCache for speedy retrieval. We did a PoC on
using Documents instead, but this seemed to go a lot slower. I've done a
memory dump and the combined FieldCache looks to be about 3 GB (take this
with a grain of salt, since I'm not sure all the data was cached).
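
A rough back-of-the-envelope check on FieldCache size can be sketched as follows. It assumes a cached double field costs 8 bytes per document and a cached string field costs a 4-byte ordinal per document plus the unique terms themselves. Only the 40 million document count comes from the description above; the field counts and term sizes are hypothetical.

```python
BYTES_PER_DOUBLE = 8   # assumption: one double per document in the cached array
BYTES_PER_ORD = 4      # assumption: one int ordinal per document for a string field

def fieldcache_bytes(num_docs, double_fields, string_fields,
                     unique_terms_per_string_field=0, avg_term_bytes=0):
    """Estimate cache size in bytes for one core (ignores JVM object overhead)."""
    numeric = num_docs * double_fields * BYTES_PER_DOUBLE
    ordinals = num_docs * string_fields * BYTES_PER_ORD
    terms = string_fields * unique_terms_per_string_field * avg_term_bytes
    return numeric + ordinals + terms

# Largest core (40 million docs) with hypothetical field counts:
# 2 summed double fields and 4 group-by string fields.
total = fieldcache_bytes(40_000_000, 2, 4,
                         unique_terms_per_string_field=10_000,
                         avg_term_bytes=50)
print(round(total / 2**30, 2), "GiB")
```

The point of the estimate is that the cost scales with the number of documents times the number of cached fields, so every extra group-by or sum field adds a fixed amount per document.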


I guess this is different from normal Solr searches, since we have to process
all the documents in a core in order to calculate results; we can't just
return the first 10 (or whatever) documents.
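
To illustrate the kind of full-scan aggregation meant here, a small sketch follows. This is not the custom component itself, only a generic group-and-sum over per-document column arrays (similar in spirit to looking values up by document id in a cache). All field names and data are made up.

```python
from collections import defaultdict

def group_and_sum(columns, group_by, sum_field):
    """Sum sum_field per distinct combination of the group_by fields.

    columns maps a field name to a list indexed by document id.
    """
    totals = defaultdict(float)
    values = columns[sum_field]
    key_columns = [columns[name] for name in group_by]
    for doc_id in range(len(values)):      # every document is visited once
        key = tuple(col[doc_id] for col in key_columns)
        totals[key] += values[doc_id]
    return dict(totals)

# Made-up data: four documents with two group-by fields and one value field.
columns = {
    "region": ["north", "south", "north", "south"],
    "year":   [2009, 2009, 2010, 2009],
    "amount": [1.5, 2.0, 4.0, 3.0],
}
result = group_and_sum(columns, ["region", "year"], "amount")
print(result)
```

Because each document contributes to some group, the work grows with the size of the core rather than with the number of rows returned.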


Any tips or similar experiences?



---Kjetil
