My read on docValues is that "typical" use is to keep them in memory - at least when they are actually used - and if you are creating them, it seems safe to assume you are going to use them?

-Mike

On 11/29/14 1:25 PM, Alexandre Rafalovitch wrote:
There are also docValues files, right? And they have different
memory requirements depending on how they are set up. (Not 100% sure
what I am trying to say here, though.)

Regards,
    Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 29 November 2014 at 13:16, Michael Sokolov
<msoko...@safaribooksonline.com> wrote:
Of course testing is best, but you can also get an idea of the size of the
non-storage part of your index by looking in the Solr index folder and
subtracting the size of the files containing the stored fields from the
total size of the index.  This depends, of course, on the internal storage
strategy of Lucene and may change from release to release, but it is
documented.  The .fdt and .fdx files are the stored field files (currently,
at least, and provided you don't have everything in a compound file).  If you
are indexing term vectors (.tvd and .tvf files) as well, I think these can
also be excluded from the index size when calculating the required memory,
at least given typical usage patterns for term vectors (i.e. highlighting).
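
As a rough sketch of that calculation (untested, just to illustrate the idea;
the helper name and the index path are made up, and the extension sets assume
the current default codec with no compound files - they may change between
releases):

    import os

    STORED_FIELDS = {".fdt", ".fdx"}          # stored field files
    TERM_VECTORS = {".tvd", ".tvx", ".tvf"}   # term vector files

    def cache_relevant_bytes(index_dir):
        """Total index size minus stored-field and term-vector files."""
        total = stored = vectors = 0
        for name in os.listdir(index_dir):
            path = os.path.join(index_dir, name)
            if not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
            ext = os.path.splitext(name)[1]
            total += size
            if ext in STORED_FIELDS:
                stored += size
            elif ext in TERM_VECTORS:
                vectors += size
        return total - stored - vectors

    # Hypothetical path - point it at your core's index directory.
    print(cache_relevant_bytes("/var/solr/data/collection1/data/index") / 2**30, "GB")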

I wonder if there's any value in providing this metric (total index size -
stored field size - term vector size) as part of the admin panel?  Is it
meaningful?  It seems like there would be a lot of cases where it could give
a good rule of thumb for memory sizing, and it would save having to root
around in the index folder.

-Mike


On 11/29/14 12:16 PM, Erick Erickson wrote:
bq: You should have memory to fit your whole database in disk cache and then
some more.

I have to disagree here, if for no other reason than that stored data, which
is irrelevant for searching, may make up virtually none or virtually all of
your on-disk space. Saying it all needs to fit in disk cache is too
broad-brush a statement; you've gotta test.

In this case, though, I _do_ think that there's not enough memory here;
Toke's comments are spot on.

On Sat, Nov 29, 2014 at 2:02 AM, Toke Eskildsen <t...@statsbiblioteket.dk>
wrote:
Po-Yu Chuang [ratbert.chu...@gmail.com] wrote:
[...] Everything works fine now, but I noticed that the load
average of the server is high because there is constantly
heavy disk read access. Please point me in some direction.
RAM: 18G
Solr home: 185G
Disk read access: constantly 40-60 MB/s
Solr search performance is tightly coupled to the speed of small random
reads. There are two obvious ways of ensuring that these days:

1) Add more RAM to the server, so that the disk cache can hold a larger
part of the index. If you add enough RAM (this depends on your index, but
50-100% of the index size is a rule of thumb), you get "ideal" storage speed,
by which I mean that the bottleneck moves away from storage. If you are using
spinning drives, 18GB of RAM is not a lot for a 185GB index (see the rough
numbers after the list).

2) Use SSDs instead of spinning drives (if you do not already do so). The
speed-up depends a lot on what you are doing, but it is a cheap upgrade and
it can later be coupled with extra RAM if it is not enough in itself.
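
To put rough numbers on the rule of thumb in 1), using only the figures
already mentioned in this thread (nothing exact, just the arithmetic):

    index_size_gb = 185   # on-disk index (Solr home) size
    ram_gb = 18           # total RAM in the server
    low, high = 0.5 * index_size_gb, 1.0 * index_size_gb
    print(f"rule-of-thumb cache RAM: {low:g}-{high:g} GB, available: {ram_gb} GB")
    # -> rule-of-thumb cache RAM: 92.5-185 GB, available: 18 GB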

The Solr Wiki has this:
https://wiki.apache.org/solr/SolrPerformanceProblems
And I have this:
http://sbdevel.wordpress.com/2013/06/06/memory-is-overrated/

- Toke Eskildsen

