On 2/27/2015 12:51 PM, Tang, Rebecca wrote:
> Thank you guys for all the suggestions and help! I've identified the main
> culprit with debug=timing.  It was the mlt component.  After I removed it,
> the speed of the query went back to reasonable.  Another culprit is the
> expand component, but I can't remove it.  We've downgraded our amazon
> instance to 60G mem with general purpose SSD and the performance is pretty
> good.  It's only 70 cents/hr versus 2.80/hr for the 244G mem instance :)
>
> I also added all the suggested JVM parameters.  Now I have a gc.log that I
> can dig into.
>
> One thing I would like to understand is how memory is managed by solr.
>
> If I do 'top -u solr', I see something like this:
>
> Mem:  62920240k total, 62582524k used,   337716k free,   133360k buffers
> Swap:        0k total,        0k used,        0k free, 54500892k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>     
>  4266 solr      20   0  192g 5.1g 854m S  0.0  8.4  37:09.97 java
>
> There are two things:
> 1) Mem: 62920240k total, 62582524k used. I think this is what the solr
> admin "physical memory" bar graph reports on.  Can I assume that most of
> the mem is used for loading part of the index?
>
> 2) And then there's the VIRT 192g and RES 5.1g.  What is the 5.1 RES
> (physical memory) that is used by solr?

The "total" and "used" values from top refer to *all* memory in the
entire machine, and they do match the "physical memory" graph in the
admin UI.  Notice the "cached" value of roughly 54GB: that's where
most of the memory usage is actually happening.  This is the OS disk
cache -- the OS automatically uses otherwise idle memory to cache data
from the disk.  You are only caching about a third of your index, which
may not be enough for good performance, especially with complex queries.

The VIRT (virtual) and RES (resident) values describe how Java is
using memory from the OS point of view.  RES is the physical memory the
java process currently occupies: 5.1GB of RAM for the heap and all other
memory structures.  The VIRT number is the total amount of *address
space* (virtual memory, not actual memory) that the process has
allocated.  For Solr, this will typically be (approximately) the size of
all your indexes plus the RES and SHR values.
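
As a rough sanity check on the "about a third" estimate, the arithmetic
can be sketched in Python using the numbers from the top output quoted
above.  This is only an approximation: it assumes VIRT is roughly index
size plus RES plus SHR, and that the whole page cache holds index data
(it also holds other files, so the real fraction is likely a bit lower).
The function name is made up for illustration.

```python
# Rough estimate only: inputs are copied from the 'top' output quoted above.
KIB_PER_GIB = 1024 * 1024


def estimate_cached_fraction(virt_gib, res_gib, shr_gib, cached_kib):
    """Return (approx. index size in GiB, fraction that fits in the page cache)."""
    # Assumption: VIRT ~= index size + RES + SHR, as described above.
    index_gib = virt_gib - res_gib - shr_gib
    cached_gib = cached_kib / KIB_PER_GIB
    # Upper bound: the page cache may also hold non-index files.
    return index_gib, min(1.0, cached_gib / index_gib)


index_gib, fraction = estimate_cached_fraction(
    virt_gib=192.0,        # VIRT column
    res_gib=5.1,           # RES column
    shr_gib=854 / 1024,    # SHR column (854m)
    cached_kib=54500892,   # "cached" value on the Swap line
)
print(f"index ~{index_gib:.0f} GiB, cached fraction ~{fraction:.2f}")
```

With these inputs the cached fraction comes out a little under 0.3.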

Solr (Lucene) uses the mmap functionality in the operating system for
all disk access by default (configurable) -- this means that it maps the
index files on the disk into virtual memory.  That way the program
doesn't need to use disk I/O calls to access the data ... it just
pretends that the file is sitting in memory.  The operating system takes
care of translating those memory reads and writes into disk access.  All
memory that is not explicitly allocated to a program is automatically
used to cache that disk access -- this is the "cached" number from top
that I already mentioned.
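
To illustrate the idea (this is not Lucene's code, just a minimal
sketch of the same OS facility using Python's standard mmap module),
the example below maps a file and reads from it by slicing memory
instead of calling read().  The temporary file and the helper name
read_via_mmap are stand-ins invented for the example.

```python
import mmap
import os
import tempfile


def read_via_mmap(path, offset, length):
    """Read bytes from a file by slicing a memory map, not by calling read()."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are loaded lazily by the OS
        # and stay in the page cache for later accesses.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset:offset + length]


# Stand-in for an index file (hypothetical content, for illustration only).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"header" + b"x" * 4096 + b"trailer")
    demo_path = tmp.name

first = read_via_mmap(demo_path, 0, 6)           # start of the file
last = read_via_mmap(demo_path, 6 + 4096, 7)     # end of the file
os.remove(demo_path)
```

Mapping a file this way adds to the process's virtual size (VIRT), but
only the pages actually touched need physical memory, and those pages
live in the OS cache rather than in the Java heap.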

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
http://en.wikipedia.org/wiki/Page_cache

Thanks,
Shawn
