Be a little careful when looking at on-disk index sizes.
The *.fdt and *.fdx files (the stored-field data and its index) are
pretty much irrelevant to the in-memory requirements; they are only
read to assemble the response (usually 10-20 docs). That said, you
can _make_ them more relevant by specifying very large document
cache sizes.
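
For what it's worth, that cache is the <documentCache> entry in
solrconfig.xml. The values below are just placeholders to tune for
your own traffic, not a recommendation:

<!-- example values only; size is a number of documents, not bytes -->
<documentCache class="solr.LRUCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>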

Best,
Erick

On Fri, Jan 31, 2014 at 9:49 AM, Michael Della Bitta
<michael.della.bi...@appinions.com> wrote:
> Joseph:
>
> Not so much after using some of the settings available on Shawn's Solr Wiki
> page: https://wiki.apache.org/solr/ShawnHeisey
>
> This is what we're running with right now:
>
> -Xmx6g
> -XX:+UseConcMarkSweepGC
> -XX:CMSInitiatingOccupancyFraction=80
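>
> These get handed to Tomcat at startup; one common way (just a sketch,
> since the exact file and path depend on the install) is a
> CATALINA_BASE/bin/setenv.sh along the lines of:
>
> # example only; adjust heap size and flags for your own deployment
> CATALINA_OPTS="$CATALINA_OPTS -Xmx6g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80"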
>
>
>
> Michael Della Bitta
>
> Applications Developer
>
> o: +1 646 532 3062
>
> appinions inc.
>
> "The Science of Influence Marketing"
>
> 18 East 41st Street
>
> New York, NY 10017
>
> t: @appinions <https://twitter.com/Appinions> | g+:
> plus.google.com/appinions <https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>
> w: appinions.com <http://www.appinions.com/>
>
>
> On Fri, Jan 31, 2014 at 10:58 AM, Joseph Hagerty <joa...@gmail.com> wrote:
>
>> Thanks, Shawn. This information is actually not all that shocking to me.
>> It's always been in the back of my mind that I was "getting away with
>> something" in serving from the m1.large. Remarkably, however, it has served
>> me well for nearly two years; also, although the index has not always been
>> 30GB, it has always been much larger than the RAM on the box. As you
>> suggested, I can only suppose that usage patterns and the index schema have
>> in some way facilitated minimal heap usage, up to this point.
>>
>> For now, we're going to increase the heap size on the instance and see
>> where that gets us; if that still doesn't suffice, then we'll upgrade to
>> a more powerful instance.
>>
>> Michael, thanks for weighing in. Those i2 instances look delicious indeed.
>> Just curious -- have you struggled with garbage collection pausing at all?
>>
>>
>>
>> On Thu, Jan 30, 2014 at 7:43 PM, Shawn Heisey <s...@elyograg.org> wrote:
>>
>> > On 1/30/2014 3:20 PM, Joseph Hagerty wrote:
>> >
>> >> I'm using Solr 3.5 over Tomcat 6. My index has reached 30G.
>> >>
>> >
>> > <snip>
>> >
>> >
>> >> - The box is an m1.large on AWS EC2. 2 virtual CPUs, 4 ECU, 7.5 GiB RAM
>> >>
>> >
>> > One detail that you did not provide was how much of your 7.5GB RAM you
>> > are allocating to the Java heap for Solr, but I actually don't think I
>> > need that information, because for your index size, you simply don't
>> > have enough. If you're sticking with Amazon, you'll want one of the
>> > instances with at least 30GB of RAM, and you might want to consider
>> > more memory than that.
>> >
>> > An ideal RAM size for Solr is equal to the size of on-disk data plus
>> > the heap space used by Solr and other programs. This means that if
>> > your Java heap for Solr is 4GB and there are no other significant
>> > programs running on the same server, you'd want a minimum of 34GB of
>> > RAM for an ideal setup with your index. 4GB of that would be for Solr
>> > itself, and the remainder would be for the operating system to fully
>> > cache your index in the OS disk cache.
>> >
>> > Depending on your query patterns and how your schema is arranged, you
>> > *might* be able to get away with as little as half of your index size
>> > just for the OS disk cache, but it's better to make it big enough for
>> > the whole index, plus room for growth.
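>> >
>> > A rough way to see how much memory the OS is actually devoting to the
>> > disk cache on Linux (just a sketch; the exact output varies by distro)
>> > is:
>> >
>> >   free -m
>> >
>> > where the "cached" column is the page cache; ideally there is enough
>> > headroom there to hold most or all of the index.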
>> >
>> > http://wiki.apache.org/solr/SolrPerformanceProblems
>> >
>> > Many people are *shocked* when they are told this information, but if
>> > you think about the relative speeds of getting a chunk of data from a
>> > hard disk vs. getting the same information from memory, it's not all
>> > that shocking.
>> >
>> > Thanks,
>> > Shawn
>> >
>> >
>>
>>
>> --
>> - Joe
>>
