Be a little careful when looking at on-disk index sizes. The *.fdt and *.fdx files are pretty irrelevant to the in-memory requirements; they are only read to assemble the response (usually 10-20 docs). That said, you can _make_ them more relevant by specifying very large document cache sizes.
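One way to see how much of an index the stored-field files account for is to sum them separately from the rest. This is just a sketch (the directory path is whatever your install uses; the .fdt/.fdx extensions follow standard Lucene file naming):

```python
import os

def index_size_breakdown(index_dir):
    """Split an on-disk Lucene/Solr index directory into stored-field
    bytes (*.fdt data + *.fdx index, which matter little for RAM
    sizing) versus total bytes."""
    total = stored = 0
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if not os.path.isfile(path):
            continue
        size = os.path.getsize(path)
        total += size
        if name.endswith((".fdt", ".fdx")):
            stored += size
    return total, stored
```

Subtracting the stored-field bytes from the total gives a rough upper bound on the part of the index that actually benefits from living in the OS disk cache.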
Best,
Erick

On Fri, Jan 31, 2014 at 9:49 AM, Michael Della Bitta
<michael.della.bi...@appinions.com> wrote:
> Joseph:
>
> Not so much after using some of the settings available on Shawn's Solr
> Wiki page: https://wiki.apache.org/solr/ShawnHeisey
>
> This is what we're running with right now:
>
> -Xmx6g
> -XX:+UseConcMarkSweepGC
> -XX:CMSInitiatingOccupancyFraction=80
>
> Michael Della Bitta
> Applications Developer
> o: +1 646 532 3062
>
> appinions inc.
> "The Science of Influence Marketing"
> 18 East 41st Street
> New York, NY 10017
> t: @appinions <https://twitter.com/Appinions> | g+:
> plus.google.com/appinions <https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>
> w: appinions.com <http://www.appinions.com/>
>
> On Fri, Jan 31, 2014 at 10:58 AM, Joseph Hagerty <joa...@gmail.com> wrote:
>
>> Thanks, Shawn. This information is actually not all that shocking to me.
>> It's always been in the back of my mind that I was "getting away with
>> something" in serving from the m1.large. Remarkably, however, it has
>> served me well for nearly two years; also, although the index has not
>> always been 30GB, it has always been much larger than the RAM on the
>> box. As you suggested, I can only suppose that usage patterns and the
>> index schema have in some way facilitated minimal heap usage, up to
>> this point.
>>
>> For now, we're going to increase the heap size on the instance and see
>> where that gets us; if it still doesn't suffice, then we'll upgrade to
>> a more powerful instance.
>>
>> Michael, thanks for weighing in. Those i2 instances look delicious
>> indeed. Just curious -- have you struggled with garbage collection
>> pausing at all?
>>
>> On Thu, Jan 30, 2014 at 7:43 PM, Shawn Heisey <s...@elyograg.org> wrote:
>>
>> > On 1/30/2014 3:20 PM, Joseph Hagerty wrote:
>> >
>> >> I'm using Solr 3.5 over Tomcat 6. My index has reached 30G.
>> >
>> > <snip>
>> >
>> >> - The box is an m1.large on AWS EC2. 2 virtual CPUs, 4 ECU, 7.5 GiB RAM
>> >
>> > One detail that you did not provide was how much of your 7.5GB RAM you
>> > are allocating to the Java heap for Solr, but I actually don't think I
>> > need that information, because for your index size, you simply don't
>> > have enough. If you're sticking with Amazon, you'll want one of the
>> > instances with at least 30GB of RAM, and you might want to consider
>> > more memory than that.
>> >
>> > An ideal RAM size for Solr is equal to the size of on-disk data plus
>> > the heap space used by Solr and other programs. This means that if
>> > your java heap for Solr is 4GB and there are no other significant
>> > programs running on the same server, you'd want a minimum of 34GB of
>> > RAM for an ideal setup with your index. 4GB of that would be for Solr
>> > itself; the remainder would be for the operating system to fully cache
>> > your index in the OS disk cache.
>> >
>> > Depending on your query patterns and how your schema is arranged, you
>> > *might* be able to get away with as little as half of your index size
>> > just for the OS disk cache, but it's better to make it big enough for
>> > the whole index, plus room for growth.
>> >
>> > http://wiki.apache.org/solr/SolrPerformanceProblems
>> >
>> > Many people are *shocked* when they are told this information, but if
>> > you think about the relative speeds of getting a chunk of data from a
>> > hard disk vs. getting the same information from memory, it's not all
>> > that shocking.
>> >
>> > Thanks,
>> > Shawn
>>
>> --
>> - Joe
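Shawn's sizing rule in the quoted message reduces to simple arithmetic. A minimal sketch (the function name and parameters are illustrative, not from any Solr API):

```python
def ideal_ram_gb(index_gb, heap_gb, other_gb=0.0, cache_fraction=1.0):
    """Rule of thumb from the thread: ideal RAM = Solr's Java heap
    + memory for other significant programs + enough left over for
    the OS to cache the index. cache_fraction=1.0 caches the whole
    index; ~0.5 may suffice depending on query patterns and schema."""
    return heap_gb + other_gb + index_gb * cache_fraction
```

With the numbers from the thread (30GB index, 4GB heap), this gives the 34GB minimum Shawn mentions; with cache_fraction=0.5, it gives 19GB.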