Hi,

We have run Solr in VM environments extensively (3.6, not Cloud, but the issues will be similar). There are some significant things to be aware of when running Solr in a virtualized environment (these can be equally true with Hyper-V and Xen as well):

- If you're doing heavy indexing, the networking can be a real bottleneck, depending on the environment.
- If you're using a virtual cluster and you have other VMs that use lots of network and/or CPU (e.g. a SQL Server, email etc.), you will encounter performance issues (note: it's generally a good idea to tie a Solr instance to a physical machine in the cluster).
- Using virtual switches can, in some instances, create network bottlenecks, particularly with high-volume indexing. There are myriad vSwitch configurations, so it's not practical to go into all of them here, but the general rule is: be careful!
- CPU context switching can have a huge impact on Solr, so assigning CPUs, cores and virtual cores needs some care to ensure there's enough CPU resource to get the jobs done, but not so many that the VM is continually waiting for cores to become free (VMWare will wait until all configured core slots are free before proceeding with a request).

The above only scratches the surface of running multi-threaded production applications like Solr in a virtual environment, but hopefully it can provide a starting point.

Thanks,
Peter

On Wed, Apr 17, 2013 at 11:56 AM, adfel70 <adfe...@gmail.com> wrote:
> Hi
> We are currently considering running Solr Cloud on VMWare.
> Do you have any insights regarding the issues you encountered, and generally
> regarding using virtual machines instead of physical machines for Solr
> Cloud?
>
>
> Frank Wennerdahl wrote
> > Hi Otis and thanks for your response.
> >
> > We are indeed suspecting that the problem with only 2 cores being used
> > might be caused by the virtual environment. We're hoping that someone with
> > experience of running Solr on VMWare might know more about this or the
> > other issues we have.
> >
> > The servlet we're running is the bundled Jetty servlet (Solr version 4.1).
> > As we have seen a higher number of CPU cores utilized when sending data to
> > Solr locally, it seems that the servlet isn't restricting the number of
> > threads used.
> >
> > Frank
> >
> > -----Original Message-----
> > From: Otis Gospodnetic [mailto:otis.gospodnetic@]
> > Sent: 26 March 2013 05:09
> > To: solr-user@.apache
> > Subject: Re: Scaling Solr on VMWare
> >
> > Hi Frank,
> >
> > If your servlet container had a crazy low setting for the max number of
> > threads I think you would see the CPU underutilized. But I think you would
> > also see errors on the client about connections being requested. Sounds
> > like a possible VM issue that's not Solr-specific...
> >
> > Otis
> > --
> > Solr & ElasticSearch Support
> > http://sematext.com/
> >
> > On Mon, Mar 25, 2013 at 1:18 PM, Frank Wennerdahl
> > <frank.wennerdahl@> wrote:
> >> Hi.
> >>
> >> We are currently benchmarking our Solr setup and are having trouble
> >> with scaling hardware for a single Solr instance.
> >> We want to investigate how one instance scales with hardware to find the
> >> optimal ratio of hardware vs sharding when scaling. Our main problem is
> >> that we cannot identify any hardware limitations: CPU is far from maxed
> >> out, disk I/O is not an issue as far as we can see and there is plenty of
> >> RAM available.
> >>
> >> In short we have a couple of questions that we hope someone here could
> >> help us with. Detailed information about our setup, use case and
> >> things we've tried is provided below the questions.
> >>
> >> Questions:
> >>
> >> 1. What could cause Solr to utilize only 2 CPU cores when sending
> >> multiple update requests in parallel in a VMWare environment?
> >>
> >> 2. Is there a software limit on the number of CPU cores that Solr can
> >> utilize while indexing?
> >>
> >> 3. Ruling out network and disk performance, what could cause a
> >> decrease in indexing speed when sending data over a network as opposed
> >> to sending it from the local machine?
> >>
> >> We are running on three cores per Solr instance, however only one core
> >> receives any non-trivial load. We are using VMWare (ESX 5.0) virtual
> >> machines for hosting Solr and a QNAP NAS containing 12 HDDs in a RAID5
> >> setup for storage. Our data consists of a huge number of small documents.
> >> When indexing we are using Solr's javabin format (although not through
> >> Solrj, we have implemented the format in C#/.NET) and our batch size
> >> is currently 1000 documents. The actual size of the data varies, but
> >> the batches we have used range from approximately 450KB to 1050KB.
> >> We're sending these batches to Solr in parallel using a number of send
> >> threads.
> >>
> >> There are two issues that we've run into:
> >>
> >> 1. When sending data from one VM to Solr on another VM we observed
> >> that Solr did not seem to utilize CPU cores properly.
> >> The Solr VM had 8 vCPUs available and we were using 4 threads sending
> >> data in parallel. We saw a low (~29%) CPU utilization on the Solr VM with
> >> 2 cores doing almost all the work while the remaining cores remained
> >> almost idle. Increasing the number of send threads to 8 yielded the
> >> same result, capping our indexing speed at about 4.88MB per second.
> >> The client VM had 4 vCPUs which were hardly utilized as we were reading
> >> data from pre-generated files.
> >>
> >> To rule out network limitations we sent the test data to a server on
> >> the Solr VM that simply accepted the request and returned an empty
> >> response. We were able to send data at 219MB per second, so the
> >> network did not seem to be the bottleneck. We also tested sending data
> >> to Solr locally from the Solr VM to see if disk I/O was the problem.
> >> Surprisingly we were able to index significantly faster at 7.34MB per
> >> second using 4 send threads (8.4MB per second with 6 send threads), which
> >> indicated that the disk was not slowing us down when sending data over
> >> the network. Worth noting is that the CPU utilization was now higher
> >> (47.81% with 4 threads, 58.8% with 6) and the work was spread out over
> >> all cores. As before we used pre-generated files and the process sending
> >> the data used almost no CPU.
> >>
> >> 2. We decided to investigate how Solr would scale with additional
> >> vCPUs when indexing locally. We increased the number of vCPUs to 16
> >> and the number of send threads to 8. Sadly we now experienced a
> >> decrease in performance: 7MB/s with 8 threads, 6.4MB/s with 12 threads
> >> and 4.95MB/s with 16 threads. The CPU usage was on average 30%,
> >> regardless of the number of threads used. We know that additional vCPUs
> >> can cause decreased performance in VMWare virtual machines due to time
> >> spent waiting for CPUs to become available.
> >> We investigated this using esxtop, which only showed a 1% CSTP.
> >> According to VMWare
> >> <http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1005362>
> >> a CSTP above 3% could indicate that multiple vCPUs are causing
> >> performance issues.
> >>
> >> We noticed that the average disk write speed seemed to cap at around
> >> 11.5 million bytes per second so we tested the same VM setup using a
> >> faster disk. This did not yield any increase in performance (it was
> >> actually somewhat slower), and neither did using a RAM-mapped drive for
> >> Solr.
> >>
> >> Any help or ideas of what could be the bottleneck in our setup would
> >> be greatly appreciated!
> >>
> >> Best regards,
> >>
> >> Frank Wennerdahl
> >> Developer
> >> Arcadelia AB
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Scaling-Solr-on-VMWare-tp4051153p4056637.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
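
For anyone wanting to repeat the kind of test described in the quoted message (the same batches sent with a varying number of send threads, comparing MB per second), here is a minimal sketch in Python. It is not the client used in the thread, which was C#/.NET sending javabin. The names measure_throughput and fake_send are made up for illustration, and fake_send only sleeps in place of a real request to Solr, so the numbers it prints say nothing about Solr itself.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(batches, send, num_threads):
    """Send every batch using num_threads workers; return MB per second."""
    total_bytes = sum(len(b) for b in batches)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() waits for all sends and re-raises any exception from send().
        list(pool.map(send, batches))
    elapsed = time.perf_counter() - start
    return total_bytes / elapsed / 1_000_000


def fake_send(batch):
    """Hypothetical stand-in for an HTTP POST of one batch to Solr."""
    time.sleep(0.01)  # simulated round trip; replace with a real request


if __name__ == "__main__":
    # 40 synthetic batches of about 450 KB, the low end of the sizes quoted.
    batches = [b"x" * 450_000 for _ in range(40)]
    for threads in (1, 2, 4, 8):
        rate = measure_throughput(batches, fake_send, threads)
        print(f"{threads} send threads: {rate:.1f} MB/s")
```

Running the same loop once against a do-nothing endpoint and once against Solr, from both the remote and the local machine, gives the three comparisons made in the quoted message. If throughput stops rising (or falls) as threads are added while CPU stays low, that points away from the sender and towards the VM or Solr side.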