Yes, that’s the Xenial I tried. Ubuntu 16.04.2 LTS.

On 5/1/17, 7:22 PM, "Will Martin" <wmartin...@outlook.com> wrote:
Ubuntu 16.04 LTS - Xenial (HVM)

Is this your Xenial version?

On 5/1/2017 6:37 PM, Jeff Wartes wrote:
> I tried a few variations of various things before we found and tried that linux/EC2 tuning page, including:
> - EC2 instance type: r4, c4, and i3
> - Ubuntu version: Xenial and Trusty
> - EBS vs local storage
> - Stock openjdk vs Zulu openjdk (Recent java8 in both cases - I’m aware of the issues with early java8 versions and I’m not using G1)
>
> Most of those attempts were to help reduce differences between the data center and the EC2 cluster. In all cases I re-indexed from scratch. I got the same very high system-time symptom in all cases. With the linux changes in place, we settled on r4/Xenial/EBS/Stock.
>
> Again, this was a slightly modified Solr 5.4, (I added backup requests, and two memory allocation rate tweaks that have long since been merged into mainline - released in 6.2 I think. I can dig up the jira numbers if anyone’s interested) I’ve never used Solr 6.x in production though.
> The only reason I mentioned 6.x at all is because I’m aware that ES 5.x is based on Lucene 6.2. I don’t believe my coworker spent any time on tuning his ES setup, although I think he did try G1.
>
> I definitely do want to binary-search those settings until I understand better what exactly did the trick.
> It’s a long cycle time per test is the problem, but hopefully in the next couple of weeks.
>
> On 5/1/17, 7:26 AM, "John Bickerstaff" <j...@johnbickerstaff.com> wrote:
>
> It's also very important to consider the type of EC2 instance you are using...
>
> We settled on the R4.2XL... The R series is labeled "High-Memory"
>
> Which instance type did you end up using?
>
> On Mon, May 1, 2017 at 8:22 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>
> > On 4/28/2017 10:09 AM, Jeff Wartes wrote:
> > > tldr: Recently, I tried moving an existing solrcloud configuration from a local datacenter to EC2.
> > > Performance was roughly 1/10th what I’d expected, until I applied a bunch of linux tweaks.
> >
> > How very strange. I knew virtualization would have overhead, possibly even measurable overhead, but that's insane. Running on bare metal is always better if you can do it. I would be curious what would happen on your original install if you applied similar tuning to that. Would you see a speedup there?
> >
> > > Interestingly, a coworker playing with an Elasticsearch (ES 5.x, so a much more recent release) alternate implementation of the same index was not seeing this high-system-time behavior on EC2, and was getting throughput consistent with our general expectations.
> >
> > That's even weirder. ES 5.x will likely be using Points field types for numeric fields, and although those are faster than what Solr currently uses, I doubt it could explain that difference. The implication here is that the ES systems are running with stock EC2 settings, not the tuned settings ... but I'd like you to confirm that. Same Java version as with Solr? IMHO, Java itself is more likely to cause issues like you saw than Solr.
> >
> > > I’m writing this for a few reasons:
> > >
> > > 1. The performance difference was so crazy I really feel like this should be broader knowledge.
> >
> > Definitely agree! I would be very interested in learning which of the tunables you changed were major contributors to the improvement. If it turns out that Solr's code is sub-optimal in some way, maybe we can fix it.
> >
> > > 2. Is anyone aware of anything that changed in Lucene between 5.4 and 6.x that could explain why Elasticsearch wasn’t suffering from this? If it’s the clocksource that’s the issue, there’s an implication that Solr was using tons more system calls like gettimeofday that the EC2 (xen) hypervisor doesn’t allow in userspace.
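
The clocksource hypothesis raised above can be checked directly on a Linux host. The following is a minimal sketch, assuming Python 3.7+ and the standard sysfs clocksource files; the function names `read_clocksource` and `mean_clock_call_ns` are illustrative and not from the thread. It reports the active clocksource and a rough per-call cost of reading the wall clock. The timing includes interpreter loop overhead, so it is only useful for comparing two hosts or two clocksource settings, not as an absolute figure.

```python
import time
from pathlib import Path

# Standard Linux sysfs location for clocksource information.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksources, or (None, []) if unreadable."""
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        # Not Linux, or sysfs is not mounted at the usual place.
        return None, []
    return current, available


def mean_clock_call_ns(calls=200_000):
    """Rough average cost of one time.time() call in nanoseconds.

    Includes Python loop overhead; compare runs, do not read as absolute.
    """
    start = time.perf_counter_ns()
    for _ in range(calls):
        time.time()
    return (time.perf_counter_ns() - start) / calls


if __name__ == "__main__":
    current, available = read_clocksource()
    print("current clocksource:", current)
    print("available clocksources:", available)
    print(f"about {mean_clock_call_ns():.0f} ns per time.time() call")
```

Running this on the data center host and on the EC2 instance, before and after the tuning, would show whether the two differ in clocksource and whether clock reads are noticeably more expensive on one of them. It does not by itself show how often Solr reads the clock.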
> >
> > I had not considered the performance regression in 6.4.0 and 6.4.1 that Erick mentioned. Were you still running Solr 5.4, or was it a 6.x version?
> >
> > =============
> >
> > Specific thoughts on the tuning:
> >
> > The noatime option is very good to use. I also use nodiratime on my systems. Turning these off can have *massive* impacts on disk performance. If these are the source of the speedup, then the machine doesn't have enough spare memory.
> >
> > I'd be wary of the "nobarrier" mount option. If the underlying storage has battery-backed write caches, or is SSD without write caching, it wouldn't be a problem. Here's info about the "discard" mount option, I don't know whether it applies to your amazon storage:
> >
> >     discard/nodiscard
> >         Controls whether ext4 should issue discard/TRIM commands to the
> >         underlying block device when blocks are freed. This is useful
> >         for SSD devices and sparse/thinly-provisioned LUNs, but it is
> >         off by default until sufficient testing has been done.
> >
> > The network tunables would have more of an effect in a distributed environment like EC2 than they would on a LAN.
> >
> > Thanks,
> > Shawn
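
To see which of the mount options discussed above are actually in effect on a given host, the mount table can be inspected. The following is a minimal sketch, assuming Linux and the usual `/proc/mounts` layout (device, mount point, filesystem type, comma-separated options); the name `watched_mount_options` is illustrative. It matches option strings exactly, so a filesystem that reports the same behavior under a different spelling (for example `barrier=0` rather than `nobarrier`) would not be caught.

```python
# Mount options mentioned in the thread; matched as exact strings.
WATCHED = ("noatime", "nodiratime", "nobarrier", "discard")


def watched_mount_options(mounts_text, watched=WATCHED):
    """Map each mount point to the watched options found in its option list."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        result[mount_point] = [opt for opt in watched if opt in options]
    return result


if __name__ == "__main__":
    with open("/proc/mounts") as fh:
        report = watched_mount_options(fh.read())
    for mount_point, found in report.items():
        if found:
            print(mount_point, ",".join(found))
```

Comparing this output between the original data center machines and the tuned EC2 instances is one way to confirm which filesystem options really differ before attributing a speedup to them.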