On Tue, Sep 4, 2012 at 5:42 AM, Yevgeny Kliteynik <[email protected]> wrote:
> On 8/30/2012 10:28 PM, Yong Qin wrote:
>> On Thu, Aug 30, 2012 at 5:12 AM, Jeff Squyres <[email protected]> wrote:
>>> On Aug 29, 2012, at 2:25 PM, Yong Qin wrote:
>>>
>>>> This issue has been observed on OMPI 1.6 and 1.6.1 with the openib btl
>>>> but not on 1.4.5 (the tcp btl is always fine). The application is VASP,
>>>> and only one specific dataset triggers it in our testing. The OS is
>>>> SL 6.2 with kernel 2.6.32-220.23.1.el6.x86_64. The issue is that when a
>>>> certain type of load is put on OMPI 1.6.x, the khugepaged thread always
>>>> runs at 100% CPU, and it looks to me like OMPI is waiting for memory to
>>>> become available and thus appears to hang. Reducing the number of
>>>> processes per node sometimes eases the problem a bit, but not always.
>>>> So I did some further testing by playing around with the kernel's
>>>> transparent hugepage support.
>>>>
>>>> 1. Disable transparent hugepage support completely (echo never >
>>>> /sys/kernel/mm/redhat_transparent_hugepage/enabled). This allows the
>>>> program to progress as normal (as in 1.4.5). Total run time for an
>>>> iteration is 3036.03 s.
>>>
>>> I'll admit that we have not tested using transparent hugepages. I wonder
>>> if there's some kind of bad interaction going on here...
>>
>> Transparent hugepages are "transparent", meaning they are applied
>> automatically to all applications unless explicitly disabled. I highly
>> suspect that they are not working properly in this case.
>
> Like Jeff said - I don't think we've ever tested OMPI with transparent
> huge pages.
>

Thanks. But have you tested OMPI under RHEL 6 or its variants (CentOS 6,
SL 6)? THP is on by default in RHEL 6, so whether you want it or not, it's
there.

>>>
>>> What exactly does changing this setting do?
>>
>> Here (http://lwn.net/Articles/423592/) is pretty good documentation on
>> what these settings do to the behaviour of THP. I don't think I can
>> explain it better than the article, so I will leave it to you to
>> digest. :)
>>
>>>
>>>> 2. Disable the VM defrag effort (echo never >
>>>> /sys/kernel/mm/redhat_transparent_hugepage/defrag). This allows the
>>>> program to run as well, but the performance is horrible. The same
>>>> iteration takes 4967.40 s.
>>>>
>>>> 3. Disable defrag in khugepaged (echo no >
>>>> /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag). This
>>>> allows the program to run, and the performance is worse than #1 but
>>>> better than #2. The same iteration takes 3348.10 s.
>>>>
>>>> 4. Disable both VM defrag and khugepaged defrag (#2 + #3). Similar
>>>> performance to #3.
>>>>
>>>> So my question is: since this looks to me like it has to do with the
>>>> memory management in the openib btl, are we using huge pages in 1.6.x?
>>>> If so, is there a better way to resolve or work around this within OMPI
>>>> itself without disabling transparent hugepage support? We'd like to
>>>> keep hugepage support if possible.
>>>
>>> Mellanox -- can you comment on this?
>
> Actually, I don't think that THP was really tested with OFED.
> I can think of lots of ways things can go wrong there.
> This might be a good question to address to the Linux-RDMA mailing list.
>

This is quite useful information. I guess we will just turn off THP
support for now.

> -- YK
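P.S. For anyone who wants to reproduce the tests above, here is a minimal
sketch of the sysfs toggles, assuming root access and the RHEL 6
"redhat_transparent_hugepage" paths quoted in the thread (on mainline
kernels the directory is /sys/kernel/mm/transparent_hugepage instead).
Each echo corresponds to one of the independent test configurations, not a
sequence to run all at once:

  # Test #1: disable THP completely (best result above: 3036.03 s/iteration)
  echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled

  # Test #2: leave THP enabled, but disable direct (VM) defrag
  # (worst result above: 4967.40 s/iteration)
  echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag

  # Test #3: leave THP enabled, but stop khugepaged from defragging
  # (3348.10 s/iteration)
  echo no > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag

  # Test #4 is simply #2 and #3 combined.

Note that these writes do not survive a reboot, so whichever setting you
keep needs to go into rc.local or an equivalent boot-time script.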

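To check which state a node is actually in, and to confirm that khugepaged
is the thread spinning while the job appears hung, something like the
following should do (plain cat/ps, nothing OMPI-specific; the bracketed
value in the sysfs files is the setting currently in effect):

  # Show the active THP mode and defrag mode; the value in brackets is the
  # one in effect, typically "always" on a stock RHEL 6 / SL 6.2 install.
  cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
  cat /sys/kernel/mm/redhat_transparent_hugepage/defrag

  # See whether khugepaged is the kernel thread burning CPU.
  ps -o pid,pcpu,comm -C khugepaged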