Re: [Pvfs2-developers] Re: [Pvfs2-users] PVFS2 on Infiniband

2006-08-21 Thread Lee Whatley, Contractor
Lee Whatley, Contractor wrote: Pete Wyckoff wrote: I think it's best if I get around to doing the event-driven bmi_ib rather than polling and see if that magically fixes it. Playing thread scheduling tricks will get us in trouble, as Nathan points out. Well, I'm hoping to up

Re: [lwhatley....@navo.hpc.mil: Re: [Pvfs2-developers] Re: [Pvfs2-users] PVFS2 on Infiniband]

2006-06-08 Thread Lee Whatley, Contractor
Pete Wyckoff wrote: I think it's best if I get around to doing the event-driven bmi_ib rather than polling and see if that magically fixes it. Playing thread scheduling tricks will get us in trouble, as Nathan points out. Well, I'm hoping to upgrade this cluster from RHEL3 (2.4 kernel) to RHE

Re: [lwhatley....@navo.hpc.mil: Re: [Pvfs2-developers] Re: [Pvfs2-users] PVFS2 on Infiniband]

2006-06-06 Thread Lee Whatley, Contractor
Pete Wyckoff wrote: I think we can declare success and chalk it up to weird scheduler behavior. I'll run some tests myself, then probably stick that sched_yield in the source, as it should be harmless for more recent kernels in that location. The fact that mkdir is eating 10% CPU on the server

Re: [lwhatley....@navo.hpc.mil: Re: [Pvfs2-developers] Re: [Pvfs2-users] PVFS2 on Infiniband]

2006-06-05 Thread Lee Whatley, Contractor
Pete Wyckoff wrote: If you want to try a couple of patches, just for kicks, we can maybe hack around this alleged starvation problem. First, we can toss in a sched_yield() and see if it magically gets the kernel to make the dbpf thread run instead. Near the bottom of the function BMI_ib_testcon
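A minimal sketch of the idea in the preview above: a polling loop that calls sched_yield() after each empty poll, so the scheduler gets a chance to run another thread. This is not PVFS2 code. The names work_done, worker, poll_with_yield and run_demo are hypothetical, and the worker only stands in for the thread that is allegedly being starved.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical flag: set by the worker when its work is finished. */
static atomic_int work_done = 0;

/* Hypothetical worker; stands in for the thread that needs CPU time. */
void *worker(void *arg)
{
    (void)arg;
    atomic_store(&work_done, 1);
    return NULL;
}

/* Poll for completion, yielding the CPU after every empty poll
 * instead of spinning. Returns the number of empty polls seen. */
long poll_with_yield(void)
{
    long empty_polls = 0;
    while (!atomic_load(&work_done)) {
        ++empty_polls;
        sched_yield();  /* let the scheduler pick another runnable thread */
    }
    return empty_polls;
}

/* Start the worker, poll until it finishes, return 0 on success. */
int run_demo(void)
{
    pthread_t tid;
    atomic_store(&work_done, 0);
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    long polls = poll_with_yield();
    assert(polls >= 0);
    return pthread_join(tid, NULL) == 0 ? 0 : -1;
}
```

Whether the yield actually changes which thread runs depends on the kernel scheduler, which is the open question in the thread.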

Re: [lwhatley....@navo.hpc.mil: Re: [Pvfs2-developers] Re: [Pvfs2-users] PVFS2 on Infiniband]

2006-05-30 Thread Lee Whatley, Contractor
Hey Pete, I'll try to get this to you as soon as I can...hopefully later this week. Oh btw, the LD_ASSUME_KERNEL trick didn't work. Apparently you can't do that with an x86_64 kernel. Thanks, -Lee Pete Wyckoff wrote: I've come to a couple partial conclusions, and more requests for informat

Re: [lwhatley....@navo.hpc.mil: Re: [Pvfs2-developers] Re: [Pvfs2-users] PVFS2 on Infiniband]

2006-05-25 Thread Lee Whatley, Contractor
Murali Vilayannur wrote: Hi Lee, Does the kernel that you are running support NPTL threads at all? I recall that using LinuxThreads on Opteron x86_64 is strongly discouraged .. Alternatively, maybe NPTL is the problem on outdated 2.4 kernels.. does the RHEL3 update include the futex bug fix repor

Re: [lwhatley....@navo.hpc.mil: Re: [Pvfs2-developers] Re: [Pvfs2-users] PVFS2 on Infiniband]

2006-05-25 Thread Lee Whatley, Contractor
Pete Wyckoff wrote: I would like to get the full list involved. Can you make a more concise trace for them. Do something like the following:
    cd /tmp
    service pvfs2 start
    /usr/local/bin/pvfs2-set-debugmask -m /u/data1 verbose
    pvfs2-mkdir /u/data1/foo
    /usr/local/bin/pvfs2-set-

Re: [lwhatley....@navo.hpc.mil: Re: [Pvfs2-developers] Re: [Pvfs2-users] PVFS2 on Infiniband]

2006-05-23 Thread Lee Whatley, Contractor
Pete Wyckoff wrote: Another data point, if it is easy for you to setup: are things just as slow using bmi_tcp rather than bmi_ib? No, bmi_tcp works great (both over our regular ethernet interfaces and our IBoIP interfaces). It's just bmi_ib that causes the slowness.

Re: [lwhatley....@navo.hpc.mil: Re: [Pvfs2-developers] Re: [Pvfs2-users] PVFS2 on Infiniband]

2006-05-23 Thread Lee Whatley, Contractor
Pete Wyckoff wrote: That doesn't sound familiar to me. I guess the first thing to do is enable debugging and get a server log with timestamps so we can try to figure it out. You may have to mail the log off-list as they tend to be big. Will do. I'll be sending you the log off-list either tod

Re: [lwhatley....@navo.hpc.mil: Re: [Pvfs2-developers] Re: [Pvfs2-users] PVFS2 on Infiniband]

2006-05-23 Thread Lee Whatley, Contractor
Murali Vilayannur wrote: As Rob pointed out, the pvfs2-migrate-collection utility in src/apps/admin does migrate a 1.4.0 trove volume to the CVS trove format correctly. So I guess you could just run that and let us know if you encounter any problems... Thanks, Murali The pvfs2-migrate-collectio

Re: [lwhatley....@navo.hpc.mil: Re: [Pvfs2-developers] Re: [Pvfs2-users] PVFS2 on Infiniband]

2006-05-22 Thread Lee Whatley, Contractor
Murali, Thanks! I updated my CVS tree this morning, and the 2.4 kernel compiles and installs just fine. I am unable to start my pvfs2 server using this CVS version though. I get these messages in my error logs: [D 10:02:40.452641] PVFS2 Server version 1.4.1pre1-2006-05-22-144949 starting