Lee Whatley, Contractor wrote:
Pete Wyckoff wrote:
I think it's best if I get around to doing the event-driven bmi_ib
rather than polling and see if that magically fixes it. Playing
thread scheduling tricks will get us in trouble, as Nathan points
out.
Well, I'm hoping to upgrade this cluster from RHEL3 (2.4 kernel) to
RHE
Pete Wyckoff wrote:
I think we can declare success and chalk it up to weird scheduler
behavior. I'll run some tests myself, then probably stick that
sched_yield in the source, as it should be harmless for more recent
kernels in that location.
The fact that mkdir is eating 10% CPU on the server
Pete Wyckoff wrote:
If you want to try a couple of patches, just for kicks, we can maybe
hack around this alleged starvation problem. First, we can toss in
a sched_yield() and see if it magically gets the kernel to make the
dbpf thread run instead. Near the bottom of the function
BMI_ib_testcon
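The sched_yield() hack Pete describes can be sketched as follows. This is an illustrative C fragment, not actual BMI code: the function and variable names are invented, and the exact placement inside the real bmi_ib polling path is an assumption.

```c
#include <sched.h>  /* sched_yield() */

/* Counter and stub poll used only to make this sketch self-contained:
 * "completes" on the third poll attempt. */
static int calls;
static int poll_once(void) { return ++calls >= 3; }

/* Illustrative busy-poll loop: after each unsuccessful poll, yield the
 * CPU so another runnable thread (e.g. the dbpf thread) gets a chance
 * to be scheduled.  On old 2.4-era schedulers this works around the
 * alleged starvation; on more recent kernels the yield is a harmless
 * hint, as Pete notes. */
static int poll_for_completion(int (*poll)(void))
{
    int found;
    while ((found = poll()) == 0)
        sched_yield();
    return found;
}
```

The idea is that the polling thread otherwise monopolizes its timeslice, so the server-side worker thread never runs; yielding explicitly hands the CPU back to the scheduler on every empty poll.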
Hey Pete,
I'll try to get this to you as soon as I can...hopefully later this week.
Oh btw, the LD_ASSUME_KERNEL trick didn't work. Apparently you can't
do that with an x86_64 kernel.
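For background (a sketch, not from the thread): on 32-bit x86, LD_ASSUME_KERNEL can make glibc fall back from NPTL to LinuxThreads, but the x86_64 glibc ships only the NPTL libpthread, so the variable has no effect there. glibc reports the active thread library via confstr(); the helper name below is invented for illustration.

```c
#include <stdio.h>
#include <unistd.h>

/* Fill buf with the active glibc thread library name (e.g. "NPTL 2.17")
 * and return nonzero on success.  On x86_64 this always reports NPTL,
 * which is why the LD_ASSUME_KERNEL trick cannot select LinuxThreads
 * on that architecture. */
static int thread_library(char *buf, size_t len)
{
    size_t n = confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, len);
    return n > 0 && n <= len;
}
```

Running this (even with LD_ASSUME_KERNEL set to an old version string) on an x86_64 box will still report NPTL, matching Lee's observation.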
Thanks,
-Lee
Pete Wyckoff wrote:
I've come to a couple partial conclusions, and more requests for
informat
Murali Vilayannur wrote:
Hi Lee,
Does the kernel that you are running support NPTL threads at all?
I recall that using LinuxThreads on Opteron x86_64 is strongly discouraged
..
Alternatively, maybe NPTL is the problem on outdated 2.4 kernels.. does
the RHEL3 update include the futex bug fix repor
Pete Wyckoff wrote:
I would like to get the full list involved. Can you make a more
concise trace for them? Do something like the following:
cd /tmp
service pvfs2 start
/usr/local/bin/pvfs2-set-debugmask -m /u/data1 verbose
pvfs2-mkdir /u/data1/foo
/usr/local/bin/pvfs2-set-
Pete Wyckoff wrote:
Another data point, if it is easy for you to set up: are things just
as slow using bmi_tcp rather than bmi_ib?
No, bmi_tcp works great (both over our regular ethernet interfaces and
our IBoIP interfaces). It's just bmi_ib that causes the slowness.
Pete Wyckoff wrote:
That doesn't sound familiar to me. I guess the first thing to do is
enable debugging and get a server log with timestamps so we can try
to figure it out. You may have to mail the log off-list as they
tend to be big.
Will do. I'll be sending you the log off-list either tod
Murali Vilayannur wrote:
As Rob pointed out, the pvfs2-migrate-collection utility in src/apps/admin
does migrate a 1.4.0 trove volume to the CVS trove format correctly. So I
guess you could just run that and let us know if you encounter any
problems...
Thanks,
Murali
The pvfs2-migrate-collectio
Murali,
Thanks! I updated my CVS tree this morning, and the 2.4 kernel compiles
and installs just fine.
I am unable to start my pvfs2 server using this CVS version though. I
get these messages in my error logs:
[D 10:02:40.452641] PVFS2 Server version 1.4.1pre1-2006-05-22-144949 starting