Re: 2.4.0-test10 Sluggish After Load

2000-11-05 Thread Christoph Rohland
Hi Rik, Rik van Riel <[EMAIL PROTECTED]> writes: > On 4 Nov 2000, Christoph Rohland wrote: > > I do see two problems here: > > 1) shm_swap_core does not handle the failure of prepare_highmem_swapout > > right and basically cannot do so. It gets called zone independent > > and should

Re: 2.4.0-test10 Sluggish After Load

2000-11-05 Thread Rik van Riel
On 4 Nov 2000, Christoph Rohland wrote: > Rik van Riel <[EMAIL PROTECTED]> writes: > > Indeed, shared memory performance still sucks rocks. > > No, it's not a performance problem. It is a hard lockup problem on > highmem machines. > > I do see two problems here: > 1) shm_swap_core does not

Re: 2.4.0-test10 Sluggish After Load

2000-11-04 Thread Christoph Rohland
Hi Rik, Rik van Riel <[EMAIL PROTECTED]> writes: > Indeed, shared memory performance still sucks rocks. No, it's not a performance problem. It is a hard lockup problem on highmem machines. I do see two problems here: 1) shm_swap_core does not handle the failure of prepare_highmem_swapout

Re: 2.4.0-test10 Sluggish After Load

2000-11-04 Thread Rik van Riel
On 3 Nov 2000, Christoph Rohland wrote: > On Wed, 1 Nov 2000, Rik van Riel wrote: > > The 2.4 VM is basically pretty good when you're not > > thrashing and has efficient management of the VM until > > your working set reaches the size of physical memory. > > > > But once you hit the thrashing

Re: 2.4.0-test10 Sluggish After Load

2000-11-03 Thread Peter Samuelson
[matthew] > ls /proc > killscript > added "kill -9" to the beginning and "\" to the end of each line, > ran it as the database user. It worked pretty well. Sounds like a lot of trouble. su {oracle} -c 'kill -9 -1' Or is there some reason that wouldn't have worked in your case? Peter

Re: Thrash reduction & RE: 2.4.0-test10 Sluggish After Load

2000-11-03 Thread Christoph Rohland
Hi Jonathan, On Fri, 3 Nov 2000, Jonathan George wrote: > I wonder how much of that memory is actually being used by your > processes. My guess is that it's not the whole thing (unless you > are running on a 64bit architecture). Yes of course it is using the whole memory. That's what the

Thrash reduction & RE: 2.4.0-test10 Sluggish After Load

2000-11-03 Thread Jonathan George
-Original Message- From: Christoph Rohland [mailto:[EMAIL PROTECTED]] Sent: Friday, November 03, 2000 7:54 AM To: Rik van Riel Cc: Jonathan George; '[EMAIL PROTECTED]'; '[EMAIL PROTECTED]' Subject: Re: 2.4.0-test10 Sluggish After Load Hi Rik, >On Wed, 1 Nov 2000, Rik van Riel wr

Re: 2.4.0-test10 Sluggish After Load

2000-11-03 Thread Christoph Rohland
Hi Rik, On Wed, 1 Nov 2000, Rik van Riel wrote: > The 2.4 VM is basically pretty good when you're not > thrashing and has efficient management of the VM until > your working set reaches the size of physical memory. > > But once you hit the thrashing point, the VM falls > flat on its face. This

Re: 2.4.0-test10 Sluggish After Load

2000-11-02 Thread matthew
On Thu, 2 Nov 2000, Rik van Riel wrote: > > > Of course, this also depends on the amount of people willing > > > to test out new VM patches and/or help with development. > > > > As you know I am doing regular thrash tests and I am willing to do > > this further. I would hate to see a customer

Re: 2.4.0-test10 Sluggish After Load

2000-11-02 Thread Christoph Rohland
Hi Rik, On Wed, 1 Nov 2000, Rik van Riel wrote: >> BTW, I think that everyone is happy with the direction of the >> new VM. I'm looking forward to your upcoming enhancements which >> I hope will make it in to a later 2.4 release. > > I'm working on it. I have no idea if it'll be ready in time

Re: 2.4.0-test10 Sluggish After Load

2000-11-01 Thread Rik van Riel
On Wed, 1 Nov 2000, Sean Hunter wrote: > Yup. What seems to have happened is that waking up 1800 > processes at once has caused the box to thrash so hard it is > taking ages for any one process to get enough scheduler time to > clean itself up and exit. > > I guess we may need a thrash

Re: 2.4.0-test10 Sluggish After Load

2000-11-01 Thread Sean Hunter
On Wed, Nov 01, 2000 at 11:10:46AM -0600, matthew wrote: > On Wed, 1 Nov 2000, Sean Hunter wrote: > > > Pardon my speculations (if I am wrong), but isn't this an oracle question? > > > It could be. > > > > Isn't oracle killing the server by trying to clean up 1800 connections all at > >

RE: 2.4.0-test10 Sluggish After Load

2000-11-01 Thread Rik van Riel
On Wed, 1 Nov 2000, matthew wrote: > The "thrashing" has been going on for roughly 10 hours now. Is > there a point at which I can expect it to stop? The load > average is at 441 (down from > 700 last night), and the stress > program was killed at 1:00AM CST last night. This (obviously) >

Re: 2.4.0-test10 Sluggish After Load

2000-11-01 Thread matthew
On Wed, 1 Nov 2000, Sean Hunter wrote: > Pardon my speculations (if I am wrong), but isn't this an oracle question? It could be. > Isn't oracle killing the server by trying to clean up 1800 connections all at > once? When they're all connected, most of the work is done by one or two >

RE: 2.4.0-test10 Sluggish After Load

2000-11-01 Thread matthew
>>Rik van Riel said: >> The problem may well be that Oracle wants to clean up >> all memory at once, accessing much more memory than >> it did while under stress with more tricky access >> patterns. >> >> If this looks bad to you, compare the points where 2.2 >> starts thrashing and where 2.4

RE: 2.4.0-test10 Sluggish After Load

2000-11-01 Thread Rik van Riel
On Wed, 1 Nov 2000, Jonathan George wrote: > It sounded to me as if his machine never actually recovered from > thrashing. That's the nature of thrashing ... nothing in the system is able to make any progress, hence it takes ages until the situation changes... > Furthermore, even a thrashing

RE: 2.4.0-test10 Sluggish After Load

2000-11-01 Thread Jonathan George
-Original Message- From: Rik van Riel [mailto:[EMAIL PROTECTED]] Sent: Wednesday, November 01, 2000 11:06 AM To: Jonathan George Cc: '[EMAIL PROTECTED]'; '[EMAIL PROTECTED]' Subject: RE: 2.4.0-test10 Sluggish After Load On Wed, 1 Nov 2000, Jonathan George wrote: > It might be help

RE: 2.4.0-test10 Sluggish After Load

2000-11-01 Thread Rik van Riel
On Wed, 1 Nov 2000, Jonathan George wrote: > It might be helpful to show the current (post crippled) results > of top. Furthermore, a list of allocated ipc resources (shared > memory, etc.) and open files (lsof) would be nice. The problem may well be that Oracle wants to clean up all memory at

RE: 2.4.0-test10 Sluggish After Load

2000-11-01 Thread Jonathan George
Matt, It might be helpful to show the current (post crippled) results of top. Furthermore, a list of allocated ipc resources (shared memory, etc.) and open files (lsof) would be nice. --Jonathan--
