"Chen, Kenneth W" <[EMAIL PROTECTED]> wrote:
>
> Let me work on the readv/writev support (unless someone beat me to it).
Please also move it to the address_space_operations level. Yes, there are
performance benefits from simply omitting the LFS checks, the mmap
consistency fixes, etc. But
Andrew Morton wrote on Thursday, March 10, 2005 12:31 PM
> > > Fine-grained alignment is probably too hard, and it should fall back to
> > > __blockdev_direct_IO().
> > >
> > > Does it do the right thing with a request which is non-page-aligned, but
> > > 512-byte aligned?
> > >
> > > readv and writev?
"Chen, Kenneth W" <[EMAIL PROTECTED]> wrote:
>
> Losing 6% just from the Linux kernel is a huge deal for this type of benchmark.
> People work for days to implement features which might give a sub-percent
> gain. Making software run faster is not easy, but making software run slower
> apparently is a
Andrew Morton wrote on Wednesday, March 09, 2005 8:10 PM
> > 2.6.9 kernel is 6% slower compared to distributor's 2.4 kernel (RHEL3).
> > Roughly 2% came from storage driver (I'm not allowed to say anything
> > beyond that, there is a fix though).
>
> The codepaths are indeed longer in 2.6.
"Chen, Kenneth W" <[EMAIL PROTECTED]> wrote:
>
> > Did you generate a kernel profile?
>
> Top 40 kernel hot functions, percentage is normalized to kernel utilization.
>
> _spin_unlock_irqrestore 23.54%
> _spin_unlock_irq 19.27%
Cripes.
Is that with CONFIG_PREEMPT? If
On Wednesday, March 9, 2005 3:23 pm, Andi Kleen wrote:
> "Chen, Kenneth W" <[EMAIL PROTECTED]> writes:
> > Just to clarify here, these data need to be taken with a grain of salt. A
> > high count in the _spin_unlock_* functions does not automatically point to
> > lock contention. It's one of the blind spots of timer-based profiling
> > on ia64.
On Wed, 09 Mar 2005, Chen, Kenneth W wrote:
> Andrew Morton wrote Wednesday, March 09, 2005 6:26 PM
> > What does "1/3 of the total benchmark performance regression" mean? One
> > third of 0.1% isn't very impressive. You haven't told us anything at all
> > about the magnitude of this
On Wed, 9 Mar 2005, Chen, Kenneth W wrote:
Also, I'm rather peeved that we're hearing about this regression now rather
than two years ago. And mystified as to why yours is the only group which
has reported it.
2.6.X kernel has never been faster than the 2.4 kernel (RHEL3). At one
point in time,
David Lang <[EMAIL PROTECTED]> wrote:
>
> (I've seen a 50%
> performance hit on 2.4 with just a thousand or two threads compared to
> 2.6)
Was that 2.4 kernel a vendor kernel with the O(1) scheduler?
"Chen, Kenneth W" <[EMAIL PROTECTED]> wrote:
>
> This is all real: real benchmark running on real hardware, with real
> result showing large performance regression. Nothing synthetic here.
>
Ken, could you *please* be more complete, more organized and more specific?
What does "1/3 of the
Chen, Kenneth W wrote on Wednesday, March 09, 2005 5:45 PM
> Andrew Morton wrote on Wednesday, March 09, 2005 5:34 PM
> > What are these percentages? Total CPU time? The direct-io stuff doesn't
> > look too bad. It's surprising that tweaking the direct-io submission code
> > makes much
Andrew Morton wrote on Wednesday, March 09, 2005 5:34 PM
> What are these percentages? Total CPU time? The direct-io stuff doesn't
> look too bad. It's surprising that tweaking the direct-io submission code
> makes much difference.
Percentage is relative to total kernel time. There are three DIO
For people who are dying to see a q-tool profile, here is one.
It's not a vanilla 2.6.9 kernel, but one with patches in the raw device
to get around the DIO performance problem.
- Ken
Flat profile of CPU_CYCLES in hist#0:
Each histogram sample counts as 255.337u seconds
% time self cumul
Andrew Morton wrote on Wednesday, March 09, 2005 12:05 PM
> "Chen, Kenneth W" <[EMAIL PROTECTED]> wrote:
> > Let me answer the questions in reverse order. We started with running
> > industry standard transaction processing database benchmark on 2.6 kernel,
> > on real hardware (4P smp, 64 GB
"Chen, Kenneth W" <[EMAIL PROTECTED]> wrote:
>
> Andrew Morton wrote on Tuesday, March 08, 2005 10:28 PM
> > But before doing anything else, please bench this on real hardware,
> > see if it is worth pursuing.
>
> Let me answer the questions in reverse order. We started with running
> industry
"Chen, Kenneth W" <[EMAIL PROTECTED]> wrote:
>
> Direct I/O on block device running 2.6.X kernel is a lot SLOWER
> than running on a 2.4 kernel!
>
A little bit slower, it appears. It used to be faster.
> ...
>
> synchronous I/O AIO
>
Christoph Hellwig wrote on Tuesday, March 08, 2005 6:20 PM
> this is not the blockdevice, but the obsolete raw device driver. Please
> benchmark and if necessary fix the blockdevice O_DIRECT codepath instead,
> as the raw driver is slowly going away.
From a performance perspective, can raw device be
> --- linux-2.6.9/drivers/char/raw.c	2004-10-18 14:54:37.000000000 -0700
> +++ linux-2.6.9.ken/drivers/char/raw.c	2005-03-08 17:22:07.000000000 -0800
this is not the blockdevice, but the obsolete raw device driver. Please
benchmark and if necessary fix the blockdevice O_DIRECT
OK, last one in the series: user level test programs that stress
the kernel I/O stack. Pretty dull stuff.
- Ken
diff -Nur zero/aio_null.c blknull_test/aio_null.c
--- zero/aio_null.c	1969-12-31 16:00:00.000000000 -0800
+++ blknull_test/aio_null.c	2005-03-08 00:46:17.000000000 -0800
@@