Re: CPA patchset

2008-01-11 Thread dean gaudet
On Fri, 11 Jan 2008, dean gaudet wrote:
> On Fri, 11 Jan 2008, Ingo Molnar wrote:
>
> > * Andi Kleen <[EMAIL PROTECTED]> wrote:
> >
> > > Cached requires the cache line to be read first before you can write
> > > it.
> >
> > nonsense, and you should know it. It is perfectly possible to

Re: CPA patchset

2008-01-11 Thread Arjan van de Ven
On Fri, 11 Jan 2008 09:02:46 -0800 (PST) dean gaudet <[EMAIL PROTECTED]> wrote:
> > Bulk ops (string ops, etc.) will do full cacheline writes too,
> > without filling in the cacheline.
>
> on intel with fast strings enabled yes. mind you intel gives hints in
> the documentation these operations don't

Re: CPA patchset

2008-01-11 Thread dean gaudet
On Fri, 11 Jan 2008, Ingo Molnar wrote:
> * Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> > Cached requires the cache line to be read first before you can write
> > it.
>
> nonsense, and you should know it. It is perfectly possible to construct
> fully written cachelines, without reading the cacheline

Re: CPA patchset

2008-01-11 Thread Andi Kleen
> It is perfectly possible to construct
> fully written cachelines, without reading the cacheline first. MOVDQ is

If you write an aligned full 64 (or 128) byte area and even then you can have
occasional reads which can be either painfully slow or even incorrect.

> but that's totally besides the

Re: CPA patchset

2008-01-11 Thread Andi Kleen
> Write-Combining can be very useful for devices that are behind a slow or
> a high-latency transport, such as PCI, and which are mapped UnCached

That is what I wrote! If you meant the same we must have been spectacularly
miscommunicating.

-Andi

Re: CPA patchset

2008-01-10 Thread Ingo Molnar
* Ingo Molnar <[EMAIL PROTECTED]> wrote:
> > > I think you have it fundamentally backwards: the best for
> > > performance is WB + cflush. What would WC offer for performance
> > > that cflush cannot do?
> >
> > Cached requires the cache line to be read first before you can write
> > it.
>

Re: CPA patchset

2008-01-10 Thread Ingo Molnar
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> > > > but that's not too smart: why dont they use WB plus cflush
> > > > instead?
> > >
> > > Because they need to access it WC for performance.
> >
> > I think you have it fundamentally backwards: the best for
> > performance is WB + cflush. What would WC offer for

Re: CPA patchset

2008-01-10 Thread Andi Kleen
On Thu, Jan 10, 2008 at 01:22:04PM +0100, Ingo Molnar wrote:
>
> * Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> > > What is very real though are the hard limitations of MTRRs. So i'd
> > > rather first like to see a clean PAT approach (which all other
> > > modern OSs have already migrated to in

Re: CPA patchset

2008-01-10 Thread Ingo Molnar
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> > What is very real though are the hard limitations of MTRRs. So i'd
> > rather first like to see a clean PAT approach (which all other
> > modern OSs have already migrated to in the past 10 years)
>
> That's mostly orthogonal. Don't know why you bring it up

Re: CPA patchset

2008-01-10 Thread Andi Kleen
On Thu, Jan 10, 2008 at 11:57:26AM +0100, Ingo Molnar wrote:
>
> > > > > WBINVD isnt particular fast (takes a few msecs), but why is
> > > > > that a problem? Drivers dont do high-frequency ioremap-ing.
> > > > > It's typically only done at driver/device startup and that's
> > > > > it.
> > > >
> > > > Actually

Re: CPA patchset

2008-01-10 Thread Andi Kleen
That's mostly orthogonal. Don't know why you bring it up now? Anyways more
efficient c_p_a() makes PAT usage easier.

> structural cleanups and bugfixes you did as well, which would allow us
> to phase out MTRR use (of the DRM drivers, etc.), and _then_ layer an
> (optional) cflush approach basically

Re: CPA patchset

2008-01-10 Thread Ingo Molnar
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> > > > WBINVD isnt particular fast (takes a few msecs), but why is
> > > > that a problem? Drivers dont do high-frequency ioremap-ing.
> > > > It's typically only done at driver/device startup and that's
> > > > it.
> > >
> > > Actually graphics drivers can do higher

Re: CPA patchset

2008-01-10 Thread Andi Kleen
On Thu, Jan 10, 2008 at 08:20:26PM +1000, Dave Airlie wrote:
> This is only possible as long as we know all the parts involved, for
> example on AMD we have problems with that
> over-eager prefetching so for drivers on AMD chipsets we have to do
> something else more than likely using

Re: CPA patchset

2008-01-10 Thread Ingo Molnar
OSs have already migrated to in the past 10 years), with all the
structural cleanups and bugfixes you did as well, which would allow us
to phase out MTRR use (of the DRM drivers, etc.), and _then_ layer an
(optional) cflush approach basically as the final step. Right now cflush
is mixed inextricably into the CPA patchset. WBINVD latency is really
the last of our

Re: CPA patchset

2008-01-10 Thread Dave Airlie
On Jan 10, 2008 7:55 PM, Andi Kleen <[EMAIL PROTECTED]> wrote:
> On Thu, Jan 10, 2008 at 07:44:03PM +1000, Dave Airlie wrote:
> > >
> > > finally managed to get the time to review your CPA patchset, and i
> > > fundamentally agree with most of the detail changes done in it. But here
> > > are a few structural

Re: CPA patchset

2008-01-10 Thread Andi Kleen
On Thu, Jan 10, 2008 at 11:04:43AM +0100, Ingo Molnar wrote:
>
> * Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> > > WBINVD isnt particular fast (takes a few msecs), but why is that a
> > > problem? Drivers dont do high-frequency ioremap-ing. It's
> > > typically only done at driver/device startup and

Re: CPA patchset

2008-01-10 Thread Ingo Molnar
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> > WBINVD isnt particular fast (takes a few msecs), but why is that a
> > problem? Drivers dont do high-frequency ioremap-ing. It's
> > typically only done at driver/device startup and that's it.
>
> Actually graphics drivers can do higher frequency

Re: CPA patchset

2008-01-10 Thread Ingo Molnar
* Dave Airlie <[EMAIL PROTECTED]> wrote:
> > - firstly, there's no rationale given. So we'll change ioremap()/etc.
> >   from doing a cflush-range instruction instead of a WBINVD. But why?
> >   WBINVD isnt particular fast (takes a few msecs), but why is that a
> >   problem? Drivers dont do

Re: CPA patchset

2008-01-10 Thread Andi Kleen
On Thu, Jan 10, 2008 at 07:44:03PM +1000, Dave Airlie wrote:
> >
> > finally managed to get the time to review your CPA patchset, and i
> > fundamentally agree with most of the detail changes done in it. But here
> > are a few structural high-level observations:
> >
> > - firstly, there's no rationale given

Re: CPA patchset

2008-01-10 Thread Andi Kleen
On Thu, Jan 10, 2008 at 10:31:26AM +0100, Ingo Molnar wrote:
>
> Andi,
>
> finally managed to get the time to review your CPA patchset, and i
> fundamentally agree with most of the detail changes done in it. But here
> are a few structural high-level observations:

I have a few changes and will post

Re: CPA patchset

2008-01-10 Thread Dave Airlie
>
> finally managed to get the time to review your CPA patchset, and i
> fundamentally agree with most of the detail changes done in it. But here
> are a few structural high-level observations:
>
> - firstly, there's no rationale given. So we'll change ioremap()/etc.
>   from doing a cflush-range

Re: CPA patchset

2008-01-10 Thread Ingo Molnar
Andi,

finally managed to get the time to review your CPA patchset, and i
fundamentally agree with most of the detail changes done in it. But here
are a few structural high-level observations:

- firstly, there's no rationale given. So we'll change ioremap()/etc.
  from doing a cflush-range
