Re: svn commit: r303583 - head/sys/amd64/amd64

2016-07-31 Thread Bruce Evans
On Sun, 31 Jul 2016, Konstantin Belousov wrote: On Sun, Jul 31, 2016 at 11:11:25PM +1000, Bruce Evans wrote: I said that I didn't replace (sse2) pagecopy() by bcopy() on amd64 for Haswell. Actually I do, for a small improvement on makeworld. i386 doesn't have (sse*) pagecopy() except in

Re: svn commit: r303583 - head/sys/amd64/amd64

2016-07-31 Thread Konstantin Belousov
On Sun, Jul 31, 2016 at 11:11:25PM +1000, Bruce Evans wrote: > On Haswell, "rep stos" takes about 25 cycles to start up, and the function > call overhead is in the noise. 25 cycles is a lot. Haswell can move > 32 bytes/cycle from L2 to L2, so it misses moving 800 bytes or 1/5 of a > page in its
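The arithmetic behind the quoted "800 bytes or 1/5 of a page" figure can be checked with a minimal sketch. The 25-cycle startup cost and the 32 bytes/cycle rate are the numbers quoted above; the 4096-byte page size (the amd64 base page size) and the function names are assumptions made for illustration.

```c
#include <stddef.h>

/* Assumed amd64 base page size; not stated in the quoted message. */
#define SKETCH_PAGE_SIZE 4096

/* Bytes that could have been moved during the startup latency
 * (hypothetical helper, named for this sketch only). */
static size_t startup_bytes_missed(size_t startup_cycles, size_t bytes_per_cycle)
{
    return startup_cycles * bytes_per_cycle;
}

/* The same quantity expressed as a fraction of one page. */
static double startup_fraction_of_page(size_t startup_cycles, size_t bytes_per_cycle)
{
    return (double)startup_bytes_missed(startup_cycles, bytes_per_cycle) /
           (double)SKETCH_PAGE_SIZE;
}
```

With the quoted figures, startup_bytes_missed(25, 32) gives 800 bytes, and the fraction comes to a little under 0.2, which is the "1/5 of a page" in the message.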

Re: svn commit: r303583 - head/sys/amd64/amd64

2016-07-31 Thread Slawa Olhovchenkov
On Sun, Jul 31, 2016 at 06:26:29PM +0300, Slawa Olhovchenkov wrote: > On Mon, Aug 01, 2016 at 12:30:14AM +1000, Bruce Evans wrote: > > > On Sun, 31 Jul 2016, Slawa Olhovchenkov wrote: > > > > > On Sun, Jul 31, 2016 at 11:11:25PM +1000, Bruce Evans wrote: > > > > > >> Misalignment of this loop

Re: svn commit: r303583 - head/sys/amd64/amd64

2016-07-31 Thread Slawa Olhovchenkov
On Mon, Aug 01, 2016 at 12:30:14AM +1000, Bruce Evans wrote: > On Sun, 31 Jul 2016, Slawa Olhovchenkov wrote: > > > On Sun, Jul 31, 2016 at 11:11:25PM +1000, Bruce Evans wrote: > > > >> Misalignment of this loop made it almost twice as slow on old Turion2 with > >> slow DDR2 memory. It made no

Re: svn commit: r303583 - head/sys/amd64/amd64

2016-07-31 Thread Bruce Evans
On Sun, 31 Jul 2016, Slawa Olhovchenkov wrote: On Sun, Jul 31, 2016 at 11:11:25PM +1000, Bruce Evans wrote: Misalignment of this loop made it almost twice as slow on old Turion2 with slow DDR2 memory. It made no difference on Haswell. I added an extra movnti, but that makes little or no

Re: svn commit: r303583 - head/sys/amd64/amd64

2016-07-31 Thread Slawa Olhovchenkov
On Sun, Jul 31, 2016 at 11:11:25PM +1000, Bruce Evans wrote: > Misalignment of this loop made it almost twice as slow on old Turion2 with > slow DDR2 memory. It made no difference on Haswell. I added an extra > movnti, but that makes little or no differences. 2 more movnti's wouldn't > fit in

Re: svn commit: r303583 - head/sys/amd64/amd64

2016-07-31 Thread Bruce Evans
On Sun, 31 Jul 2016, Mateusz Guzik wrote: Log: amd64: implement pagezero using rep stos The current implementation uses non-temporal writes. This turns out to be detrimental to performance if the page is used shortly after, which is the typical case with page faults. Switch to rep stos.

svn commit: r303583 - head/sys/amd64/amd64

2016-07-31 Thread Mateusz Guzik
Author: mjg
Date: Sun Jul 31 11:34:08 2016
New Revision: 303583
URL: https://svnweb.freebsd.org/changeset/base/303583

Log:
  amd64: implement pagezero using rep stos

  The current implementation uses non-temporal writes. This turns out to be detrimental to performance if the page is used
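To make the two approaches named in the log concrete, here is a rough userland sketch in C. It is not the kernel code from this commit: the function names, the SKETCH_PAGE_SIZE constant, and the use of GCC/Clang inline assembly and the SSE2 _mm_stream_si64 intrinsic (which compiles to movnti) are all assumptions chosen for illustration, with a memset fallback on other targets.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__)
#include <emmintrin.h>   /* _mm_stream_si64 (movnti), _mm_sfence */
#endif

/* Assumed amd64 base page size. */
#define SKETCH_PAGE_SIZE 4096

/* Zero one page with "rep stosq": the stores go through the cache,
 * so the page is likely still cached if it is touched soon after. */
static void pagezero_rep_stos(void *page)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    void *dst = page;
    size_t count = SKETCH_PAGE_SIZE / 8;   /* number of 8-byte stores */

    __asm__ volatile("rep stosq"
                     : "+D"(dst), "+c"(count)
                     : "a"((uint64_t)0)
                     : "memory");
#else
    memset(page, 0, SKETCH_PAGE_SIZE);     /* portable fallback */
#endif
}

/* Zero one page with non-temporal stores, which bypass the cache.
 * The caller must pass an 8-byte-aligned buffer. */
static void pagezero_nontemporal(void *page)
{
#if defined(__x86_64__)
    long long *p = page;

    for (size_t i = 0; i < SKETCH_PAGE_SIZE / 8; i++)
        _mm_stream_si64(&p[i], 0);
    _mm_sfence();   /* order the streaming stores before later accesses */
#else
    memset(page, 0, SKETCH_PAGE_SIZE);     /* portable fallback */
#endif
}
```

Both functions leave the page zeroed; they differ only in whether the zeroed lines end up in the cache, which is the trade-off the log describes for pages used shortly after a page fault.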