Hi,

A few weeks back (at a PyPy sprint) someone asked me why amd64/OpenBSD has no assembler implementation of memset(3). After asking on icb, there were a couple of theories:

a) Perhaps the available assembler implementations of memset are slower than our C one.
b) Perhaps, due to a), no-one got round to it.

It turns out that (on the systems I benchmarked), FreeBSD's memset.S [1] is faster than our memset.c in libc. Those interested can see the results (including graphs) of some benchmarks comparing FreeBSD's memset.S and our memset.c here:

https://github.com/vext01/openbsd-libc-benchmarks

In short, each experiment warms up by setting and checking a number of buffers, then sets as many buffers as it can within a one-minute window. The experiments were run with varying buffer sizes under both memset.S and memset.c. The machines were otherwise idle during the runs.

Although the results vary from system to system, memset.S appears to be between 6 and 30 times faster. There was also no case (that we tested) where memset.c was faster than memset.S.

The following diff enables memset.S in libc on amd64.

 * Is what I have done with the vendor keywords acceptable? (I moved them to the top, preserving their order, and removed __FBSDID.)
 * I removed the non-executable stack hint, as I don't see anything similar in other .S files in-tree.
 * I don't think any library bump is needed. Can someone confirm this?

I have run with this diff for the last week or so with no issues, including during some heavy compilation tasks (building lang/pypy).

[1] http://svnweb.freebsd.org/base/head/lib/libc/amd64/string/memset.S?revision=217106&view=markup

PS. If people think this kind of work is worthwhile, then there are some other routines we could borrow from the other BSDs too.

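For anyone who wants a feel for the benchmark procedure described above (warm up by setting and checking buffers, then count how many fills fit in a fixed window), it could be sketched roughly as follows. This is not the code from the benchmark repository: the names `fill`, `warm_up`, `count_fills` and `BATCH` are hypothetical, the timer is plain `clock()` CPU time, and the batch size is an arbitrary choice to keep the timer check from dominating small buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define BATCH 1024	/* fills between timer checks (arbitrary choice) */

/* Call through a volatile pointer so the compiler cannot inline or drop the fills. */
static void *(*volatile fill)(void *, int, size_t) = memset;

/* Warm-up: set the buffer and check every byte; returns 1 if all rounds verify. */
static int
warm_up(unsigned char *buf, size_t len, int rounds)
{
	assert(buf != NULL);
	for (int r = 0; r < rounds; r++) {
		const unsigned char c = (unsigned char)(r + 1);

		fill(buf, c, len);
		for (size_t i = 0; i < len; i++)
			if (buf[i] != c)
				return 0;
	}
	return 1;
}

/* Timed phase: count the fills completed within `seconds` of CPU time. */
static unsigned long
count_fills(unsigned char *buf, size_t len, double seconds)
{
	const clock_t start = clock();
	unsigned long fills = 0;

	assert(buf != NULL);
	do {
		for (int i = 0; i < BATCH; i++)
			fill(buf, 0xa5, len);
		fills += BATCH;
	} while ((double)(clock() - start) / CLOCKS_PER_SEC < seconds);
	return fills;
}
```

Calling `warm_up(buf, len, 16)` and then `count_fills(buf, len, 60.0)` for each buffer size, once against each libc build, would give fill counts that can be compared directly.
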
Index: lib/libc/arch/amd64/string/Makefile.inc
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/string/Makefile.inc,v
retrieving revision 1.4
diff -u -p -r1.4 Makefile.inc
--- lib/libc/arch/amd64/string/Makefile.inc	4 Sep 2012 03:10:42 -0000	1.4
+++ lib/libc/arch/amd64/string/Makefile.inc	18 Sep 2013 17:05:10 -0000
@@ -3,4 +3,4 @@
 SRCS+=	bcmp.c ffs.S index.c memchr.c memcmp.c bcopy.c bzero.c \
 	rindex.c strcat.c strcmp.c strcpy.c strcspn.c strlen.c \
 	strncat.c strncmp.c strncpy.c strpbrk.c strsep.c \
-	strspn.c strstr.c swab.c memset.c strlcpy.c strlcat.c
+	strspn.c strstr.c swab.c memset.S strlcpy.c strlcat.c
Index: lib/libc/arch/amd64/string/memset.S
===================================================================
RCS file: lib/libc/arch/amd64/string/memset.S
diff -N lib/libc/arch/amd64/string/memset.S
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ lib/libc/arch/amd64/string/memset.S	18 Sep 2013 17:05:10 -0000
@@ -0,0 +1,58 @@
+/*	$OpenBSD$ */
+/*	FreeBSD revision: 217106 */
+/*	$NetBSD: memset.S,v 1.3 2004/02/26 20:50:06 drochner Exp $	*/
+/*
+ * Written by J.T. Conklin <j...@netbsd.org>.
+ * Public domain.
+ * Adapted for NetBSD/x86_64 by Frank van der Linden <f...@wasabisystems.com>
+ */
+
+#include <machine/asm.h>
+
+ENTRY(memset)
+	movq	%rsi,%rax
+	andq	$0xff,%rax
+	movq	%rdx,%rcx
+	movq	%rdi,%r11
+
+	cld				/* set fill direction forward */
+
+	/*
+	 * if the string is too short, it's really not worth the overhead
+	 * of aligning to word boundries, etc.  So we jump to a plain
+	 * unaligned set.
+	 */
+	cmpq	$0x0f,%rcx
+	jle	L1
+
+	movb	%al,%ah			/* copy char to all bytes in word */
+	movl	%eax,%edx
+	sall	$16,%eax
+	orl	%edx,%eax
+
+	movl	%eax,%edx
+	salq	$32,%rax
+	orq	%rdx,%rax
+
+	movq	%rdi,%rdx		/* compute misalignment */
+	negq	%rdx
+	andq	$7,%rdx
+	movq	%rcx,%r8
+	subq	%rdx,%r8
+
+	movq	%rdx,%rcx		/* set until word aligned */
+	rep
+	stosb
+
+	movq	%r8,%rcx
+	shrq	$3,%rcx			/* set by words */
+	rep
+	stosq
+
+	movq	%r8,%rcx		/* set remainder by bytes */
+	andq	$7,%rcx
+L1:	rep
+	stosb
+	movq	%r11,%rax
+
+	ret

-- 
Best Regards
Edd Barrett

http://www.theunixzoo.co.uk
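
For readers more comfortable with C than with amd64 assembler, the approach taken by memset.S above can be modelled roughly as below. This is only an illustrative sketch, not part of the diff: `spread_byte` and `memset_model` are hypothetical names, the plain loops stand in for `rep stosb` / `rep stosq`, and the length test here is unsigned whereas the assembly uses a signed `jle`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replicate the low byte of c into all eight bytes of a 64-bit word. */
static uint64_t
spread_byte(int c)
{
	uint64_t w = (uint64_t)c & 0xff;	/* andq $0xff,%rax */

	w |= w << 8;				/* movb %al,%ah */
	w |= w << 16;				/* sall $16,%eax; orl %edx,%eax */
	w |= w << 32;				/* salq $32,%rax; orq %rdx,%rax */
	return w;
}

/* C model of the routine: byte-fill to alignment, word-fill, byte-fill the tail. */
static void *
memset_model(void *dst, int c, size_t len)
{
	unsigned char *p = dst;
	const unsigned char byte = (unsigned char)c;

	assert(dst != NULL || len == 0);
	if (len > 0x0f) {			/* short fills skip the alignment work */
		const uint64_t word = spread_byte(c);
		size_t head = (size_t)(-(uintptr_t)p & 7);	/* bytes to 8-byte boundary */
		const size_t rest = len - head;

		while (head-- > 0)		/* set until word aligned */
			*p++ = byte;
		for (size_t n = rest >> 3; n > 0; n--) {	/* set by words */
			memcpy(p, &word, sizeof(word));
			p += sizeof(word);
		}
		len = rest & 7;			/* remainder, set by bytes below */
	}
	while (len-- > 0)
		*p++ = byte;
	return dst;				/* original pointer, as %r11 holds it */
}
```

The 16-byte threshold guarantees that the at most 7 alignment bytes never exceed the requested length, so the subtraction that computes the remaining count cannot underflow.
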