Hi Alessandro,

> --- a/lib_generic/string.c
> +++ b/lib_generic/string.c
> @@ -449,7 +449,16 @@ char * bcopy(const char * src, char * dest, int count)
>  void * memcpy(void * dest,const void *src,size_t count)
>  {
>      char *tmp = (char *) dest, *s = (char *) src;
> +    u32 *d32 = (u32 *)dest, *s32 = (u32 *) src;
>
> +    /* if both are aligned, use 32-bit copy */
> +    if ( (((int)dest & 3) | ((int)src & 3) | (count & 3)) == 0 ) {
> +        count /= 4;
> +        while (count--)
> +            *d32++ = *s32++;
> +        return dest;
> +    }
> +    /* else, use 1-byte copy */
>      while (count--)
>          *tmp++ = *s++;

If we're adding this logic, what about adding it such that:

if (src/dest are 32-bit aligned and count > 3) {
    perform 32-bit copies till count <= 3
}

perform remaining 8-bit copies till count == 0

You'd still get the performance boost but not have the requirement that
count is evenly divisible by 4. You could do byte copies before the
32-bit copies to align the src/dest in some cases, but that might be
overkill...

Same comment goes for the memset implementation.

Best,
Peter

_______________________________________________
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot
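
For illustration only, the pseudocode in this reply could be sketched in C roughly as below. This is not the patch itself. The name memcpy_sketch is hypothetical (chosen so it does not clash with the real memcpy), and uint32_t/uintptr_t from <stdint.h> stand in for the u32 type and the (int) casts in the quoted diff. Like the quoted patch, it assumes the build tolerates accessing the buffers through 32-bit pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; a sketch of the suggested structure, not the real patch. */
void *memcpy_sketch(void *dest, const void *src, size_t count)
{
    unsigned char *d8 = dest;
    const unsigned char *s8 = src;

    /* Use 32-bit copies only when both pointers are 32-bit aligned. */
    if ((((uintptr_t)dest | (uintptr_t)src) & 3) == 0) {
        uint32_t *d32 = dest;
        const uint32_t *s32 = src;

        /* Copy whole words until fewer than 4 bytes remain. */
        while (count > 3) {
            *d32++ = *s32++;
            count -= 4;
        }

        /* Continue byte-wise from where the word copy stopped. */
        d8 = (unsigned char *)d32;
        s8 = (const unsigned char *)s32;
    }

    /* Copy the remaining bytes, or everything if the pointers were unaligned. */
    while (count--)
        *d8++ = *s8++;

    return dest;
}
```

With aligned buffers and count == 11, this would do two 32-bit copies followed by three byte copies, so count no longer has to be a multiple of 4 to take the fast path.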