Re: [RFC] Documentation about unaligned memory access
On Fri, 23 November 2007 00:15:53 +0000, Daniel Drake wrote:
>
> What's the definition of an unaligned access?
> =============================================
>
> Unaligned memory accesses occur when you try to read N bytes of data starting
> from an address that is not evenly divisible by N (i.e. addr % N != 0).
> For example, reading 4 bytes of data from address 0x1004 is fine, but
> reading 4 bytes of data from address 0x1005 would be an unaligned memory
> access.

The wording could also apply to a DMA of 8k from a 4k-aligned address.
But I don't have a good idea how to improve it.

> It's safe to assume that memcpy will always copy bytewise and hence will
> never cause an unaligned access.

s/always copy/always behave as if copying/

memcpy usually copies at least wordwise, possibly even in bigger chunks.
But that is just the inner loop.  Unaligned bytes at the beginning/end
receive special treatment.

Jörn

--
The rabbit runs faster than the fox, because the rabbit is running for
his life while the fox is only running for his dinner.
-- Aesop
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Documentation about unaligned memory access
On Nov 23, 2007, at 5:43 AM, Heikki Orsila wrote:
> On Fri, Nov 23, 2007 at 12:15:53AM +0000, Daniel Drake wrote:
> > Why unaligned access is bad
> > ===========================
> >
> > Most architectures are unable to perform unaligned memory accesses. Any
> > unaligned access causes a processor exception.
>
> "Some architectures are unable to perform unaligned memory accesses,
> either an exception is generated, or the data
> access is silently invalid. In architectures that allow unaligned
> access, natural aligned accesses are usually faster than non-aligned."
>
> > In summary: if your code causes unaligned memory accesses to happen, your
> > code will not work on some platforms, and will perform *very* badly on
> > others.
>
> *very* -> *slower*
>
> > Natural alignment
> > =================
>
> Please move this definition before "Why unaligned access is bad".
>
> Also, it would be nice to have a table of ISAs:
>
> ISA           Need natural    Need alignment
>               alignment       by x
>
> m68k          No              2
> powerpc/ppc   Yes             Word size

On ppc it varies from processor to processor whether misaligned data is
fixed up or causes an exception. However, it's highly recommended to be
naturally aligned.

I'm not sure I follow what is meant by the second column (need
alignment by x).

- k
Re: [RFC] Documentation about unaligned memory access
On Mon, Nov 26, 2007 at 03:47:06PM +0100, Johannes Berg wrote:
>
> > Sidenote: in the above example, you may wish to reorder the fields in the
> > above structure so that the overall structure uses less memory. For example,
> > moving field3 to sit in between field1 and field2 (where the padding is
> > inserted) would shrink the overall structure by 1 byte:
> >
> >	struct foo {
> >		u16 field1;
> >		u8 field3;
> >		u32 field2;
> >	};
>
> You can reorder to u32, u16, u8 order and save another byte :)
>
> A reference to pahole could be appropriate here, and probably a small
> note that some large existing structures like netdev have deliberate
> holes to achieve cache alignment.

shameless plug:

https://ols2006.108.redhat.com/2007/Reprints/melo-Reprint.pdf

- Arnaldo
Re: [RFC] Documentation about unaligned memory access
On Fri, Nov 23, 2007 at 01:43:29PM +0200, Heikki Orsila wrote:
> On Fri, Nov 23, 2007 at 12:15:53AM +0000, Daniel Drake wrote:
> > Why unaligned access is bad
> > ===========================
> >
> > Most architectures are unable to perform unaligned memory accesses. Any
> > unaligned access causes a processor exception.
>
> "Some architectures are unable to perform unaligned memory accesses,
> either an exception is generated, or the data
> access is silently invalid. In architectures that allow unaligned
> access, natural aligned accesses are usually faster than non-aligned."
>
> > In summary: if your code causes unaligned memory accesses to happen, your
> > code will not work on some platforms, and will perform *very* badly on
> > others.
>
> *very* -> *slower*
>
> > Natural alignment
> > =================
>
> Please move this definition before "Why unaligned access is bad".
>
> Also, it would be nice to have a table of ISAs:
>
> ISA           Need natural    Need alignment
>               alignment       by x
>
> m68k          No              2
> powerpc/ppc   Yes             Word size
> x86           No              No
> x86_64        No              No

arm32           Yes             2 for 16bit data, 4 for 32bit

Note, if the unaligned handler is running, the alignment will
be fixed by the fault handler (at the cost of taking a fault).
If the unaligned handler is turned off, you get a "free" shift
of the data instead.

--
Ben ([EMAIL PROTECTED], http://www.fluff.org/)

  'a smiley only costs 4 bytes'
Re: [RFC] Documentation about unaligned memory access
> Going back to an earlier example:
>	void myfunc(u8 *data, u32 value)
>	{
>		[...]
>		*((u16 *) data) = cpu_to_le32(value);
>		[...]

typo? should it be a u32 cast?

> To avoid the unaligned memory access, you could rewrite it as follows:
>
>	void myfunc(u8 *data, u32 value)
>	{
>		[...]
>		value = cpu_to_le32(value);
>		memcpy(data, value, sizeof(value));
>		[...]
>	}

I think you should use put_unaligned here as well. Or maybe just reorder
this vs. the section below where you use get/put_unaligned.

johannes
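For reference, a minimal userspace sketch of the memcpy-based store discussed
above. It is not kernel code: to_le32() and store_le32() are hypothetical
stand-ins (the first one for cpu_to_le32()), and the fixed-width types come
from <stdint.h> rather than the kernel's u8/u32. It also passes &value to
memcpy(), since memcpy() takes a pointer to the source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for cpu_to_le32(): returns a value whose in-memory
 * byte order is little-endian, whatever the host byte order is. */
static uint32_t to_le32(uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v & 0xff),
		(uint8_t)((v >> 8) & 0xff),
		(uint8_t)((v >> 16) & 0xff),
		(uint8_t)((v >> 24) & 0xff),
	};
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Store a 32-bit little-endian value at an arbitrarily aligned address.
 * The destination is a byte pointer, so the compiler cannot assume any
 * alignment for it. */
static void store_le32(uint8_t *data, uint32_t value)
{
	assert(data != NULL);
	value = to_le32(value);
	memcpy(data, &value, sizeof(value));	/* note: &value, not value */
}
```

Calling store_le32(buf + 1, 0x11223344) on a byte buffer writes the bytes
44 33 22 11 to buf[1]..buf[4] without relying on buf + 1 being 4-byte aligned.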
Re: [RFC] Documentation about unaligned memory access
> Sidenote: in the above example, you may wish to reorder the fields in the
> above structure so that the overall structure uses less memory. For example,
> moving field3 to sit in between field1 and field2 (where the padding is
> inserted) would shrink the overall structure by 1 byte:
>
>	struct foo {
>		u16 field1;
>		u8 field3;
>		u32 field2;
>	};

You can reorder to u32, u16, u8 order and save another byte :)

A reference to pahole could be appropriate here, and probably a small
note that some large existing structures like netdev have deliberate
holes to achieve cache alignment.

johannes
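A quick way to check what each ordering really costs is to ask the compiler.
The sketch below assumes the original field order was u16, u32, u8 (inferred
from the quoted text); the struct names are made up for illustration and the
types are the <stdint.h> equivalents of u8/u16/u32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed original order: padding after field1 and after field3. */
struct foo_orig {
	uint16_t field1;
	uint32_t field2;
	uint8_t  field3;
};

/* field3 moved into the hole between field1 and field2. */
struct foo_moved {
	uint16_t field1;
	uint8_t  field3;
	uint32_t field2;
};

/* Largest member first: u32, u16, u8. */
struct foo_sorted {
	uint32_t field2;
	uint16_t field1;
	uint8_t  field3;
};

/* Padding bytes in a layout = total size minus the 7 bytes of payload. */
#define FOO_PAYLOAD_BYTES (sizeof(uint16_t) + sizeof(uint32_t) + sizeof(uint8_t))

static size_t padding_bytes(size_t struct_size)
{
	assert(struct_size >= FOO_PAYLOAD_BYTES);
	return struct_size - FOO_PAYLOAD_BYTES;
}
```

On an ABI where a 32-bit integer is 4-byte aligned, I would expect sizeof to
give 12, 8 and 8 for these three, because the compiler also pads the tail of
the structure up to its alignment. The numbers differ on ABIs with weaker
alignment rules, so printing sizeof() and offsetof() (or running pahole) on the
target is the reliable check.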
Re: [RFC] Documentation about unaligned memory access
On Fri, 23 Nov 2007, Arne Georg Gleditsch wrote:

> dean gaudet <[EMAIL PROTECTED]> writes:
> > on AMD x86 pre-family 10h the boundary is 8 bytes, and on fam 10h it's 16
> > bytes.  the penalty is a mere 3 cycles if an access crosses the specified
> > boundary.
>
> Worth noting though, is that atomic accesses that cross cache lines on
> an Opteron system is going to lock down the Hypertransport fabric for
> you during the operation -- which is obviously not so nice.

ooh awesome, i hadn't measured that before.

on a 2 node sockF / revF with a random pointer chase running on cpu 0 /
node 0 i see the avg load-to-load cache miss latency jump from 77ns to
109ns when i add an unaligned lock-intensive workload on one core of
node 1.  the worst i can get the pointer chase latency to is 273ns when i
add two threads on node 1 fighting over an unaligned lock.

on a 4 node (square) the worst case i can get seems to be an increase from
98ns with no antagonist to 385ns with 6 antagonists fighting over an
unaligned lock on the other 3 nodes.

cool.

-dean
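Whether a given access pays the penalty above comes down to whether its first
and last byte fall in different boundary-sized blocks. A small sketch of that
test follows; crosses_boundary() is a hypothetical helper, and the boundary
(8, 16, or a cache line size such as 64) is whatever the CPU in question uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the `size` bytes starting at `addr` straddle a multiple of
 * `boundary`, i.e. the first and last byte live in different blocks. */
static bool crosses_boundary(uintptr_t addr, size_t size, size_t boundary)
{
	assert(boundary != 0);
	if (size == 0)
		return false;
	return (addr / boundary) != ((addr + size - 1) / boundary);
}
```

For example, a 4-byte access at 0x1006 crosses an 8-byte boundary but not a
16-byte one, and a 4-byte access at 0x103e crosses a 64-byte line.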
Re: [RFC] Documentation about unaligned memory access
On Nov 23, 2007 1:15 AM, Daniel Drake <[EMAIL PROTECTED]> wrote:
[...]
>
> Before I do so, any comments on the following?
>
[...]
>	void myfunc(u8 *data, u32 value)
>	{
>		[...]
>		value = cpu_to_le32(value);
>		memcpy(data, value, sizeof(value));
>		[...]
>	}

I suppose you mean:

	memcpy(data, &value, sizeof(value));

/DM
Re: [RFC] Documentation about unaligned memory access
> mc68020+        No              No
> (mc68000/010    No              2)      (not for Linux)

Actually uClinux has been persuaded to run on m68000.
Re: [RFC] Documentation about unaligned memory access
> Unaligned memory accesses occur when you try to read N bytes of data starting
> from an address that is not evenly divisible by N (i.e. addr % N != 0).

Should clarify that you mean "with power-of-two N" - even more strictly this
depends on the processor, but I'm pretty sure there is none which supports
aligned accesses of N==3...

Olaf
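Restricting N to powers of two also makes the check cheap, because the modulo
turns into a mask. A minimal sketch, with is_power_of_two() and is_aligned()
as hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A power of two has exactly one bit set. */
static bool is_power_of_two(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* addr % n == 0, written as a mask; only valid for power-of-two n. */
static bool is_aligned(uintptr_t addr, size_t n)
{
	assert(is_power_of_two(n));
	return (addr & (n - 1)) == 0;
}
```

With the addresses from the quoted text, is_aligned(0x1004, 4) holds and
is_aligned(0x1005, 4) does not.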
Re: [RFC] Documentation about unaligned memory access
On Sun, Nov 25, 2007 at 12:16:08PM +0100, Geert Uytterhoeven wrote:
> > ISA           Need natural    Need alignment
> >               alignment       by x
> >
> > m68k          No              2
>
> `No' for >= 68020.
> `Yes' for < 68020.

My bad, yes..

mc68020+        No              No
(mc68000/010    No              2)      (not for Linux)

--
Heikki Orsila                   Barbie's law:
[EMAIL PROTECTED]               "Math is hard, let's go shopping!"
http://www.iki.fi/shd
Re: [RFC] Documentation about unaligned memory access
On Fri, 23 Nov 2007, Heikki Orsila wrote:
> On Fri, Nov 23, 2007 at 12:15:53AM +0000, Daniel Drake wrote:
> > Why unaligned access is bad
> > ===========================
> >
> > Most architectures are unable to perform unaligned memory accesses. Any
> > unaligned access causes a processor exception.
>
> "Some architectures are unable to perform unaligned memory accesses,
> either an exception is generated, or the data
> access is silently invalid. In architectures that allow unaligned
> access, natural aligned accesses are usually faster than non-aligned."
>
> > In summary: if your code causes unaligned memory accesses to happen, your
> > code will not work on some platforms, and will perform *very* badly on
> > others.
>
> *very* -> *slower*
>
> > Natural alignment
> > =================
>
> Please move this definition before "Why unaligned access is bad".
>
> Also, it would be nice to have a table of ISAs:
>
> ISA           Need natural    Need alignment
>               alignment       by x
>
> m68k          No              2

`No' for >= 68020.
`Yes' for < 68020.

> powerpc/ppc   Yes             Word size
> x86           No              No
> x86_64        No              No

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds
Re: [RFC] Documentation about unaligned memory access
On Thursday 22 November 2007 16:15, Daniel Drake wrote:
> In summary: if your code causes unaligned memory accesses to happen, your
> code will not work on some platforms, and will perform *very* badly on
> others.

Although understanding alignment is important, there is another extreme -
what I call "sadistic alignment". It's when data is being aligned even if
it will definitely run on an arch which doesn't require this (arch/x86/*),
or data being aligned to a ridiculously large boundary.

Like gcc aligning any char array bigger than 31 bytes to 32 bytes.
Bytes, not bits. Try to compile this with -O2:

static char s1[] = "12345678901234567890123456789012";
static char s2[] = "12345678901234567890123456789012";
void f(char*);
void g() { f(s1); f(s2); }

$ hexdump -Cv t.o
00000000  7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  01 00 03 00 01 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  38 01 00 00 00 00 00 00  34 00 00 00 00 00 28 00  |8.......4.....(.|
00000030  0a 00 07 00 55 89 e5 83  ec 08 c7 04 24 40 00 00  |....U.......$@..|
00000040  00 e8 fc ff ff ff c7 04  24 00 00 00 00 e8 fc ff  |........$.......|
00000050  ff ff c9 c3 00 00 00 00  00 00 00 00 00 00 00 00  |................| <=== HERE
00000060  31 32 33 34 35 36 37 38  39 30 31 32 33 34 35 36  |1234567890123456|
00000070  37 38 39 30 31 32 33 34  35 36 37 38 39 30 31 32  |7890123456789012|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................| <=== HERE
00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................| <=== HERE
000000a0  31 32 33 34 35 36 37 38  39 30 31 32 33 34 35 36  |1234567890123456|
000000b0  37 38 39 30 31 32 33 34  35 36 37 38 39 30 31 32  |7890123456789012|
000000c0  00 00 00 00 00 47 43 43  3a 20 28 47 4e 55 29 20  |.....GCC: (GNU) |
000000d0  34 2e 30 2e 33 20 28 55  62 75 6e 74 75 20 34 2e  |4.0.3 (Ubuntu 4.|
000000e0  30 2e 33 2d 31 75 62 75  6e 74 75 35 29 00 00 2e  |0.3-1ubuntu5)...|
000000f0  73 79 6d 74 61 62 00 2e  73 74 72 74 61 62 00 2e  |symtab..strtab..|

43 bytes wasted! Thankfully, it is fixed in later gcc versions.

Please do not succumb to "alignment scare" in your doc.
--
vda
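To separate what the language requires from what a particular compiler chose,
something like the following can be compiled alongside the test case above.
It is only a sketch: required_alignment() and offset_past_boundary() are
hypothetical helpers, and _Alignof needs a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same string as in the test case: 32 characters plus the NUL = 33 bytes. */
static char s1[] = "12345678901234567890123456789012";

/* What the language requires: an array has the alignment of its element
 * type, so a char array only needs 1-byte alignment. */
static size_t required_alignment(void)
{
	return _Alignof(char[33]);
}

/* What the toolchain actually did: how far s1 sits past the previous
 * multiple of `boundary` (0 means s1 happens to be aligned to it). */
static size_t offset_past_boundary(size_t boundary)
{
	assert(boundary != 0);
	return (size_t)((uintptr_t)s1 % boundary);
}
```

If offset_past_boundary(32) comes back 0 for every such array in a build, the
compiler is probably padding them out as in the dump; any other value shows it
is not aligning them that generously.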
Re: [RFC] Documentation about unaligned memory access
On Sat, Nov 24, 2007 at 06:35:25PM +0100, Pierre Ossman wrote:
> On Sat, 24 Nov 2007 17:22:36 +0000
> Luciano Rocha <[EMAIL PROTECTED]> wrote:
>
> > On Sat, Nov 24, 2007 at 05:19:31PM +0100, Pierre Ossman wrote:
> > > It most certainly does not. gcc will assume that an int* has int
> > > alignment. memcpy() is a builtin, which gcc can translate to pretty much
> > > anything. And C specifies that a pointer to foo, will point to a real
> > > object of type foo, so gcc can't be blamed for the unsafe typecasts. I
> > > have tested this the hard way, so this is not just speculation.
> >
> > Yes, on *int and other assumed aligned pointers, gcc uses its internal
> > version.
> >
> > However, my point is that those pointers, unless speaking of packed
> > structures, can safely be assumed aligned, while char*/void* can't.
> >
>
> I get the sensation we're violently in agreement here, just misunderstanding
> each other. :)

That's it. :)

Sorry for the noise,...

--
lfr
0/0
Re: [RFC] Documentation about unaligned memory access
On Sat, 24 Nov 2007 17:22:36 +0000
Luciano Rocha <[EMAIL PROTECTED]> wrote:

> Nothing does, even memcpy doesn't check alignment of the source, or
> alignment at all in some assembly implementations (only word-copy,
> without checking if at word-boundary).

An out-of-line implementation can only do that if the architecture
allows unaligned loads and stores. Since it has no clue about the types
involved, it must assume that both pointers as well as the length may
be misaligned.

gcc, on the other hand, knows exactly what types are involved, so when
it expands its own builtin-memcpy inline it can optimize it based on
the required alignment of those types. So when you cast between types
with different alignment requirements, you must make sure the result is
properly aligned, or you need to use get_unaligned()/put_unaligned() to
override gcc's assumptions.

Btw, some versions of avr32-gcc (I think it was 4.0.x) assumed packed
structs were properly aligned too, with disastrous results. gcc-4.1
handles packed structs correctly as far as I can tell.

Håvard
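One common way to override the compiler's alignment assumption is to do the
access through a packed struct, so that the compiler has to emit code that
works at any address. The sketch below illustrates that technique only; it is
not the kernel's get_unaligned()/put_unaligned() implementation. The names
una_u32, load_u32_unaligned() and store_u32_unaligned() are made up, and the
packed and may_alias attributes are GCC/Clang extensions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* packed: the member may sit at any address (alignment 1).
 * may_alias: accesses through this type are exempt from type-based
 * alias analysis, so pointing it at a byte buffer is acceptable. */
struct una_u32 {
	uint32_t x;
} __attribute__((packed, may_alias));

/* Load a host-endian 32-bit value from an arbitrarily aligned address. */
static inline uint32_t load_u32_unaligned(const void *p)
{
	const struct una_u32 *ptr = (const struct una_u32 *)p;

	assert(p != NULL);
	return ptr->x;
}

/* Store a host-endian 32-bit value to an arbitrarily aligned address. */
static inline void store_u32_unaligned(uint32_t val, void *p)
{
	struct una_u32 *ptr = (struct una_u32 *)p;

	assert(p != NULL);
	ptr->x = val;
}
```

On architectures with cheap unaligned access the compiler can still turn this
into a single load or store; elsewhere it falls back to byte accesses. As noted
above, this relies on the compiler handling packed structs correctly.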
Re: [RFC] Documentation about unaligned memory access
On Sat, 24 Nov 2007 17:22:36 +0000
Luciano Rocha <[EMAIL PROTECTED]> wrote:

> On Sat, Nov 24, 2007 at 05:19:31PM +0100, Pierre Ossman wrote:
> > It most certainly does not. gcc will assume that an int* has int alignment.
> > memcpy() is a builtin, which gcc can translate to pretty much anything. And
> > C specifies that a pointer to foo, will point to a real object of type foo,
> > so gcc can't be blamed for the unsafe typecasts. I have tested this the
> > hard way, so this is not just speculation.
>
> Yes, on *int and other assumed aligned pointers, gcc uses its internal
> version.
>
> However, my point is that those pointers, unless speaking of packed
> structures, can safely be assumed aligned, while char*/void* can't.
>

I get the sensation we're violently in agreement here, just
misunderstanding each other. :)

_My_ point was that the documentation should mention that normal, unpacked
C objects have alignments that influence the code generated by
__builtin_memcpy(). As such, one should always make sure to have either src
or dst be char*/void* when alignment cannot be guaranteed. The example in
the documentation has this, but it isn't explicit that this is required.

Rgds
--
     -- Pierre Ossman

  Linux kernel, MMC maintainer        http://www.kernel.org
  PulseAudio, core developer          http://pulseaudio.org
  rdesktop, core developer            http://www.rdesktop.org
Re: [RFC] Documentation about unaligned memory access
On Sat, Nov 24, 2007 at 05:19:31PM +0100, Pierre Ossman wrote:
> On Sat, 24 Nov 2007 15:50:52 +
> Luciano Rocha wrote:
>
> > Dumb memcpy (while (len--) { *d++ = *s++ }) will have alignment problems
> > in any case. Intelligent ones, like the one provided in glibc, first copy
> > bytes till output is aligned (C file) *or* size is a multiple (i686 asm
> > file) of word size, and then it copies word-by-word.
> >
> > Linux's x86_64 memcpy does the opposite, copies 64bit words, and then
> > copies the last bytes.
> >
> > So, in effect, as long as no packed structures are used, memcpy should
> > be safer on *int, etc., than *char, as the compiler ensures
> > word-alignment.
>
> It most certainly does not. gcc will assume that an int* has int alignment.
> memcpy() is a builtin, which gcc can translate to pretty much anything. And C
> specifies that a pointer to foo, will point to a real object of type foo, so
> gcc can't be blamed for the unsafe typecasts. I have tested this the hard
> way, so this is not just speculation.

Yes, on *int and other assumed aligned pointers, gcc uses its internal
version.

However, my point is that those pointers, unless speaking of packed
structures, can safely be assumed aligned, while char*/void* can't.

> In other words, memcpy() does _not_ save you from alignment issues. If you
> cast from char* or void* to something else, you better be damn sure the
> alignment is correct because gcc will assume it is.

Nothing does, even memcpy doesn't check alignment of the source, or
alignment at all in some assembly implementations (only word-copy,
without checking if at word-boundary).

--
lfr
0/0
Re: [RFC] Documentation about unaligned memory access
On Sat, 24 Nov 2007 15:50:52 + Luciano Rocha wrote:
>
> Dumb memcpy (while (len--) { *d++ = *s++ }) will have alignment problems
> in any case. Intelligent ones, like the one provided in glibc, first copy
> bytes till output is aligned (C file) *or* size is a multiple (i686 asm file)
> of word size, and then it copies word-by-word.
>
> Linux's x86_64 memcpy does the opposite, copies 64bit words, and then
> copies the last bytes.
>
> So, in effect, as long as no packed structures are used, memcpy should
> be safer on *int, etc., than *char, as the compiler ensures
> word-alignment.
>

It most certainly does not. gcc will assume that an int* has int alignment.
memcpy() is a builtin, which gcc can translate to pretty much anything. And C
specifies that a pointer to foo, will point to a real object of type foo, so
gcc can't be blamed for the unsafe typecasts. I have tested this the hard
way, so this is not just speculation.

E.g., we have the following struct:

	struct foo {
		u8 a[4];
		u32 b;
	};

This struct will have a size of 8 bytes and an alignment of 4 bytes (caused
by the member b). Now take the following code:

	void copy_foo(struct foo *dst, struct foo *src)
	{
		*dst = *src;
	}

On a platform that supports 64-bit loads and stores (e.g. AVR32, where I got
hit by this), this will generate:

	LD r1, (src)
	ST r1, (dst)

Now if I replace that with:

	void copy_foo(struct foo *dst, struct foo *src)
	{
		memcpy(dst, src, sizeof(struct foo));
	}

then it will generate the same code. So I cannot use copy_foo() to transfer
a struct foo either out of, or into a packet buffer.

In other words, memcpy() does _not_ save you from alignment issues. If you
cast from char* or void* to something else, you better be damn sure the
alignment is correct because gcc will assume it is.

Rgds
--
Pierre Ossman
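One way to move a struct foo into or out of a packet buffer without tripping over the problem described above is to keep the buffer side typed as unsigned char*, so the compiler can assume nothing stronger than byte alignment there. This is a minimal userspace sketch, not code from the thread: foo_from_packet() and foo_to_packet() are hypothetical names, and uint8_t/uint32_t stand in for the kernel's u8/u32.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as the struct in the message above, with stdint types. */
struct foo {
	uint8_t a[4];
	uint32_t b;
};

/* Read a struct foo that starts at an arbitrary byte offset in a packet. */
void foo_from_packet(struct foo *dst, const unsigned char *pkt, size_t off)
{
	/* dst is a real, aligned struct foo; pkt + off may be misaligned. */
	memcpy(dst, pkt + off, sizeof(*dst));
}

/* Write a struct foo at an arbitrary byte offset in a packet. */
void foo_to_packet(unsigned char *pkt, size_t off, const struct foo *src)
{
	memcpy(pkt + off, src, sizeof(*src));
}
```

The important detail is that the packet pointer is never cast to struct foo*. Once such a cast exists, the compiler is entitled to assume 4-byte alignment, and the memcpy() may be expanded into the same wide load and store as the plain assignment.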
Re: [RFC] Documentation about unaligned memory access
On Sat, Nov 24, 2007 at 02:34:41PM +0100, Pierre Ossman wrote:
> On Fri, 23 Nov 2007 00:15:53 + (GMT)
> Daniel Drake wrote:
>
> > Being spoilt by the luxuries of i386/x86_64 I've never really had a good
> > grasp on unaligned memory access problems on other architectures and decided
> > it was time to figure it out. As a result I've written this documentation
> > which I plan to submit for inclusion as
> > Documentation/unaligned_memory_access.txt
> >
> > Before I do so, any comments on the following?
> >
>
> A very nice, and much needed document. I think you should include one thing
> though:
>
> memcpy() is _only_ safe when one of the pointers is char* or void*. If it is
> anything more complex than that, gcc will assume alignment and optimise based
> on that. E.g. memcpy() of two long:s generates the same assembly as doing an
> assignment.

Dumb memcpy (while (len--) { *d++ = *s++ }) will have alignment problems
in any case. Intelligent ones, like the one provided in glibc, first copy
bytes till output is aligned (C file) *or* size is a multiple (i686 asm file)
of word size, and then it copies word-by-word.

Linux's x86_64 memcpy does the opposite, copies 64bit words, and then
copies the last bytes.

So, in effect, as long as no packed structures are used, memcpy should
be safer on *int, etc., than *char, as the compiler ensures
word-alignment.

--
lfr
0/0
Re: [RFC] Documentation about unaligned memory access
On Fri, 23 Nov 2007 00:15:53 + (GMT) Daniel Drake wrote:
> Being spoilt by the luxuries of i386/x86_64 I've never really had a good
> grasp on unaligned memory access problems on other architectures and decided
> it was time to figure it out. As a result I've written this documentation
> which I plan to submit for inclusion as
> Documentation/unaligned_memory_access.txt
>
> Before I do so, any comments on the following?
>

A very nice, and much needed document. I think you should include one thing
though:

memcpy() is _only_ safe when one of the pointers is char* or void*. If it is
anything more complex than that, gcc will assume alignment and optimise based
on that. E.g. memcpy() of two long:s generates the same assembly as doing an
assignment.

(Technically it is no different for char* and void*, but since they have
byte alignment, gcc can't really do anything creative.)

Rgds
--
Pierre Ossman
Re: [RFC] Documentation about unaligned memory access
On Sat, Nov 24, 2007 at 06:35:25PM +0100, Pierre Ossman wrote:
> On Sat, 24 Nov 2007 17:22:36 + Luciano Rocha wrote:
> > On Sat, Nov 24, 2007 at 05:19:31PM +0100, Pierre Ossman wrote:
> > > It most certainly does not. gcc will assume that an int* has int alignment.
> > > memcpy() is a builtin, which gcc can translate to pretty much anything. And
> > > C specifies that a pointer to foo, will point to a real object of type foo,
> > > so gcc can't be blamed for the unsafe typecasts. I have tested this the
> > > hard way, so this is not just speculation.
> >
> > Yes, on *int and other assumed aligned pointers, gcc uses its internal
> > version.
> >
> > However, my point is that those pointers, unless speaking of packed
> > structures, can safely be assumed aligned, while char*/void* can't.
>
> I get the sensation we're violently in agreement here, just
> misunderstanding each other. :)

That's it. :)

Sorry for the noise,...

--
lfr
0/0
Re: [RFC] Documentation about unaligned memory access
Daniel Drake wrote:
> Being spoilt by the luxuries of i386/x86_64 I've never really had a good
> grasp on unaligned memory access problems on other architectures and decided
> it was time to figure it out. As a result I've written this documentation
> which I plan to submit for inclusion as
> Documentation/unaligned_memory_access.txt
>
> Before I do so, any comments on the following?

From the viewpoint of yours truly (and I am a teacher of operating system
classes), this is a long-awaited document, which is going to be very useful,
especially for newbies. My students often make alignment mistakes in their
code, and your article will definitely make my job much easier.

Thank you, Daniel, for your work.

Dmitri

>
> Thanks,
> Daniel
>
>
>
>
> UNALIGNED MEMORY ACCESSES
> =========================
>
> Linux runs on a wide variety of architectures which have varying behaviour
> when it comes to memory access. This document presents some details about
> unaligned accesses, why you need to write code that doesn't cause them,
> and how to write such code!
>
>
> What's the definition of an unaligned access?
> =============================================
>
> Unaligned memory accesses occur when you try to read N bytes of data starting
> from an address that is not evenly divisible by N (i.e. addr % N != 0).
> For example, reading 4 bytes of data from address 0x1004 is fine, but
> reading 4 bytes of data from address 0x1005 would be an unaligned memory
> access.
>
>
> Why unaligned access is bad
> ===========================
>
> Most architectures are unable to perform unaligned memory accesses. Any
> unaligned access causes a processor exception.
>
> Some architectures have an exception handler implemented in the kernel which
> corrects the memory access, but this is very expensive and is not true for
> all architectures. You cannot rely on the exception handler to correct your
> memory accesses.
>
> In summary: if your code causes unaligned memory accesses to happen, your code
> will not work on some platforms, and will perform *very* badly on others.
>
> You may be wondering why you have never seen these problems on your own
> architecture. Some architectures (such as i386 and x86_64) do not have this
> limitation, but nevertheless it is important for you to write portable code
> that works everywhere.
>
>
> Natural alignment
> =================
>
> The rule we mentioned earlier forms what we refer to as natural alignment:
> When accessing N bytes of memory, the base memory address must be evenly
> divisible by N, i.e. addr % N == 0
>
> When writing code, assume the target architecture has natural alignment
> requirements.
>
> Sidenote: in reality, only a few architectures require natural alignment
> on all sizes of memory access. However, again we must consider ALL supported
> architectures; natural alignment is the only way to achieve full portability.
>
>
> Code that doesn't cause unaligned access
> ========================================
>
> At first, the concepts above may seem a little hard to relate to actual
> coding practice. After all, you don't have a great deal of control over
> memory addresses of certain variables, etc.
>
> Fortunately things are not too complex, as in most cases, the compiler
> ensures that things will work for you. For example, take the following
> structure:
>
>	struct foo {
>		u16 field1;
>		u32 field2;
>		u8 field3;
>	};
>
> Let us assume that an instance of the above structure resides in memory
> starting at address 0x1000. With a basic level of understanding, it would
> not be unreasonable to expect that accessing field2 would cause an unaligned
> access. You'd be expecting field2 to be located at offset 2 bytes into the
> structure, i.e. address 0x1002, but that address is not evenly divisible
> by 4 (remember, we're reading a 4 byte value here).
>
> Fortunately, the compiler understands the alignment constraints, so in the
> above case it would insert 2 bytes of padding inbetween field1 and field2.
> Therefore, for standard structure types you can always rely on the compiler
> to pad structures so that accesses to fields are suitably aligned (assuming
> you do not cast the field to a type of different length).
>
> Similarly, you can also rely on the compiler to align variables and function
> parameters to a naturally aligned scheme, based on the size of the type of
> the variable.
>
> Sidenote: in the above example, you may wish to reorder the fields in the
> above structure so that the overall structure uses less memory. For example,
> moving field3 to sit inbetween field1 and field2 (where the padding is
> inserted) would shrink the overall structure by 1 byte:
>
>	struct foo {
>		u16 field1;
>		u8 field3;
>		u32 field2;
>	};
>
> Sidenote: it should be obvious by now, but in case it is not, accessing a
> single
Re: [RFC] Documentation about unaligned memory access
On Thursday 22 November 2007 04:15:53 pm Daniel Drake wrote:
> Fortunately things are not too complex, as in most cases, the compiler
> ensures that things will work for you. For example, take the following
> structure:
>
>	struct foo {
>		u16 field1;
>		u32 field2;
>		u8 field3;
>	};
>
> Fortunately, the compiler understands the alignment constraints, so in the
> above case it would insert 2 bytes of padding inbetween field1 and field2.
> Therefore, for standard structure types you can always rely on the compiler
> to pad structures so that accesses to fields are suitably aligned (assuming
> you do not cast the field to a type of different length).

It would also insert 3 bytes of padding after field3, in order to satisfy
alignment constraints for arrays of these structures.

> Sidenote: in the above example, you may wish to reorder the fields in the
> above structure so that the overall structure uses less memory. For
> example, moving field3 to sit inbetween field1 and field2 (where the
> padding is inserted) would shrink the overall structure by 1 byte:
>
>	struct foo {
>		u16 field1;
>		u8 field3;
>		u32 field2;
>	};

It will actually shrink it by 4 bytes, for the very same reason.

-- Vadim Lobanov
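The padding arithmetic in this exchange can be checked with sizeof and offsetof. The sketch below assumes an ABI where a 32-bit integer has 4-byte alignment (true on the common Linux targets, but not guaranteed by the C standard). The struct names foo_original and foo_reordered and the helper foo_bytes_saved() are made up for the illustration, and uint16_t/uint32_t/uint8_t stand in for u16/u32/u8.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Layout from the draft. With 4-byte alignment for field2 the compiler is
 * expected to pad 2 bytes after field1 and 3 bytes after field3.
 */
struct foo_original {
	uint16_t field1;
	uint32_t field2;
	uint8_t field3;
};

/* Reordered layout: field3 moves into the hole after field1. */
struct foo_reordered {
	uint16_t field1;
	uint8_t field3;
	uint32_t field2;
};

/* Bytes saved by the reordering on the ABI the program is compiled for. */
size_t foo_bytes_saved(void)
{
	return sizeof(struct foo_original) - sizeof(struct foo_reordered);
}
```

Under the stated assumption, offsetof(struct foo_original, field2) is 4, the original struct occupies 12 bytes and the reordered one 8, which matches the 4-byte saving pointed out in the reply.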
Re: [RFC] Documentation about unaligned memory access
On Fri, Nov 23, 2007 at 12:15:53AM +, Daniel Drake wrote:
> Why unaligned access is bad
> ===========================
>
> Most architectures are unable to perform unaligned memory accesses. Any
> unaligned access causes a processor exception.

"Some architectures are unable to perform unaligned memory accesses, either
an exception is generated, or the data access is silently invalid. In
architectures that allow unaligned access, natural aligned accesses are
usually faster than non-aligned."

> In summary: if your code causes unaligned memory accesses to happen, your code
> will not work on some platforms, and will perform *very* badly on others.

*very* -> *slower*

> Natural alignment
> =================

Please move this definition before "Why unaligned access is bad".

Also, it would be nice to have a table of ISAs:

	ISA		Need natural	Need alignment
			alignment	by x

	m68k		No		2
	powerpc/ppc	Yes		Word size
	x86		No		No
	x86_64		No		No

--
Heikki Orsila
Barbie's law: "Math is hard, let's go shopping!"
http://www.iki.fi/shd
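The natural alignment rule discussed in this thread (addr % N == 0 for an N-byte access) is easy to express as a predicate. The helper below is a hypothetical illustration, not a kernel function. It assumes that converting a pointer to uintptr_t yields the numeric address, which holds on mainstream platforms but is implementation-defined in C.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if an n-byte access starting at addr would be naturally
 * aligned, i.e. the address is evenly divisible by n. n must be non-zero.
 */
bool is_naturally_aligned(const void *addr, size_t n)
{
	return ((uintptr_t)addr % n) == 0;
}
```

With the addresses used in the draft, a 4-byte access at 0x1004 passes this check and one at 0x1005 does not, while any address passes for n == 1.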
Re: [RFC] Documentation about unaligned memory access
On Nov 23 2007 00:15, Daniel Drake wrote:
>
>What's the definition of an unaligned access?
>=============================================
>
>Unaligned memory accesses occur when you try to read N bytes of data starting
>from an address that is not evenly divisible by N (i.e. addr % N != 0).
>For example, reading 4 bytes of data from address 0x1004 is fine, but
>reading 4 bytes of data from address 0x1005 would be an unaligned memory
>access.
>

Try shorter numbers, like 0x10005 :)

>Code that doesn't cause unaligned access
>========================================

In written style, not using n't contracted forms might be preferable.

>Sidenote: in the above example, you may wish to reorder the fields in the
>above structure so that the overall structure uses less memory. For example,
>moving field3 to sit inbetween field1 and field2 (where the padding is
>inserted) would shrink the overall structure by 1 byte:
>
>	struct foo {
>		u16 field1;
>		u8 field3;
>		u32 field2;
>	};
>
>Sidenote: it should be obvious by now, but in case it is not, accessing a
>single byte (u8 or char) can never cause an unaligned access, because all
>memory addresses are evenly divisible by 1.

Sidenote: You would want an alignment like this:

	struct foo {
		uint32_t field2;
		uint16_t field1;
		uint8_t field3;
	};

>Consider the following structure:
>	struct foo {
>		u16 field1;
>		u32 field2;
>		u8 field3;
>	} __attribute__((packed));
>
>It's the same structure as we looked at earlier, but the packed attribute has
>been added. This attribute ensures that the compiler never inserts any padding
>and the structure is laid out in memory exactly as is suggested above.
>
>The packed attribute is useful when you want to use a C struct to represent
>some data that comes in a fixed arrangement 'off the wire'.
>

In the packed case, does not GCC automatically output extra instructions
to not run into unaligned access?

>To avoid the unaligned memory access, you could rewrite it as follows:
>
>	void myfunc(u8 *data, u32 value)
>	{
>		[...]
>		value = cpu_to_le32(value);
>		memcpy(data, value, sizeof(value));
>		[...]
>	}
>
>It's safe to assume that memcpy will always copy bytewise and hence will
>never cause an unaligned access.
>

Usually it copies register-size-wise where possible and bytewise at the
left and right edges if they are unaligned. That's how glibc memcpy does
it, not sure how complete the kernel memcpy is in this regard.
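As an aside on the quoted myfunc() snippet: memcpy() takes pointers, so a memcpy-based version would need to pass &value, not value. A userspace alternative that needs neither cpu_to_le32() nor memcpy() is to write the bytes out one at a time. The sketch below is only an illustration of that idea. put_le32() and get_le32() are hypothetical names chosen for the example, and no claim is made that the kernel implements its helpers this way.

```c
#include <stdint.h>

/*
 * Store a 32-bit value in little-endian order at any address. Every
 * access is a single byte, so it can never be unaligned.
 */
void put_le32(uint8_t *data, uint32_t value)
{
	data[0] = (uint8_t)(value & 0xff);
	data[1] = (uint8_t)((value >> 8) & 0xff);
	data[2] = (uint8_t)((value >> 16) & 0xff);
	data[3] = (uint8_t)((value >> 24) & 0xff);
}

/* Load a little-endian 32-bit value from any address. */
uint32_t get_le32(const uint8_t *data)
{
	return (uint32_t)data[0] |
	       ((uint32_t)data[1] << 8) |
	       ((uint32_t)data[2] << 16) |
	       ((uint32_t)data[3] << 24);
}
```

Because the shifts operate on the value and not on memory, the result is the same on big-endian and little-endian hosts, at the cost of four byte accesses where an aligned platform could use one word access.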
Re: [RFC] Documentation about unaligned memory access
dean gaudet writes:
> on AMD x86 pre-family 10h the boundary is 8 bytes, and on fam 10h it's 16
> bytes. the penalty is a mere 3 cycles if an access crosses the specified
> boundary.

Worth noting though, is that atomic accesses that cross cache lines on an
Opteron system are going to lock down the Hypertransport fabric for you
during the operation -- which is obviously not so nice.

--
Arne.
Re: [RFC] Documentation about unaligned memory access
Daniel Drake wrote:
> Being spoilt by the luxuries of i386/x86_64 I've never really had a good grasp on unaligned memory access problems on other architectures and decided it was time to figure it out. As a result I've written this documentation which I plan to submit for inclusion as Documentation/unaligned_memory_access.txt
>
> Before I do so, any comments on the following?

From the viewpoint of yours truly (and I am a teacher of operating system classes), this is a long-awaited document, which is going to be very useful, especially for newbies. My students often make alignment mistakes in their code, and your article will definitely make my job much easier.

Thank you, Daniel, for your work.

Dmitri

> Thanks,
> Daniel
>
> UNALIGNED MEMORY ACCESSES
> =========================
>
> Linux runs on a wide variety of architectures which have varying behaviour when it comes to memory access. This document presents some details about unaligned accesses, why you need to write code that doesn't cause them, and how to write such code!
>
> What's the definition of an unaligned access?
> =============================================
>
> Unaligned memory accesses occur when you try to read N bytes of data starting from an address that is not evenly divisible by N (i.e. addr % N != 0). For example, reading 4 bytes of data from address 0x1004 is fine, but reading 4 bytes of data from address 0x1005 would be an unaligned memory access.
>
> Why unaligned access is bad
> ===========================
>
> Most architectures are unable to perform unaligned memory accesses. Any unaligned access causes a processor exception.
>
> Some architectures have an exception handler implemented in the kernel which corrects the memory access, but this is very expensive and is not true for all architectures. You cannot rely on the exception handler to correct your memory accesses.
>
> In summary: if your code causes unaligned memory accesses to happen, your code will not work on some platforms, and will perform *very* badly on others.
>
> You may be wondering why you have never seen these problems on your own architecture. Some architectures (such as i386 and x86_64) do not have this limitation, but nevertheless it is important for you to write portable code that works everywhere.
>
> Natural alignment
> =================
>
> The rule we mentioned earlier forms what we refer to as natural alignment: When accessing N bytes of memory, the base memory address must be evenly divisible by N, i.e. addr % N == 0
>
> When writing code, assume the target architecture has natural alignment requirements.
>
> Sidenote: in reality, only a few architectures require natural alignment on all sizes of memory access. However, again we must consider ALL supported architectures; natural alignment is the only way to achieve full portability.
>
> Code that doesn't cause unaligned access
> ========================================
>
> At first, the concepts above may seem a little hard to relate to actual coding practice. After all, you don't have a great deal of control over memory addresses of certain variables, etc.
>
> Fortunately things are not too complex, as in most cases, the compiler ensures that things will work for you. For example, take the following structure:
>
> 	struct foo {
> 		u16 field1;
> 		u32 field2;
> 		u8 field3;
> 	};
>
> Let us assume that an instance of the above structure resides in memory starting at address 0x1000. With a basic level of understanding, it would not be unreasonable to expect that accessing field2 would cause an unaligned access. You'd be expecting field2 to be located at offset 2 bytes into the structure, i.e. address 0x1002, but that address is not evenly divisible by 4 (remember, we're reading a 4 byte value here).
>
> Fortunately, the compiler understands the alignment constraints, so in the above case it would insert 2 bytes of padding inbetween field1 and field2. Therefore, for standard structure types you can always rely on the compiler to pad structures so that accesses to fields are suitably aligned (assuming you do not cast the field to a type of different length).
>
> Similarly, you can also rely on the compiler to align variables and function parameters to a naturally aligned scheme, based on the size of the type of the variable.
>
> Sidenote: in the above example, you may wish to reorder the fields in the above structure so that the overall structure uses less memory. For example, moving field3 to sit inbetween field1 and field2 (where the padding is inserted) would shrink the overall structure by 1 byte:
>
> 	struct foo {
> 		u16 field1;
> 		u8 field3;
> 		u32 field2;
> 	};
>
> Sidenote: it should be obvious by now, but in case it is not, accessing a single byte (u8 or char) can never cause an unaligned access, because all memory addresses are evenly divisible by 1.
>
> Code
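
The padding behaviour described in the draft can be checked directly with offsetof and sizeof. The following is only a sketch for a userspace build: the u8/u16/u32 typedefs are stand-ins for the kernel types, and struct foo_reordered and print_layouts() are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-ins for the kernel's fixed-width types (assumption). */
typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;

/* Field order from the draft. */
struct foo {
	u16 field1;
	u32 field2;
	u8  field3;
};

/* field3 moved into the hole after field1 (hypothetical name). */
struct foo_reordered {
	u16 field1;
	u8  field3;
	u32 field2;
};

/* The compiler must place field2 at an offset that suits its alignment. */
static_assert(offsetof(struct foo, field2) % _Alignof(u32) == 0,
	      "field2 is suitably aligned");

/* Print the offsets and total sizes the compiler actually chose. */
void print_layouts(void)
{
	printf("foo:           field2 at %zu, sizeof %zu\n",
	       offsetof(struct foo, field2), sizeof(struct foo));
	printf("foo_reordered: field2 at %zu, sizeof %zu\n",
	       offsetof(struct foo_reordered, field2),
	       sizeof(struct foo_reordered));
}
```

Note that sizeof also counts any trailing padding the compiler adds so that arrays of the structure stay aligned. On ABIs where u32 needs 4-byte alignment, the saving from reordering may therefore differ from the single byte mentioned in the draft; printing the values on the target is the reliable way to find out.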
Re: [RFC] Documentation about unaligned memory access
On Nov 23 2007 00:15, Daniel Drake wrote:
> What's the definition of an unaligned access?
> =============================================
>
> Unaligned memory accesses occur when you try to read N bytes of data starting from an address that is not evenly divisible by N (i.e. addr % N != 0). For example, reading 4 bytes of data from address 0x1004 is fine, but reading 4 bytes of data from address 0x1005 would be an unaligned memory access.

Try shorter numbers, like 0x10005 :)

> Code that doesn't cause unaligned access

In written style, not using n't contracted forms might be preferable.

> Sidenote: in the above example, you may wish to reorder the fields in the above structure so that the overall structure uses less memory. For example, moving field3 to sit inbetween field1 and field2 (where the padding is inserted) would shrink the overall structure by 1 byte:
>
> 	struct foo {
> 		u16 field1;
> 		u8 field3;
> 		u32 field2;
> 	};
>
> Sidenote: it should be obvious by now, but in case it is not, accessing a single byte (u8 or char) can never cause an unaligned access, because all memory addresses are evenly divisible by 1.

Sidenote: You would want an alignment like this:

	struct foo {
		uint32_t field2;
		uint16_t field1;
		uint8_t field3;
	};

> Consider the following structure:
>
> 	struct foo {
> 		u16 field1;
> 		u32 field2;
> 		u8 field3;
> 	} __attribute__((packed));
>
> It's the same structure as we looked at earlier, but the packed attribute has been added. This attribute ensures that the compiler never inserts any padding and the structure is laid out in memory exactly as is suggested above.
>
> The packed attribute is useful when you want to use a C struct to represent some data that comes in a fixed arrangement 'off the wire'.

In the packed case, does GCC not automatically output extra instructions to avoid running into unaligned access?

> To avoid the unaligned memory access, you could rewrite it as follows:
>
> 	void myfunc(u8 *data, u32 value)
> 	{
> 		[...]
> 		value = cpu_to_le32(value);
> 		memcpy(data, &value, sizeof(value));
> 		[...]
> 	}
>
> It's safe to assume that memcpy will always copy bytewise and hence will never cause an unaligned access.

Usually it copies register-size-wise where possible and bytewise at the left and right edges if they are unaligned. That's how glibc memcpy does it; I am not sure how complete the kernel memcpy is in this regard.
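
As a minimal userspace sketch of the memcpy approach discussed in this message: the helpers below move a 32-bit value to and from an address that may be unaligned without ever dereferencing a misaligned u32 pointer. The names store_u32() and load_u32() are hypothetical, not kernel APIs, and the value is copied in host byte order (no cpu_to_le32 conversion is attempted here).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: store a 32-bit value at dst, which may be
 * unaligned. memcpy takes the address of the source object and behaves
 * as if it copied byte by byte, whatever its inner loop really does.
 */
void store_u32(void *dst, uint32_t value)
{
	assert(dst != NULL);
	memcpy(dst, &value, sizeof(value));
}

/* Hypothetical helper: load a 32-bit value from a possibly unaligned src. */
uint32_t load_u32(const void *src)
{
	uint32_t value;

	assert(src != NULL);
	memcpy(&value, src, sizeof(value));
	return value;
}
```

For example, store_u32(buf + 1, x) followed by load_u32(buf + 1) round-trips x for any byte buffer buf that is at least 5 bytes long, regardless of how buf itself is aligned.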
Re: [RFC] Documentation about unaligned memory access
dean gaudet [EMAIL PROTECTED] writes:
> on AMD x86 pre-family 10h the boundary is 8 bytes, and on fam 10h it's 16 bytes. the penalty is a mere 3 cycles if an access crosses the specified boundary.

Worth noting, though, is that atomic accesses that cross cache lines on an Opteron system are going to lock down the HyperTransport fabric for you during the operation -- which is obviously not so nice.

-- Arne.
Re: [RFC] Documentation about unaligned memory access
On Fri, Nov 23, 2007 at 12:15:53AM +, Daniel Drake wrote:
> Why unaligned access is bad
> ===========================
>
> Most architectures are unable to perform unaligned memory accesses. Any unaligned access causes a processor exception.

"Some architectures are unable to perform unaligned memory accesses: either an exception is generated, or the data access is silently invalid. In architectures that allow unaligned access, naturally aligned accesses are usually faster than non-aligned ones."

> In summary: if your code causes unaligned memory accesses to happen, your code will not work on some platforms, and will perform *very* badly on others.

*very* -> *slower*

> Natural alignment
> =================

Please move this definition before "Why unaligned access is bad". Also, it would be nice to have a table of ISAs:

	ISA		Need natural	Need alignment
			alignment	by x
	m68k		No		2
	powerpc/ppc	Yes		Word size
	x86		No		No
	x86_64		No		No

--
Heikki Orsila			Barbie's law:
[EMAIL PROTECTED]		"Math is hard, let's go shopping!"
http://www.iki.fi/shd
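
The natural alignment rule from the draft (addr % N == 0) can be written as a one-line check. This is only an illustrative sketch; is_naturally_aligned() is a hypothetical name, not an existing kernel helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if an access of n bytes starting at addr
 * satisfies natural alignment, i.e. addr % n == 0.
 */
bool is_naturally_aligned(uintptr_t addr, size_t n)
{
	assert(n != 0);
	return addr % n == 0;
}
```

With the addresses used in the draft, is_naturally_aligned(0x1004, 4) holds while is_naturally_aligned(0x1005, 4) does not, and any address passes for n == 1.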
Re: [RFC] Documentation about unaligned memory access
On Fri, 23 Nov 2007, Alan Cox wrote:
> Its usually faster if you don't misalign on x86 as well.

i'm not sure if i agree with "usually"... but i know you (alan) are probably aware of the exact requirements of the hw.

for everyone else:

on intel x86 processors an access is unaligned only if it crosses a cacheline boundary (64 bytes). otherwise it's aligned. the penalty for crossing a cacheline boundary varies from ~12 cycles (core2) to many dozens of cycles (p4).

on AMD x86 pre-family 10h the boundary is 8 bytes, and on fam 10h it's 16 bytes. the penalty is a mere 3 cycles if an access crosses the specified boundary.

if you're making <= 4 byte accesses i recommend not worrying about alignment on x86. it's pretty hard to beat the hardware support.

i curse all the RISC and embedded processor designers who pretend unaligned accesses are something evil and to be avoided. in case you're worried, MIPS patent 4,814,976 expired in december 2006 :)

-dean
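
The boundary-crossing criterion described in this message can be sketched as a small check. crosses_boundary() is a hypothetical helper written for illustration; the boundary values (64, 16 or 8 bytes) are the ones quoted in the message, not figures verified here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if an access of `size` bytes starting at
 * `addr` touches two different `boundary`-sized blocks.
 * `boundary` must be a power of two and `size` must be nonzero.
 */
bool crosses_boundary(uintptr_t addr, size_t size, uintptr_t boundary)
{
	uintptr_t first_block, last_block;

	assert(size > 0);
	assert(boundary != 0 && (boundary & (boundary - 1)) == 0);

	/* Block containing the first and the last byte of the access. */
	first_block = addr & ~(boundary - 1);
	last_block = (addr + size - 1) & ~(boundary - 1);
	return first_block != last_block;
}
```

Under this definition a 4-byte access at 0x1001 is misaligned in the natural-alignment sense but does not cross a 64-byte boundary, whereas one at 0x103e does.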
Re: [RFC] Documentation about unaligned memory access
On Nov 22, 2007, at 20:29:11, Alan Cox wrote:
>> Most architectures are unable to perform unaligned memory accesses. Any unaligned access causes a processor exception.
>
> Not all. Some simply produce the wrong answer - thats oh so much more exciting.

As one example, the MicroBlaze soft-core processor family designed for use on Xilinx FPGAs will (by default) simply forcibly zero the lower bits of the unaligned address, such that the following code will fail mysteriously:

	const char foo[] = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07 };

	/* u32 is an unsigned int, so print it with %x rather than %lx */
	printf("0x%08x 0x%08x 0x%08x 0x%08x\n",
	       *((const u32 *)(foo+0)), *((const u32 *)(foo+1)),
	       *((const u32 *)(foo+2)), *((const u32 *)(foo+3)));

Instead of outputting:

	0x00010203 0x01020304 0x02030405 0x03040506

It will output:

	0x00010203 0x00010203 0x00010203 0x00010203

Other embedded architectures have very similar problems. Some may provide an "unaligned data access" exception, but offer insufficient information to repair the damage and resume execution.

Cheers,
Kyle Moffett
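
The address-masking behaviour described in this message can be imitated portably, which may help when reasoning about such bugs on a host that tolerates unaligned access. This is a sketch only: load_u32_low_bits_cleared() is a hypothetical name, it assumes `base` itself is 4-byte aligned on the emulated machine, and it returns the word in host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical emulation of a 32-bit load on hardware that silently
 * clears the two low address bits: the offset is rounded down to a
 * multiple of 4 before the word is fetched.
 */
uint32_t load_u32_low_bits_cleared(const uint8_t *base, size_t offset)
{
	uint32_t value;
	size_t effective = offset & ~(size_t)3;

	assert(base != NULL);
	/* memcpy keeps the emulation itself free of unaligned accesses. */
	memcpy(&value, base + effective, sizeof(value));
	return value;
}
```

Loads at offsets 0, 1, 2 and 3 of the same buffer all return the same word under this emulation, which matches the repeated value in the output shown in the message.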
Re: [RFC] Documentation about unaligned memory access
Robert Hancock <[EMAIL PROTECTED]> writes:
>
> Also, x86 doesn't prohibit unaligned accesses,

That depends, e.g. for SSE2 they can be forbidden.

> but I believe they have
> a significant performance cost and are best avoided where possible.

On Opteron the typical cost of a misaligned access is a single cycle and some possible penalty to load-store forwarding. On Intel it is a bit worse, but not all that much. Unless you do a lot of such accesses in a loop, it's not really something worth caring about too much.

-Andi
Re: [RFC] Documentation about unaligned memory access
> Most architectures are unable to perform unaligned memory accesses. Any
> unaligned access causes a processor exception.

Not all. Some simply produce the wrong answer - that's oh so much more exciting.

> You may be wondering why you have never seen these problems on your own
> architecture. Some architectures (such as i386 and x86_64) do not have this
> limitation, but nevertheless it is important for you to write portable code
> that works everywhere.

It's usually faster if you don't misalign on x86 as well.

Alan
Re: [RFC] Documentation about unaligned memory access
Thank you for working proactively on these problems.
Re: [RFC] Documentation about unaligned memory access
Daniel Drake wrote:
> Being spoilt by the luxuries of i386/x86_64 I've never really had a good grasp on unaligned memory access problems on other architectures and decided it was time to figure it out. As a result I've written this documentation which I plan to submit for inclusion as Documentation/unaligned_memory_access.txt
>
> Before I do so, any comments on the following?

...

> You may be wondering why you have never seen these problems on your own architecture. Some architectures (such as i386 and x86_64) do not have this limitation, but nevertheless it is important for you to write portable code that works everywhere.

Also, x86 doesn't prohibit unaligned accesses, but I believe they have a significant performance cost and are best avoided where possible.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: [RFC] Documentation about unaligned memory access
On Nov 22, 2007 4:15 PM, Daniel Drake <[EMAIL PROTECTED]> wrote:
> Before I do so, any comments on the following?

< above case it would insert 2 bytes of padding inbetween field1 and field2.
> above case it would insert 2 bytes of padding in between field1 and field2.

< moving field3 to sit inbetween field1 and field2 (where the padding is
> moving field3 to sit in between field1 and field2 (where the padding is

-- avuton
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
[RFC] Documentation about unaligned memory access
Being spoilt by the luxuries of i386/x86_64 I've never really had a good grasp on unaligned memory access problems on other architectures and decided it was time to figure it out. As a result I've written this documentation which I plan to submit for inclusion as Documentation/unaligned_memory_access.txt

Before I do so, any comments on the following?

Thanks,
Daniel


UNALIGNED MEMORY ACCESSES
=========================

Linux runs on a wide variety of architectures which have varying behaviour when it comes to memory access. This document presents some details about unaligned accesses, why you need to write code that doesn't cause them, and how to write such code!


What's the definition of an unaligned access?
=============================================

Unaligned memory accesses occur when you try to read N bytes of data starting from an address that is not evenly divisible by N (i.e. addr % N != 0). For example, reading 4 bytes of data from address 0x1004 is fine, but reading 4 bytes of data from address 0x1005 would be an unaligned memory access.


Why unaligned access is bad
===========================

Most architectures are unable to perform unaligned memory accesses. Any unaligned access causes a processor exception.

Some architectures have an exception handler implemented in the kernel which corrects the memory access, but this is very expensive and is not true for all architectures. You cannot rely on the exception handler to correct your memory accesses.

In summary: if your code causes unaligned memory accesses to happen, your code will not work on some platforms, and will perform *very* badly on others.

You may be wondering why you have never seen these problems on your own architecture. Some architectures (such as i386 and x86_64) do not have this limitation, but nevertheless it is important for you to write portable code that works everywhere.


Natural alignment
=================

The rule we mentioned earlier forms what we refer to as natural alignment: When accessing N bytes of memory, the base memory address must be evenly divisible by N, i.e. addr % N == 0

When writing code, assume the target architecture has natural alignment requirements.

Sidenote: in reality, only a few architectures require natural alignment on all sizes of memory access. However, again we must consider ALL supported architectures; natural alignment is the only way to achieve full portability.


Code that doesn't cause unaligned access
========================================

At first, the concepts above may seem a little hard to relate to actual coding practice. After all, you don't have a great deal of control over memory addresses of certain variables, etc.

Fortunately things are not too complex, as in most cases, the compiler ensures that things will work for you. For example, take the following structure:

	struct foo {
		u16 field1;
		u32 field2;
		u8 field3;
	};

Let us assume that an instance of the above structure resides in memory starting at address 0x1000. With a basic level of understanding, it would not be unreasonable to expect that accessing field2 would cause an unaligned access. You'd be expecting field2 to be located at offset 2 bytes into the structure, i.e. address 0x1002, but that address is not evenly divisible by 4 (remember, we're reading a 4 byte value here).

Fortunately, the compiler understands the alignment constraints, so in the above case it would insert 2 bytes of padding inbetween field1 and field2. Therefore, for standard structure types you can always rely on the compiler to pad structures so that accesses to fields are suitably aligned (assuming you do not cast the field to a type of different length).

Similarly, you can also rely on the compiler to align variables and function parameters to a naturally aligned scheme, based on the size of the type of the variable.

Sidenote: in the above example, you may wish to reorder the fields in the above structure so that the overall structure uses less memory. For example, moving field3 to sit inbetween field1 and field2 (where the padding is inserted) would shrink the overall structure by 1 byte:

	struct foo {
		u16 field1;
		u8 field3;
		u32 field2;
	};

Sidenote: it should be obvious by now, but in case it is not, accessing a single byte (u8 or char) can never cause an unaligned access, because all memory addresses are evenly divisible by 1.


Code that causes unaligned access
=================================

With the above in mind, let's move onto a real life example of a function that can cause an unaligned memory access. The following function adapted from include/linux/etherdevice.h is an optimized routine to compare two ethernet MAC addresses for equality.

	unsigned int compare_ether_addr(const u8 *addr1, const u8 *addr2)
	{
		const u16 *a = (const u16 *) addr1;
		const u16 *b =