Re: [RFC] Documentation about unaligned memory access

2007-11-30 Thread Jörn Engel
On Fri, 23 November 2007 00:15:53 +, Daniel Drake wrote:
> 
> What's the definition of an unaligned access?
> =============================================
> 
> Unaligned memory accesses occur when you try to read N bytes of data starting
> from an address that is not evenly divisible by N (i.e. addr % N != 0).
> For example, reading 4 bytes of data from address 0x1004 is fine, but
> reading 4 bytes of data from address 0x1005 would be an unaligned memory
> access.

The wording could also apply to a DMA of 8k from a 4k-aligned address.
But I don't have a good idea how to improve it.
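
A minimal sketch of the quoted rule, read literally, may help show both the intended cases and the DMA corner case mentioned above. The function name is hypothetical and not taken from the proposed document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Literal reading of the quoted definition: an access of n bytes at
 * addr is unaligned when addr % n != 0.  (Hypothetical helper name.)
 */
static bool is_unaligned(uintptr_t addr, size_t n)
{
    assert(n != 0);    /* avoid a modulo by zero */
    return addr % n != 0;
}
```

With this reading, is_unaligned(0x1004, 4) is false and is_unaligned(0x1005, 4) is true, as in the draft. An 8k transfer from the 4k-aligned address 0x1000 also counts as unaligned, which is the ambiguity pointed out above.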

> It's safe to assume that memcpy will always copy bytewise and hence will
> never cause an unaligned access.

s/always copy/always behave as if copying/

memcpy usually copies at least wordwise, possibly even in bigger chunks.
But that is just the inner loop.  Unaligned bytes at the beginning/end
receive special treatment.
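
The structure described above (bytewise head, wordwise inner loop, bytewise tail) can be sketched as follows. This is only an illustration, not the memcpy of any real libc or of the kernel; the name sketch_memcpy and the word_t typedef are assumptions, and the may_alias attribute is a GCC/Clang extension.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: a word type that is allowed to alias other types. */
typedef unsigned long __attribute__((__may_alias__)) word_t;

/* Illustrative only: behaves as if copying bytewise, like memcpy. */
static void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /*
     * Word copies are only attempted when src and dst share the same
     * misalignment; otherwise everything goes through the byte loop.
     */
    if (((uintptr_t)d ^ (uintptr_t)s) % sizeof(word_t) == 0) {
        /* Head: single bytes until both pointers are word aligned. */
        while (n > 0 && (uintptr_t)d % sizeof(word_t) != 0) {
            *d++ = *s++;
            n--;
        }
        /* Inner loop: aligned, word-sized copies. */
        while (n >= sizeof(word_t)) {
            *(word_t *)d = *(const word_t *)s;
            d += sizeof(word_t);
            s += sizeof(word_t);
            n -= sizeof(word_t);
        }
    }
    /* Tail, or the whole buffer when the misalignments differ. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

Real implementations usually handle differing misalignments with shifts or unaligned loads instead of falling back to a pure byte loop; the sketch only shows why the observable behaviour stays "as if bytewise".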

Jörn

-- 
The rabbit runs faster than the fox, because the rabbit is running for
his life while the fox is only running for his dinner.
-- Aesop
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] Documentation about unaligned memory access

2007-11-26 Thread Kumar Gala


On Nov 23, 2007, at 5:43 AM, Heikki Orsila wrote:

> On Fri, Nov 23, 2007 at 12:15:53AM +, Daniel Drake wrote:
> > Why unaligned access is bad
> > ===========================
> >
> > Most architectures are unable to perform unaligned memory accesses. Any
> > unaligned access causes a processor exception.
>
> "Some architectures are unable to perform unaligned memory accesses,
> either an exception is generated, or the data
> access is silently invalid. In architectures that allow unaligned
> access, natural aligned accesses are usually faster than non-aligned."
>
> > In summary: if your code causes unaligned memory accesses to happen, your
> > code will not work on some platforms, and will perform *very* badly on
> > others.
>
> *very* -> *slower*
>
> > Natural alignment
> > =================
>
> Please move this definition before "Why unaligned access is bad".
>
> Also, it would be nice to have a table of ISAs:
>
> ISA           Need natural   Need alignment
>               alignment      by x
>
> m68k          No             2
> powerpc/ppc   Yes            Word size

On ppc it varies from processor to processor whether misaligned data is
fixed up or causes an exception.  However, it's highly recommended that
data be naturally aligned.  I'm not sure I follow what is meant by the
second column (need alignment by x).


- k


Re: [RFC] Documentation about unaligned memory access

2007-11-26 Thread Arnaldo Carvalho de Melo
On Mon, Nov 26, 2007 at 03:47:06PM +0100, Johannes Berg wrote:
> 
> > Sidenote: in the above example, you may wish to reorder the fields in the
> > above structure so that the overall structure uses less memory. For example,
> > moving field3 to sit inbetween field1 and field2 (where the padding is
> > inserted) would shrink the overall structure by 1 byte:
> > 
> > struct foo {
> > u16 field1;
> > u8 field3;
> > u32 field2;
> > };
> 
> You can reorder to u32, u16, u8 order and save another byte :)
> 
> A reference to pahole could be appropriate here, and probably a small
> note that some large existing structures like netdev have deliberate
> holes to achieve cache alignment.

shameless plug:

https://ols2006.108.redhat.com/2007/Reprints/melo-Reprint.pdf

- Arnaldo


Re: [RFC] Documentation about unaligned memory access

2007-11-26 Thread Ben Dooks
On Fri, Nov 23, 2007 at 01:43:29PM +0200, Heikki Orsila wrote:
> On Fri, Nov 23, 2007 at 12:15:53AM +, Daniel Drake wrote:
> > Why unaligned access is bad
> > ===========================
> > 
> > Most architectures are unable to perform unaligned memory accesses. Any
> > unaligned access causes a processor exception.
> 
> "Some architectures are unable to perform unaligned memory accesses, 
> either an exception is generated, or the data 
> access is silently invalid. In architectures that allow unaligned 
> access, natural aligned accesses are usually faster than non-aligned."
> 
> > In summary: if your code causes unaligned memory accesses to happen, your 
> > code
> > will not work on some platforms, and will perform *very* badly on others.
> 
> *very* -> *slower*
> 
> > Natural alignment
> > =================
> 
> Please move this definition before "Why unaligned access is bad".
> 
> Also, it would be nice to have a table of ISAs:
> 
> ISA           Need natural   Need alignment
>               alignment      by x
>
> m68k          No             2
> powerpc/ppc   Yes            Word size
> x86           No             No
> x86_64        No             No
arm32           Yes            2 for 16bit data, 4 for 32bit

Note, if the unaligned handler is running, the alignment will be fixed
by the fault handler (at the cost of taking a fault). If the unaligned
handler is turned off, you get a "free" shift of the data instead.

-- 
Ben ([EMAIL PROTECTED], http://www.fluff.org/)

  'a smiley only costs 4 bytes'


Re: [RFC] Documentation about unaligned memory access

2007-11-26 Thread Johannes Berg

> Going back to an earlier example:
>   void myfunc(u8 *data, u32 value)
>   {
>   [...]
>   *((u16 *) data) = cpu_to_le32(value);
>   [...]

typo? should it be a u32 cast?

> To avoid the unaligned memory access, you could rewrite it as follows:
> 
>   void myfunc(u8 *data, u32 value)
>   {
>   [...]
>   value = cpu_to_le32(value);
>   memcpy(data, value, sizeof(value));
>   [...]
>   }

I think you should use put_unaligned here as well. Or maybe just reorder
this vs. the section below where you use get/put_unaligned.
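
As a userspace illustration of what the put_unaligned suggestion is meant to achieve, here is a minimal sketch that stores a 32-bit value in little-endian order at an arbitrary address using only byte stores. The helper store_le32 is hypothetical and is not the kernel's put_unaligned() or cpu_to_le32().

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: store v in little-endian byte
 * order at p.  Only single-byte stores are used, so p may have any
 * alignment.
 */
static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}
```

The byte order is fixed by the shifts, so the result is the same on big-endian and little-endian hosts, and no cast of the u8 pointer to a wider type is needed.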

johannes


Re: [RFC] Documentation about unaligned memory access

2007-11-26 Thread Johannes Berg

> Sidenote: in the above example, you may wish to reorder the fields in the
> above structure so that the overall structure uses less memory. For example,
> moving field3 to sit inbetween field1 and field2 (where the padding is
> inserted) would shrink the overall structure by 1 byte:
> 
>   struct foo {
>   u16 field1;
>   u8 field3;
>   u32 field2;
>   };

You can reorder to u32, u16, u8 order and save another byte :)

A reference to pahole could be appropriate here, and probably a small
note that some large existing structures like netdev have deliberate
holes to achieve cache alignment.
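
The effect of field order can be checked directly with sizeof. The sketch below assumes the original structure had field3 last (u16, u32, u8), which is inferred from the quoted sidenote, and uses the stdint types in place of the kernel's u8/u16/u32. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed original layout: padding is inserted after field1. */
struct foo_original {
    uint16_t field1;
    uint32_t field2;
    uint8_t  field3;
};

/* Layout from the quoted sidenote: field3 fills part of the padding. */
struct foo_reordered {
    uint16_t field1;
    uint8_t  field3;
    uint32_t field2;
};

/* Largest-first ordering suggested in the reply. */
struct foo_sorted {
    uint32_t field2;
    uint16_t field1;
    uint8_t  field3;
};

/* Print the sizes chosen by the compiler for the current ABI. */
static void print_foo_sizes(void)
{
    printf("original:  %zu bytes\n", sizeof(struct foo_original));
    printf("reordered: %zu bytes\n", sizeof(struct foo_reordered));
    printf("sorted:    %zu bytes\n", sizeof(struct foo_sorted));
}
```

The numbers depend on the ABI. sizeof also counts trailing padding, so the savings it reports can differ from the per-byte counts discussed above; pahole shows where the holes actually are.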

johannes


Re: [RFC] Documentation about unaligned memory access

2007-11-26 Thread dean gaudet
On Fri, 23 Nov 2007, Arne Georg Gleditsch wrote:

> dean gaudet <[EMAIL PROTECTED]> writes:
> > on AMD x86 pre-family 10h the boundary is 8 bytes, and on fam 10h it's 16 
> > bytes.  the penalty is a mere 3 cycles if an access crosses the specified 
> > boundary.
> 
> Worth noting though, is that atomic accesses that cross cache lines on
> an Opteron system are going to lock down the HyperTransport fabric for
> you during the operation -- which is obviously not so nice.

ooh awesome, i hadn't measured that before.

on a 2 node sockF / revF with a random pointer chase running on cpu 0 / 
node 0 i see the avg load-to-load cache miss latency jump from 77ns to 
109ns when i add an unaligned lock-intensive workload on one core of node 
1.  the worst i can get the pointer chase latency to is 273ns when i add 
two threads on node 1 fighting over an unaligned lock.

on a 4 node (square) the worst case i can get seems to be an increase from 
98ns with no antagonist to 385ns with 6 antagonists fighting over an 
unaligned lock on the other 3 nodes.
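
The condition "an access crosses the specified boundary" from the quoted text can be sketched as a small check. The function name is hypothetical; the boundary is a parameter so that the 8-byte and 16-byte cases mentioned above, or a cache line size, can be tested alike.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * True when the bytes [addr, addr + size) do not all lie inside one
 * boundary-sized block, i.e. the access straddles a multiple of
 * boundary.  (Hypothetical helper name.)
 */
static bool crosses_boundary(uintptr_t addr, size_t size, size_t boundary)
{
    assert(boundary != 0);
    return size > 0 &&
           addr / boundary != (addr + size - 1) / boundary;
}
```

For a lock, something like crosses_boundary((uintptr_t)&lock, sizeof(lock), 64) would flag the unaligned case measured above, where 64 is only an assumed cache line size.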

cool.

-dean


Re: [RFC] Documentation about unaligned memory access

2007-11-26 Thread DM
On Nov 23, 2007 1:15 AM, Daniel Drake <[EMAIL PROTECTED]> wrote:
[...]
>
> Before I do so, any comments on the following?
>
[...]
> void myfunc(u8 *data, u32 value)
> {
> [...]
> value = cpu_to_le32(value);
> memcpy(data, value, sizeof(value));
> [...]
> }

I suppose you mean:
memcpy(data, &value, sizeof(value));

/DM


Re: [RFC] Documentation about unaligned memory access

2007-11-25 Thread Alan Cox

> mc68020+          No             No
> (mc68000/010      No             2)   (not for Linux)

Actually uClinux has been persuaded to run on m68000.


Re: [RFC] Documentation about unaligned memory access

2007-11-25 Thread Olaf Titz
> Unaligned memory accesses occur when you try to read N bytes of data starting
> from an address that is not evenly divisible by N (i.e. addr % N != 0).

Should clarify that you mean "with power-of-two N" - even more
strictly this depends on the processor, but I'm pretty sure there is
none which supports aligned accesses of N==3...
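
With the power-of-two restriction made explicit, the check reduces to a mask test. A minimal sketch, with hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True for 1, 2, 4, 8, ...; false for 0 and for values such as 3. */
static bool is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Natural alignment test for a power-of-two access size n: the low
 * log2(n) bits of the address must all be zero.
 */
static bool is_naturally_aligned(uintptr_t addr, size_t n)
{
    assert(is_power_of_two(n));
    return (addr & (n - 1)) == 0;
}
```

For power-of-two n this gives the same answer as addr % n == 0; the assertion simply rejects sizes like N == 3, for which the definition was never meant.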

Olaf


Re: [RFC] Documentation about unaligned memory access

2007-11-25 Thread Heikki Orsila
On Sun, Nov 25, 2007 at 12:16:08PM +0100, Geert Uytterhoeven wrote:
> > ISA           Need natural   Need alignment
> >               alignment      by x
> >
> > m68k          No             2
>
> `No' for >= 68020.
> `Yes' for < 68020.

My bad, yes..

mc68020+          No             No
(mc68000/010      No             2)   (not for Linux)

-- 
Heikki Orsila   Barbie's law:
[EMAIL PROTECTED]   "Math is hard, let's go shopping!"
http://www.iki.fi/shd


Re: [RFC] Documentation about unaligned memory access

2007-11-25 Thread Geert Uytterhoeven
On Fri, 23 Nov 2007, Heikki Orsila wrote:
> On Fri, Nov 23, 2007 at 12:15:53AM +, Daniel Drake wrote:
> > Why unaligned access is bad
> > ===========================
> > 
> > Most architectures are unable to perform unaligned memory accesses. Any
> > unaligned access causes a processor exception.
> 
> "Some architectures are unable to perform unaligned memory accesses, 
> either an exception is generated, or the data 
> access is silently invalid. In architectures that allow unaligned 
> access, natural aligned accesses are usually faster than non-aligned."
> 
> > In summary: if your code causes unaligned memory accesses to happen, your 
> > code
> > will not work on some platforms, and will perform *very* badly on others.
> 
> *very* -> *slower*
> 
> > Natural alignment
> > =================
> 
> Please move this definition before "Why unaligned access is bad".
> 
> Also, it would be nice to have a table of ISAs:
> 
> ISA           Need natural   Need alignment
>               alignment      by x
>
> m68k          No             2

`No' for >= 68020.
`Yes' for < 68020.

> powerpc/ppc   Yes            Word size
> x86           No             No
> x86_64        No             No

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [RFC] Documentation about unaligned memory access

2007-11-25 Thread Denys Vlasenko
On Thursday 22 November 2007 16:15, Daniel Drake wrote:
> In summary: if your code causes unaligned memory accesses to happen, your
> code will not work on some platforms, and will perform *very* badly on
> others.

Although understanding alignment is important, there is another
extreme - what I call "sadistic alignment". It's when data is being
aligned even if it will definitely run on an arch which doesn't require
this (arch/x86/*), or data being aligned to a ridiculously large boundary.

Like gcc aligning any char array bigger than 31 bytes to 32 bytes.
Bytes, not bits. Try to compile this with -O2:

static char s1[] = "12345678901234567890123456789012";
static char s2[] = "12345678901234567890123456789012";
void f(char*);
void g() {
	f(s1);
	f(s2);
}

$ hexdump -Cv t.o
00000000  7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  01 00 03 00 01 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  38 01 00 00 00 00 00 00  34 00 00 00 00 00 28 00  |8.......4.....(.|
00000030  0a 00 07 00 55 89 e5 83  ec 08 c7 04 24 40 00 00  |....U.......$@..|
00000040  00 e8 fc ff ff ff c7 04  24 00 00 00 00 e8 fc ff  |........$.......|
00000050  ff ff c9 c3 00 00 00 00  00 00 00 00 00 00 00 00  |................|  <=== HERE
00000060  31 32 33 34 35 36 37 38  39 30 31 32 33 34 35 36  |1234567890123456|
00000070  37 38 39 30 31 32 33 34  35 36 37 38 39 30 31 32  |7890123456789012|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|  <=== HERE
00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|  <=== HERE
000000a0  31 32 33 34 35 36 37 38  39 30 31 32 33 34 35 36  |1234567890123456|
000000b0  37 38 39 30 31 32 33 34  35 36 37 38 39 30 31 32  |7890123456789012|
000000c0  00 00 00 00 00 47 43 43  3a 20 28 47 4e 55 29 20  |.....GCC: (GNU) |
000000d0  34 2e 30 2e 33 20 28 55  62 75 6e 74 75 20 34 2e  |4.0.3 (Ubuntu 4.|
000000e0  30 2e 33 2d 31 75 62 75  6e 74 75 35 29 00 00 2e  |0.3-1ubuntu5)...|
000000f0  73 79 6d 74 61 62 00 2e  73 74 72 74 61 62 00 2e  |symtab..strtab..|

43 bytes wasted!

Thankfully, it is fixed in later gcc versions.

Please do not succumb to "alignment scare" in your doc.
--
vda


Re: [RFC] Documentation about unaligned memory access

2007-11-24 Thread Luciano Rocha
On Sat, Nov 24, 2007 at 06:35:25PM +0100, Pierre Ossman wrote:
> On Sat, 24 Nov 2007 17:22:36 +
> Luciano Rocha <[EMAIL PROTECTED]> wrote:
> 
> > On Sat, Nov 24, 2007 at 05:19:31PM +0100, Pierre Ossman wrote:
> > > It most certainly does not. gcc will assume that an int* has int 
> > > alignment. memcpy() is a builtin, which gcc can translate to pretty much 
> > > anything. And C specifies that a pointer to foo, will point to a real 
> > > object of type foo, so gcc can't be blamed for the unsafe typecasts. I 
> > > have tested this the hard way, so this is not just speculation.
> > 
> > Yes, on *int and other assumed aligned pointers, gcc uses its internal
> > version.
> > 
> > However, my point is that those pointers, unless speaking of packed
> > structures, can safely be assumed aligned, while char*/void* can't.
> > 
> 
> I get the sensation we're violently in agreement here, just misunderstanding 
> each other. :)

That's it. :)

Sorry for the noise,...

-- 
lfr
0/0


Re: [RFC] Documentation about unaligned memory access

2007-11-24 Thread Haavard Skinnemoen
On Sat, 24 Nov 2007 17:22:36 +
Luciano Rocha <[EMAIL PROTECTED]> wrote:

> Nothing does, even memcpy doesn't check alignment of the source, or
> alignment at all in some assembly implementations (only word-copy,
> without checking if at word-boundary).

An out-of-line implementation can only do that if the architecture
allows unaligned loads and stores. Since it has no clue about the types
involved, it must assume that both pointers as well as the length may be
misaligned.

gcc, on the other hand, knows exactly what types are involved, so when
it expands its own builtin-memcpy inline it can optimize it based on
the required alignment of those types. So when you cast between types
with different alignment requirements, you must make sure the result is
properly aligned, or you need to use get_unaligned()/put_unaligned()
to override gcc's assumptions.

Btw, some versions of avr32-gcc (I think it was 4.0.x) assumed packed
structs were properly aligned too, with disastrous results. gcc-4.1
handles packed structs correctly as far as I can tell.
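
A minimal sketch of the packed-struct case, assuming a compiler that supports the GCC-style packed attribute (GCC and Clang do). The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: no padding is inserted between the members. */
struct __attribute__((packed)) record {
    uint8_t  tag;
    uint32_t value;    /* at offset 1, so not naturally aligned */
};

/*
 * The member is accessed through the packed struct type, so the
 * compiler knows it may be misaligned and is expected to generate code
 * that copes with that.
 */
static uint32_t record_value(const struct record *r)
{
    return r->value;
}
```

Taking the address of r->value and passing it around as a plain uint32_t pointer would lose this information again, which is the kind of cast the paragraphs above warn about.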

Håvard


Re: [RFC] Documentation about unaligned memory access

2007-11-24 Thread Pierre Ossman
On Sat, 24 Nov 2007 17:22:36 +
Luciano Rocha <[EMAIL PROTECTED]> wrote:

> On Sat, Nov 24, 2007 at 05:19:31PM +0100, Pierre Ossman wrote:
> > It most certainly does not. gcc will assume that an int* has int alignment. 
> > memcpy() is a builtin, which gcc can translate to pretty much anything. And 
> > C specifies that a pointer to foo, will point to a real object of type foo, 
> > so gcc can't be blamed for the unsafe typecasts. I have tested this the 
> > hard way, so this is not just speculation.
> 
> Yes, on *int and other assumed aligned pointers, gcc uses its internal
> version.
> 
> However, my point is that those pointers, unless speaking of packed
> structures, can safely be assumed aligned, while char*/void* can't.
> 

I get the sensation we're violently in agreement here, just misunderstanding 
each other. :)

_My_ point was that the documentation should mention that normal, unpacked C 
objects have alignments that influence the code generated by 
__builtin_memcpy(). As such, one should always make sure to have either src or 
dst be char*/void* when alignment cannot be guaranteed. The example in the 
documentation has this, but it isn't explicit that this is required.
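
A small sketch of that rule, assuming a packet arrives as a plain byte buffer. The helper foo_from_packet() and its parameters are hypothetical; struct foo mirrors the one used earlier in this thread, with fixed-width user-space types in place of u8/u32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct foo {
    uint8_t  a[4];
    uint32_t b;
};

/* Hypothetical helper: extract a struct foo that starts `offset` bytes into
 * a packet buffer. The source stays an unsigned char pointer all the way to
 * memcpy(), so nothing stronger than byte alignment is implied for it. The
 * destination is a real struct foo object and is therefore aligned. */
static inline void foo_from_packet(struct foo *dst, const unsigned char *pkt,
                                   size_t pkt_len, size_t offset)
{
    assert(offset <= pkt_len && pkt_len - offset >= sizeof(*dst));
    memcpy(dst, pkt + offset, sizeof(*dst));
}
```

The point of the design is that the packet side is never cast to struct foo *, so there is no typed pointer from which the compiler could infer alignment.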

Rgds
-- 
 -- Pierre Ossman

  Linux kernel, MMC maintainer        http://www.kernel.org
  PulseAudio, core developer          http://pulseaudio.org
  rdesktop, core developer            http://www.rdesktop.org


Re: [RFC] Documentation about unaligned memory access

2007-11-24 Thread Luciano Rocha
On Sat, Nov 24, 2007 at 05:19:31PM +0100, Pierre Ossman wrote:
> On Sat, 24 Nov 2007 15:50:52 +
> Luciano Rocha <[EMAIL PROTECTED]> wrote:
> 
> > 
> > Dumb memcpy (while (len--) { *d++ = *s++ }) will have alignment problems
> > in any case. Intelligent ones, like the one provided in glibc, first copy
> > bytes till output is aligned (C file) *or* size is a multiple (i686 asm file)
> > of word size, and then it copies word-by-word.
> > 
> > Linux's x86_64 memcpy does the opposite, copies 64bit words, and then
> > copies the last bytes.
> > 
> > So, in effect, as long as no packed structures are used, memcpy should
> > be safer on *int, etc., than *char, as the compiler ensures
> > word-alignment.
> > 
> 
> It most certainly does not. gcc will assume that an int* has int alignment. 
> memcpy() is a builtin, which gcc can translate to pretty much anything. And C 
> specifies that a pointer to foo, will point to a real object of type foo, so 
> gcc can't be blamed for the unsafe typecasts. I have tested this the hard 
> way, so this is not just speculation.

Yes, on *int and other assumed aligned pointers, gcc uses its internal
version.

However, my point is that those pointers, unless speaking of packed
structures, can safely be assumed aligned, while char*/void* can't.

> In other words, memcpy() does _not_ save you from alignment issues. If you 
> cast from char* or void* to something else, you better be damn sure the 
> alignment is correct because gcc will assume it is.

Nothing does, even memcpy doesn't check alignment of the source, or
alignment at all in some assembly implementations (only word-copy,
without checking if at word-boundary).

-- 
lfr
0/0



Re: [RFC] Documentation about unaligned memory access

2007-11-24 Thread Pierre Ossman
On Sat, 24 Nov 2007 15:50:52 +
Luciano Rocha <[EMAIL PROTECTED]> wrote:

> 
> Dumb memcpy (while (len--) { *d++ = *s++ }) will have alignment problems
> in any case. Intelligent ones, like the one provided in glibc, first copy
> bytes till output is aligned (C file) *or* size is a multiple (i686 asm file)
> of word size, and then it copies word-by-word.
> 
> Linux's x86_64 memcpy does the opposite, copies 64bit words, and then
> copies the last bytes.
> 
> So, in effect, as long as no packed structures are used, memcpy should
> be safer on *int, etc., than *char, as the compiler ensures
> word-alignment.
> 

It most certainly does not. gcc will assume that an int* has int alignment. 
memcpy() is a builtin, which gcc can translate to pretty much anything. And C 
specifies that a pointer to foo, will point to a real object of type foo, so 
gcc can't be blamed for the unsafe typecasts. I have tested this the hard way, 
so this is not just speculation.

E.g., we have the following struct:

struct foo
{
        u8 a[4];
        u32 b;
};

This struct will have a size of 8 bytes and an alignment of 4 bytes (caused by 
the member b). Now take the following code:

void copy_foo(struct foo *dst, struct foo *src)
{
        *dst = *src;
}

On a platform that supports 64-bit loads and stores (e.g. AVR32, where I got 
hit by this), this will generate:

        LD r1, (src)
        ST r1, (dst)

Now if I replace that with:

void copy_foo(struct foo *dst, struct foo *src)
{
        memcpy(dst, src, sizeof(struct foo));
}

then it will generate the same code. So I cannot use copy_foo() to transfer a 
struct foo either out of, or into a packet buffer.

In other words, memcpy() does _not_ save you from alignment issues. If you cast 
from char* or void* to something else, you better be damn sure the alignment is 
correct because gcc will assume it is.
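
The size and alignment claimed for struct foo can be checked at compile time. This is only a sketch: it uses fixed-width user-space types instead of u8/u32, assumes an ABI where a 32-bit integer is 4-byte aligned, and foo_ptr_is_aligned() is a hypothetical helper for testing an address before it is cast.

```c
#include <assert.h>
#include <stdint.h>

struct foo {
    uint8_t  a[4];
    uint32_t b;
};

/* These hold on ABIs where uint32_t is 4-byte aligned (an assumption). */
static_assert(sizeof(struct foo) == 8, "struct foo should be 8 bytes");
static_assert(_Alignof(struct foo) == 4, "struct foo should be 4-byte aligned");

/* Hypothetical helper: would a struct foo placed at address p be suitably
 * aligned? Casting p to struct foo * is only safe when this is true. */
static inline int foo_ptr_is_aligned(const void *p)
{
    return ((uintptr_t)p % _Alignof(struct foo)) == 0;
}
```

If the check fails for a packet pointer, the data has to be copied out bytewise rather than accessed through a struct foo pointer.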

Rgds
-- 
 -- Pierre Ossman

  Linux kernel, MMC maintainer        http://www.kernel.org
  PulseAudio, core developer          http://pulseaudio.org
  rdesktop, core developer            http://www.rdesktop.org


Re: [RFC] Documentation about unaligned memory access

2007-11-24 Thread Luciano Rocha
On Sat, Nov 24, 2007 at 02:34:41PM +0100, Pierre Ossman wrote:
> On Fri, 23 Nov 2007 00:15:53 + (GMT)
> Daniel Drake <[EMAIL PROTECTED]> wrote:
> 
> > Being spoilt by the luxuries of i386/x86_64 I've never really had a good
> > grasp on unaligned memory access problems on other architectures and decided
> > it was time to figure it out. As a result I've written this documentation
> > which I plan to submit for inclusion as
> > Documentation/unaligned_memory_access.txt
> > 
> > Before I do so, any comments on the following?
> > 
> 
> A very nice, and much needed document. I think you should include one thing 
> though:
> 
> memcpy() is _only_ safe when one of the pointers is char* or void*. If it is 
> anything more complex than that, gcc will assume alignment and optimise based 
> on that. E.g. memcpy() of two long:s generates the same assembly as doing an 
> assignment.

Dumb memcpy (while (len--) { *d++ = *s++ }) will have alignment problems
in any case. Intelligent ones, like the one provided in glibc, first copy
bytes till output is aligned (C file) *or* size is a multiple (i686 asm file)
of word size, and then it copies word-by-word.
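
The head/body/tail split described above comes down to a little arithmetic. The sketch below only computes the split and does not copy anything; struct copy_plan and plan_copy() are hypothetical names, and only the destination address is aligned, as in the C variant mentioned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of how a word-oriented copy divides its work. */
struct copy_plan {
    size_t head;   /* leading bytes copied one at a time */
    size_t words;  /* whole words copied in the inner loop */
    size_t tail;   /* trailing bytes copied one at a time */
};

static inline struct copy_plan plan_copy(uintptr_t dst_addr, size_t len,
                                         size_t word_size)
{
    struct copy_plan plan;
    size_t misalign;

    assert(word_size != 0);
    misalign = (size_t)(dst_addr % word_size);

    /* Bytes needed to reach the next word boundary, capped at len. */
    plan.head = misalign ? word_size - misalign : 0;
    if (plan.head > len)
        plan.head = len;

    plan.words = (len - plan.head) / word_size;
    plan.tail = (len - plan.head) % word_size;
    return plan;
}
```

For a 20-byte copy to address 0x1005 with 8-byte words, this gives 3 head bytes, 2 words and 1 tail byte. The source may still be misaligned after the head bytes, which is the case an implementation has to handle separately.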

Linux's x86_64 memcpy does the opposite, copies 64bit words, and then
copies the last bytes.

So, in effect, as long as no packed structures are used, memcpy should
be safer on *int, etc., than *char, as the compiler ensures
word-alignment.

-- 
lfr
0/0



Re: [RFC] Documentation about unaligned memory access

2007-11-24 Thread Pierre Ossman
On Fri, 23 Nov 2007 00:15:53 + (GMT)
Daniel Drake <[EMAIL PROTECTED]> wrote:

> Being spoilt by the luxuries of i386/x86_64 I've never really had a good
> grasp on unaligned memory access problems on other architectures and decided
> it was time to figure it out. As a result I've written this documentation
> which I plan to submit for inclusion as
> Documentation/unaligned_memory_access.txt
> 
> Before I do so, any comments on the following?
> 

A very nice, and much needed document. I think you should include one thing 
though:

memcpy() is _only_ safe when one of the pointers is char* or void*. If it is 
anything more complex than that, gcc will assume alignment and optimise based 
on that. E.g. memcpy() of two long:s generates the same assembly as doing an 
assignment.

(Technically it is no different for char* and void*, but since they have byte 
alignment, gcc can't really do anything creative.)

Rgds
-- 
 -- Pierre Ossman

  Linux kernel, MMC maintainer        http://www.kernel.org
  PulseAudio, core developer          http://pulseaudio.org
  rdesktop, core developer            http://www.rdesktop.org


Re: [RFC] Documentation about unaligned memory access

2007-11-23 Thread Dmitri Vorobiev
Daniel Drake wrote:
> Being spoilt by the luxuries of i386/x86_64 I've never really had a good
> grasp on unaligned memory access problems on other architectures and decided
> it was time to figure it out. As a result I've written this documentation
> which I plan to submit for inclusion as
> Documentation/unaligned_memory_access.txt
> 
> Before I do so, any comments on the following?

From the viewpoint of yours truly (and I am a teacher of operating system 
classes), this is a long-expected document, which is going to be very useful 
especially for newbies. My students often make alignment mistakes in their 
code, and your article will definitely make my job much easier.

Thank you, Daniel, for your work.

Dmitri

> 
> Thanks,
> Daniel
> 
> 
> 
> 
> UNALIGNED MEMORY ACCESSES
> =========================
> 
> Linux runs on a wide variety of architectures which have varying behaviour
> when it comes to memory access. This document presents some details about
> unaligned accesses, why you need to write code that doesn't cause them,
> and how to write such code!
> 
> 
> What's the definition of an unaligned access?
> =============================================
> 
> Unaligned memory accesses occur when you try to read N bytes of data starting
> from an address that is not evenly divisible by N (i.e. addr % N != 0).
> For example, reading 4 bytes of data from address 0x1004 is fine, but
> reading 4 bytes of data from address 0x1005 would be an unaligned memory
> access.
> 
> 
> Why unaligned access is bad
> ===========================
> 
> Most architectures are unable to perform unaligned memory accesses. Any
> unaligned access causes a processor exception.
> 
> Some architectures have an exception handler implemented in the kernel which
> corrects the memory access, but this is very expensive and is not true for
> all architectures. You cannot rely on the exception handler to correct your
> memory accesses.
> 
> In summary: if your code causes unaligned memory accesses to happen, your code
> will not work on some platforms, and will perform *very* badly on others.
> 
> You may be wondering why you have never seen these problems on your own
> architecture. Some architectures (such as i386 and x86_64) do not have this
> limitation, but nevertheless it is important for you to write portable code
> that works everywhere.
> 
> 
> Natural alignment
> =================
> 
> The rule we mentioned earlier forms what we refer to as natural alignment:
> When accessing N bytes of memory, the base memory address must be evenly
> divisible by N, i.e. addr % N == 0
> 
> When writing code, assume the target architecture has natural alignment
> requirements.
> 
> Sidenote: in reality, only a few architectures require natural alignment
> on all sizes of memory access. However, again we must consider ALL supported
> architectures; natural alignment is the only way to achieve full portability.
> 
> 
> Code that doesn't cause unaligned access
> ========================================
> 
> At first, the concepts above may seem a little hard to relate to actual
> coding practice. After all, you don't have a great deal of control over
> memory addresses of certain variables, etc.
> 
> Fortunately things are not too complex, as in most cases, the compiler
> ensures that things will work for you. For example, take the following
> structure:
> 
>   struct foo {
>           u16 field1;
>           u32 field2;
>           u8 field3;
>   };
> 
> Let us assume that an instance of the above structure resides in memory
> starting at address 0x1000. With a basic level of understanding, it would
> not be unreasonable to expect that accessing field2 would cause an unaligned
> access. You'd be expecting field2 to be located at offset 2 bytes into the
> structure, i.e. address 0x1002, but that address is not evenly divisible
> by 4 (remember, we're reading a 4 byte value here).
> 
> Fortunately, the compiler understands the alignment constraints, so in the
> above case it would insert 2 bytes of padding inbetween field1 and field2.
> Therefore, for standard structure types you can always rely on the compiler
> to pad structures so that accesses to fields are suitably aligned (assuming
> you do not cast the field to a type of different length).
> 
> Similarly, you can also rely on the compiler to align variables and function
> parameters to a naturally aligned scheme, based on the size of the type of
> the variable.
> 
> Sidenote: in the above example, you may wish to reorder the fields in the
> above structure so that the overall structure uses less memory. For example,
> moving field3 to sit inbetween field1 and field2 (where the padding is
> inserted) would shrink the overall structure by 1 byte:
> 
>   struct foo {
>           u16 field1;
>           u8 field3;
>           u32 field2;
>   };
> 
> Sidenote: it should be obvious by now, but in case it is not, accessing a
> single 

Re: [RFC] Documentation about unaligned memory access

2007-11-23 Thread Vadim Lobanov
On Thursday 22 November 2007 04:15:53 pm Daniel Drake wrote:
> Fortunately things are not too complex, as in most cases, the compiler
> ensures that things will work for you. For example, take the following
> structure:
>
>   struct foo {
>           u16 field1;
>           u32 field2;
>           u8 field3;
>   };
>
> Fortunately, the compiler understands the alignment constraints, so in the
> above case it would insert 2 bytes of padding inbetween field1 and field2.
> Therefore, for standard structure types you can always rely on the compiler
> to pad structures so that accesses to fields are suitably aligned (assuming
> you do not cast the field to a type of different length).

It would also insert 3 bytes of padding after field3, in order to satisfy 
alignment constraints for arrays of these structures.

> Sidenote: in the above example, you may wish to reorder the fields in the
> above structure so that the overall structure uses less memory. For
> example, moving field3 to sit inbetween field1 and field2 (where the
> padding is inserted) would shrink the overall structure by 1 byte:
>
>   struct foo {
>           u16 field1;
>           u8 field3;
>           u32 field2;
>   };

It will actually shrink it by 4 bytes, for the very same reason.
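
Both figures can be verified with offsetof and sizeof. The sketch below uses fixed-width user-space types in place of u16/u32/u8 and assumes an ABI where a 32-bit integer needs 4-byte alignment; the struct names and bytes_saved() are made up for the comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct foo_original {
    uint16_t field1;   /* offset 0, followed by 2 bytes of padding */
    uint32_t field2;   /* offset 4 */
    uint8_t  field3;   /* offset 8, followed by 3 bytes of tail padding */
};

struct foo_reordered {
    uint16_t field1;   /* offset 0 */
    uint8_t  field3;   /* offset 2, followed by 1 byte of padding */
    uint32_t field2;   /* offset 4 */
};

static_assert(offsetof(struct foo_original, field2) == 4,
              "2 bytes of padding after field1");
static_assert(sizeof(struct foo_original) == 12,
              "3 bytes of tail padding after field3");
static_assert(sizeof(struct foo_reordered) == 8,
              "only 1 byte of padding remains");

/* Hypothetical helper: bytes saved by the reordering. */
static inline size_t bytes_saved(void)
{
    return sizeof(struct foo_original) - sizeof(struct foo_reordered);
}
```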

-- Vadim Lobanov




Re: [RFC] Documentation about unaligned memory access

2007-11-23 Thread Heikki Orsila
On Fri, Nov 23, 2007 at 12:15:53AM +, Daniel Drake wrote:
> Why unaligned access is bad
> ===========================
> 
> Most architectures are unable to perform unaligned memory accesses. Any
> unaligned access causes a processor exception.

"Some architectures are unable to perform unaligned memory accesses: 
either an exception is generated, or the data access is silently invalid. 
In architectures that allow unaligned access, naturally aligned accesses 
are usually faster than non-aligned ones."

> In summary: if your code causes unaligned memory accesses to happen, your code
> will not work on some platforms, and will perform *very* badly on others.

*very* -> *slower*

> Natural alignment
> =================

Please move this definition before "Why unaligned access is bad".

Also, it would be nice to have a table of ISAs:

ISA             Need natural    Need alignment
                alignment       by x

m68k            No              2
powerpc/ppc     Yes             Word size
x86             No              No
x86_64          No              No

-- 
Heikki Orsila   Barbie's law:
[EMAIL PROTECTED]   "Math is hard, let's go shopping!"
http://www.iki.fi/shd


Re: [RFC] Documentation about unaligned memory access

2007-11-23 Thread Jan Engelhardt

On Nov 23 2007 00:15, Daniel Drake wrote:
>
>What's the definition of an unaligned access?
>=============================================
>
>Unaligned memory accesses occur when you try to read N bytes of data starting
>from an address that is not evenly divisible by N (i.e. addr % N != 0).
>For example, reading 4 bytes of data from address 0x1004 is fine, but
>reading 4 bytes of data from address 0x1005 would be an unaligned memory
>access.
>
Try shorter numbers, like 0x10005 :)


>Code that doesn't cause unaligned access
>========================================

In written style, not using n't contracted forms might be preferable.


>Sidenote: in the above example, you may wish to reorder the fields in the
>above structure so that the overall structure uses less memory. For example,
>moving field3 to sit inbetween field1 and field2 (where the padding is
>inserted) would shrink the overall structure by 1 byte:
>
>   struct foo {
>           u16 field1;
>           u8 field3;
>           u32 field2;
>   };
>
>Sidenote: it should be obvious by now, but in case it is not, accessing a
>single byte (u8 or char) can never cause an unaligned access, because all
>memory addresses are evenly divisible by 1.

Sidenote: You would want an alignment like this:

struct foo {
        uint32_t field2;
        uint16_t field1;
        uint8_t field3;
};


>Consider the following structure:
>   struct foo {
>           u16 field1;
>           u32 field2;
>           u8 field3;
>   } __attribute__((packed));
>
>It's the same structure as we looked at earlier, but the packed attribute has
>been added. This attribute ensures that the compiler never inserts any padding
>and the structure is laid out in memory exactly as is suggested above.
>
>The packed attribute is useful when you want to use a C struct to represent
>some data that comes in a fixed arrangement 'off the wire'.
>
In the packed case, does GCC not automatically emit extra instructions to
avoid running into unaligned accesses?
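
A sketch of what the packed layout looks like, with fixed-width user-space types standing in for u16/u32/u8. My understanding, stated as an assumption, is that GCC treats members of a packed struct as having alignment 1 when they are accessed through the struct type, so read_field2() (a hypothetical helper) should be safe. Taking the address of field2 and passing it around as a plain uint32_t pointer would lose that information.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct foo_packed {
    uint16_t field1;
    uint32_t field2;
    uint8_t  field3;
} __attribute__((packed));   /* GCC/Clang attribute: no padding inserted */

static_assert(sizeof(struct foo_packed) == 7, "2 + 4 + 1 bytes, no padding");
static_assert(offsetof(struct foo_packed, field2) == 2,
              "field2 sits at a misaligned offset");

/* Hypothetical helper: the access goes through the packed struct type, so
 * the compiler knows field2 may be misaligned and can generate code for it. */
static inline uint32_t read_field2(const struct foo_packed *p)
{
    return p->field2;
}
```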

>To avoid the unaligned memory access, you could rewrite it as follows:
>
>   void myfunc(u8 *data, u32 value)
>   {
>           [...]
>           value = cpu_to_le32(value);
>           memcpy(data, &value, sizeof(value));
>           [...]
>   }
>
>It's safe to assume that memcpy will always copy bytewise and hence will
>never cause an unaligned access.
>
Usually it copies register-size-wise where possible and bytewise at the
left and right edges if they are unaligned. That's how glibc memcpy does it;
I am not sure how complete the kernel memcpy is in this regard.


Re: [RFC] Documentation about unaligned memory access

2007-11-23 Thread Arne Georg Gleditsch
dean gaudet <[EMAIL PROTECTED]> writes:
> on AMD x86 pre-family 10h the boundary is 8 bytes, and on fam 10h it's 16 
> bytes.  the penalty is a mere 3 cycles if an access crosses the specified 
> boundary.

Worth noting, though, is that atomic accesses that cross cache lines on
an Opteron system are going to lock down the HyperTransport fabric for
you during the operation -- which is obviously not so nice.
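
Whether a given access crosses such a boundary is easy to test. The sketch below is generic: crosses_boundary() is a hypothetical helper, and the boundary (8 or 16 bytes as quoted above, or a cache line size such as 64 bytes, which is an assumption here) is passed in by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns nonzero if the access [addr, addr + size)
 * touches more than one boundary-sized block. Overflow of addr + size is
 * not handled in this sketch. */
static inline int crosses_boundary(uintptr_t addr, size_t size, size_t boundary)
{
    assert(size != 0 && boundary != 0);
    return (addr / boundary) != ((addr + size - 1) / boundary);
}
```

An 8-byte access at 0x103c crosses a 64-byte boundary, while the same access at 0x1038 does not.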

-- 
Arne.


Re: [RFC] Documentation about unaligned memory access

2007-11-23 Thread Vadim Lobanov
On Thursday 22 November 2007 04:15:53 pm Daniel Drake wrote:
> Fortunately things are not too complex, as in most cases, the compiler
> ensures that things will work for you. For example, take the following
> structure:
>
>	struct foo {
>		u16 field1;
>		u32 field2;
>		u8 field3;
>	};
>
> Fortunately, the compiler understands the alignment constraints, so in the
> above case it would insert 2 bytes of padding inbetween field1 and field2.
> Therefore, for standard structure types you can always rely on the compiler
> to pad structures so that accesses to fields are suitably aligned (assuming
> you do not cast the field to a type of different length).

It would also insert 3 bytes of padding after field3, in order to satisfy
alignment constraints for arrays of these structures.

> Sidenote: in the above example, you may wish to reorder the fields in the
> above structure so that the overall structure uses less memory. For
> example, moving field3 to sit inbetween field1 and field2 (where the
> padding is inserted) would shrink the overall structure by 1 byte:
>
>	struct foo {
>		u16 field1;
>		u8 field3;
>		u32 field2;
>	};

It will actually shrink it by 4 bytes, for the very same reason.
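
The arithmetic can be checked with sizeof and offsetof. This is a sketch under the assumption of an ABI where a 32-bit integer is 4-byte aligned (true on x86 and most others, but not everywhere); the struct names foo_original and foo_reordered and the helper bytes_saved are made up for the comparison, and the stdint types stand in for the kernel's u16/u32/u8.

```c
#include <stddef.h>
#include <stdint.h>

/* Original order: 2 bytes of padding after field1, 3 after field3. */
struct foo_original {
	uint16_t field1;
	uint32_t field2;
	uint8_t  field3;
};

/* Reordered: field3 takes one padding byte, one byte of padding remains. */
struct foo_reordered {
	uint16_t field1;
	uint8_t  field3;
	uint32_t field2;
};

/* Bytes saved by the reordering on the ABI this is compiled for. */
static size_t bytes_saved(void)
{
	return sizeof(struct foo_original) - sizeof(struct foo_reordered);
}
```

With 4-byte alignment for field2, the original layout occupies 12 bytes and the reordered one 8, which is where the 4-byte saving comes from.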

-- Vadim Lobanov




Re: [RFC] Documentation about unaligned memory access

2007-11-23 Thread Dmitri Vorobiev
Daniel Drake wrote:
> Being spoilt by the luxuries of i386/x86_64 I've never really had a good
> grasp on unaligned memory access problems on other architectures and decided
> it was time to figure it out. As a result I've written this documentation
> which I plan to submit for inclusion as
> Documentation/unaligned_memory_access.txt
>
> Before I do so, any comments on the following?

From the viewpoint of yours truly (and I am a teacher of operating system
classes), this is a long-awaited document, which is going to be very useful,
especially for newbies. My students often make alignment mistakes in their
code, and your article will definitely make my job much easier.

Thank you, Daniel, for your work.

Dmitri

 





Re: [RFC] Documentation about unaligned memory access

2007-11-23 Thread Heikki Orsila
On Fri, Nov 23, 2007 at 12:15:53AM +, Daniel Drake wrote:
> Why unaligned access is bad
> ===========================
>
> Most architectures are unable to perform unaligned memory accesses. Any
> unaligned access causes a processor exception.

Some architectures are unable to perform unaligned memory accesses:
either an exception is generated, or the data access is silently
invalid. On architectures that allow unaligned access, naturally
aligned accesses are usually faster than unaligned ones.

> In summary: if your code causes unaligned memory accesses to happen, your code
> will not work on some platforms, and will perform *very* badly on others.

*very* -> *slower*

> Natural alignment
> =================

Please move this definition before "Why unaligned access is bad".

Also, it would be nice to have a table of ISAs:

ISA             Need natural    Need alignment
                alignment       by x

m68k            No              2
powerpc/ppc     Yes             Word size
x86             No              No
x86_64          No              No

-- 
Heikki Orsila   Barbie's law:
[EMAIL PROTECTED]   Math is hard, let's go shopping!
http://www.iki.fi/shd


Re: [RFC] Documentation about unaligned memory access

2007-11-22 Thread dean gaudet
On Fri, 23 Nov 2007, Alan Cox wrote:

> Its usually faster if you don't misalign on x86 as well.

i'm not sure if i agree with "usually"... but i know you (alan) are 
probably aware of the exact requirements of the hw.

for everyone else:

on intel x86 processors an access is unaligned only if it crosses a 
cacheline boundary (64 bytes).  otherwise it's aligned.  the penalty for 
crossing a cacheline boundary varies from ~12 cycles (core2) to many 
dozens of cycles (p4).

on AMD x86 pre-family 10h the boundary is 8 bytes, and on fam 10h it's 16 
bytes.  the penalty is a mere 3 cycles if an access crosses the specified 
boundary.

if you're making <= 4 byte accesses i recommend not worrying about 
alignment on x86.  it's pretty hard to beat the hardware support.
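
A sketch of the boundary test described above, written as plain C. The helper name crosses_boundary is hypothetical; the 64-, 8- and 16-byte figures are the ones quoted in this message and are passed in as a parameter rather than hard-coded.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * True if an access of `size` bytes starting at `addr` spans a
 * `boundary`-byte boundary. `boundary` must be a power of two and
 * `size` must be non-zero.
 */
static bool crosses_boundary(uintptr_t addr, size_t size, size_t boundary)
{
	uintptr_t mask = ~(uintptr_t)(boundary - 1);
	uintptr_t first = addr & mask;               /* block of first byte */
	uintptr_t last = (addr + size - 1) & mask;   /* block of last byte */

	return first != last;
}
```

For example, a 4-byte access at 0x103e spans the 64-byte boundary at 0x1040, while the same access at 0x1001 stays within one line even though it is not naturally aligned.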

i curse all the RISC and embedded processor designers who pretend 
unaligned accesses are something evil and to be avoided.  in case you're 
worried, MIPS patent 4,814,976 expired in december 2006 :)

-dean


Re: [RFC] Documentation about unaligned memory access

2007-11-22 Thread Kyle Moffett

On Nov 22, 2007, at 20:29:11, Alan Cox wrote:
>> Most architectures are unable to perform unaligned memory
>> accesses. Any unaligned access causes a processor exception.
>
> Not all. Some simply produce the wrong answer - that's oh so much
> more exciting.


As one example, the MicroBlaze soft-core processor family designed  
for use on Xilinx FPGAs will (by default) simply forcibly zero the  
lower bits of the unaligned address, such that the following code  
will fail mysteriously:


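const char foo[] = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07 };
/* Casts keep the const qualifier and make the arguments match %lx. */
printf("0x%08lx 0x%08lx 0x%08lx 0x%08lx\n",
	(unsigned long)*((const u32 *)(foo+0)),
	(unsigned long)*((const u32 *)(foo+1)),
	(unsigned long)*((const u32 *)(foo+2)),
	(unsigned long)*((const u32 *)(foo+3)));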

Instead of outputting:
0x00010203 0x01020304 0x02030405 0x03040506

It will output:
0x00010203 0x00010203 0x00010203 0x00010203

Other embedded architectures have very similar problems.  Some may  
provide an "unaligned data access" exception, but offer insufficient  
information to repair the damage and resume execution.
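
The address-masking behaviour can be modelled portably, without performing a real unaligned access. This is only an illustrative sketch: masked_load_u32 is a hypothetical helper, the buffer is assumed to start on a 4-byte boundary so that masking the offset stands in for masking the address, and it is not a description of how any particular core is implemented.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Model a 32-bit load on a core that clears the low two address bits
 * instead of trapping. The result is in host byte order.
 */
static uint32_t masked_load_u32(const uint8_t *mem, size_t offset)
{
	uint32_t value;

	/* Round the offset down to the previous multiple of 4. */
	memcpy(&value, mem + (offset & ~(size_t)3), sizeof(value));
	return value;
}
```

Loads at offsets 0, 1, 2 and 3 all return the same word in this model, which matches the repeated value in the second output line above.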


Cheers,
Kyle Moffett



Re: [RFC] Documentation about unaligned memory access

2007-11-22 Thread Andi Kleen
Robert Hancock <[EMAIL PROTECTED]> writes:
>
> Also, x86 doesn't prohibit unaligned accesses,

That depends, e.g. for SSE2 they can be forbidden.

> but I believe they have
> a significant performance cost and are best avoided where possible.

On Opteron the typical cost of a misaligned access is a single cycle
and some possible penalty to load-store forwarding.

On Intel it is a bit worse, but not all that much. Unless you do
a lot of them in a loop, it's not really something worth caring
about too much.

-Andi


Re: [RFC] Documentation about unaligned memory access

2007-11-22 Thread Alan Cox
> Most architectures are unable to perform unaligned memory accesses. Any
> unaligned access causes a processor exception.

Not all. Some simply produce the wrong answer - that's oh so much more
exciting.

> You may be wondering why you have never seen these problems on your own
> architecture. Some architectures (such as i386 and x86_64) do not have this
> limitation, but nevertheless it is important for you to write portable code
> that works everywhere.

It's usually faster if you don't misalign on x86 as well.

Alan


Re: [RFC] Documentation about unaligned memory access

2007-11-22 Thread David Miller

Thank you for working proactively on these problems.


Re: [RFC] Documentation about unaligned memory access

2007-11-22 Thread Robert Hancock

Daniel Drake wrote:

> Being spoilt by the luxuries of i386/x86_64 I've never really had a good
> grasp on unaligned memory access problems on other architectures and decided
> it was time to figure it out. As a result I've written this documentation
> which I plan to submit for inclusion as
> Documentation/unaligned_memory_access.txt
>
> Before I do so, any comments on the following?

...

> You may be wondering why you have never seen these problems on your own
> architecture. Some architectures (such as i386 and x86_64) do not have this
> limitation, but nevertheless it is important for you to write portable code
> that works everywhere.


Also, x86 doesn't prohibit unaligned accesses, but I believe they have a 
significant performance cost and are best avoided where possible.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [RFC] Documentation about unaligned memory access

2007-11-22 Thread Avuton Olrich
On Nov 22, 2007 4:15 PM, Daniel Drake <[EMAIL PROTECTED]> wrote:
> Before I do so, any comments on the following?
>

< above case it would insert 2 bytes of padding inbetween field1 and field2.
> above case it would insert 2 bytes of padding in between field1 and field2.


< moving field3 to sit inbetween field1 and field2 (where the padding is
> moving field3 to sit in between field1 and field2 (where the padding is
-- 
avuton
--
 Anyone who quotes me in their sig is an idiot. -- Rusty Russell.


[RFC] Documentation about unaligned memory access

2007-11-22 Thread Daniel Drake
Being spoilt by the luxuries of i386/x86_64 I've never really had a good
grasp on unaligned memory access problems on other architectures and decided
it was time to figure it out. As a result I've written this documentation
which I plan to submit for inclusion as
Documentation/unaligned_memory_access.txt

Before I do so, any comments on the following?

Thanks,
Daniel




UNALIGNED MEMORY ACCESSES
=========================

Linux runs on a wide variety of architectures which have varying behaviour
when it comes to memory access. This document presents some details about
unaligned accesses, why you need to write code that doesn't cause them,
and how to write such code!


What's the definition of an unaligned access?
=============================================

Unaligned memory accesses occur when you try to read N bytes of data starting
from an address that is not evenly divisible by N (i.e. addr % N != 0).
For example, reading 4 bytes of data from address 0x1004 is fine, but
reading 4 bytes of data from address 0x1005 would be an unaligned memory
access.
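
The definition translates directly into a one-line check. This is a sketch in userspace C; is_naturally_aligned is a hypothetical helper name, not an existing kernel function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * True if `addr` is suitably aligned for an access of `n` bytes,
 * i.e. addr % n == 0. `n` must be non-zero.
 */
static bool is_naturally_aligned(uintptr_t addr, size_t n)
{
	return addr % n == 0;
}
```

With the addresses from the example, is_naturally_aligned(0x1004, 4) holds and is_naturally_aligned(0x1005, 4) does not.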


Why unaligned access is bad
===========================

Most architectures are unable to perform unaligned memory accesses. Any
unaligned access causes a processor exception.

Some architectures have an exception handler implemented in the kernel which
corrects the memory access, but this is very expensive and is not true for
all architectures. You cannot rely on the exception handler to correct your
memory accesses.

In summary: if your code causes unaligned memory accesses to happen, your code
will not work on some platforms, and will perform *very* badly on others.

You may be wondering why you have never seen these problems on your own
architecture. Some architectures (such as i386 and x86_64) do not have this
limitation, but nevertheless it is important for you to write portable code
that works everywhere.


Natural alignment
=================

The rule we mentioned earlier forms what we refer to as natural alignment:
When accessing N bytes of memory, the base memory address must be evenly
divisible by N, i.e. addr % N == 0

When writing code, assume the target architecture has natural alignment
requirements.

Sidenote: in reality, only a few architectures require natural alignment
on all sizes of memory access. However, again we must consider ALL supported
architectures; natural alignment is the only way to achieve full portability.


Code that doesn't cause unaligned access
========================================

At first, the concepts above may seem a little hard to relate to actual
coding practice. After all, you don't have a great deal of control over
memory addresses of certain variables, etc.

Fortunately things are not too complex, as in most cases, the compiler
ensures that things will work for you. For example, take the following
structure:

	struct foo {
		u16 field1;
		u32 field2;
		u8 field3;
	};

Let us assume that an instance of the above structure resides in memory
starting at address 0x1000. With a basic level of understanding, it would
not be unreasonable to expect that accessing field2 would cause an unaligned
access. You'd be expecting field2 to be located at offset 2 bytes into the
structure, i.e. address 0x1002, but that address is not evenly divisible
by 4 (remember, we're reading a 4 byte value here).

Fortunately, the compiler understands the alignment constraints, so in the
above case it would insert 2 bytes of padding inbetween field1 and field2.
Therefore, for standard structure types you can always rely on the compiler
to pad structures so that accesses to fields are suitably aligned (assuming
you do not cast the field to a type of different length).

Similarly, you can also rely on the compiler to align variables and function
parameters to a naturally aligned scheme, based on the size of the type of
the variable.

Sidenote: in the above example, you may wish to reorder the fields in the
above structure so that the overall structure uses less memory. For example,
moving field3 to sit inbetween field1 and field2 (where the padding is
inserted) would shrink the overall structure by 1 byte:

	struct foo {
		u16 field1;
		u8 field3;
		u32 field2;
	};

Sidenote: it should be obvious by now, but in case it is not, accessing a
single byte (u8 or char) can never cause an unaligned access, because all
memory addresses are evenly divisible by 1.


Code that causes unaligned access
=================================

With the above in mind, let's move onto a real life example of a function
that can cause an unaligned memory access. The following function adapted
from include/linux/etherdevice.h is an optimized routine to compare two
ethernet MAC addresses for equality.

unsigned int compare_ether_addr(const u8 *addr1, const u8 *addr2)
{
const u16 *a = (const u16 *) addr1;
const u16 *b = 
