Re: [PATCH] Reduce COW sections data by marking data constant

Diego 'Flameeyes' Pettenò Thu, 31 Jan 2008 17:28:37 -0800

On Friday 01 February 2008, you wrote:
> Yes; but I still have to manually increment the size to an appropriate
> value each time I add a bigger string; plus I'm wasting space for all
> shorter strings (which applies more to bigger-than-two-element char
> arrays);


Actually, you're not always wasting space. The actual space occupied by a
pointer to a character string is

sizeof(char*) + sizeof("literal");

This means that for 32-bit (best-case)

char *foo = "a" -> 4 + 2 = 6 bytes
char *foo = "ab" -> 4 + 3 = 7 bytes
char *foo = "abc" -> 4 + 4 = 8 bytes

char foo[] = "a" -> 2 bytes
char foo[] = "ab" -> 3 bytes
char foo[] = "abc" -> 4 bytes

By that reasoning, char foo[6] will _always_ take less or equal space than 
char *foo, for any non-empty string that fits in it.
But 32-bit architectures are also dying out outside embedded systems, which 
means that the most common case, especially in the years to come, will be 
sizeof(char*) == 8 (64-bit architectures), at which point char foo[10] will 
always be smaller than or equal to char *foo.
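
The arithmetic above can be checked with sizeof on whatever platform you
compile for. This is only a minimal sketch: PTR_COST, ARR_COST, SHOW and
show_costs are names made up for the illustration, and it ignores alignment
padding and any merging of identical literals done by the toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Bytes used by `char *foo = "literal"`: the pointer itself plus the
 * literal, including its terminating NUL. */
#define PTR_COST(lit) (sizeof(char *) + sizeof(lit))

/* Bytes used by `char foo[] = "literal"`: only the characters plus NUL. */
#define ARR_COST(lit) (sizeof(lit))

/* Hypothetical helper: print both costs for one string literal. */
#define SHOW(lit) \
    printf("%-6s pointer: %2zu bytes, array: %2zu bytes\n", \
           "\"" lit "\"", PTR_COST(lit), ARR_COST(lit))

void show_costs(void)
{
    SHOW("a");
    SHOW("ab");
    SHOW("abc");

    /* Break-even point: a fixed array of sizeof(char *) + 2 bytes is never
     * bigger than the pointer form of a non-empty string. That is foo[6]
     * with 4-byte pointers and foo[10] with 8-byte pointers. */
    assert(sizeof(char[sizeof(char *) + 2]) <= PTR_COST("a"));
}
```

Calling show_costs() from main() prints one line per literal; the pointer
column depends on sizeof(char *) of the target.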

Considering you also save one indirection, most of the time even wasting 
8 more bytes per string is worth the change.

> The first solution makes good sense to me, particularly for its improved
> correctness; but fixing the second problem seems like a
> nano-optimization that doesn't bring sufficient efficiency benefits to
> outweigh its drawbacks in maintainability.

It is a micro-optimisation, I admit that, but the indirection is not the 
only problem.

Pointers, and structures containing pointers, need to be runtime-relocated for 
shared libraries and PIC code (let's assume that shared libraries are always 
PIC, for the sake of argument). In these cases the data is written 
to .data.rel, or .data.rel.ro for constants (I'm talking ELF here, other 
output formats most likely have something like that), which are copy-on-write 
sections. Without prelinking, these sections will be copied right at the 
start of the program, and once copied, they are no longer shared between 
processes. Which means they'll be using pages of private memory in each of 
the processes.
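
To make the difference concrete, here is a small sketch of the two
declarations being compared. The names (names_ptr, names_arr and the helper
functions) are invented for the example, and the section placement in the
comments is what a typical GCC/ELF toolchain does with -fPIC; it is worth
confirming on your own build with readelf -S and readelf -r.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Array of pointers: every element stores an address that is only known at
 * load time in PIC code, so each element needs a relocation and the array
 * typically ends up in a .data.rel.ro-style (copy-on-write) section. */
static const char *const names_ptr[] = { "alpha", "beta", "gamma" };

/* Array of arrays: the characters are stored inline and no addresses are
 * involved, so the object can typically stay in .rodata and remain shared.
 * Shorter strings are padded with NULs up to 6 bytes. */
static const char names_arr[][6] = { "alpha", "beta", "gamma" };

/* Callers read both forms the same way. */
const char *name_from_ptr(size_t i) { return names_ptr[i]; }
const char *name_from_arr(size_t i) { return names_arr[i]; }

/* Sanity check that the two tables hold the same strings. */
void check_same(void)
{
    size_t n = sizeof names_ptr / sizeof names_ptr[0];
    for (size_t i = 0; i < n; i++)
        assert(strcmp(name_from_ptr(i), name_from_arr(i)) == 0);
}
```

The price of the second form is the one discussed earlier in the thread: the
6 has to be bumped by hand whenever a longer string is added.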

It is a minor thing for wget as it's not a shared library, but a 
fire-and-forget program, so even for security people using PIE it's far from 
being a big hit. It's more important for shared libraries though, especially 
because shared libraries can't always be prelinked properly. See also 
http://farragut.flameeyes.is-a-geek.org/articles/2008/01/01/reminding-a-weakness-of-prelink

Now, glib goes as far as using the method that Mart described in the 
comments to my first blog post 
( http://farragut.flameeyes.is-a-geek.org/articles/2007/12/19/array-of-pointers-and-array-of-arrays ) 
and that I quoted in the second one 
( http://farragut.flameeyes.is-a-geek.org/articles/2008/01/01/some-more-about-arrays-of-strings ) 
to avoid the increase-the-size and waste-the-space game, but at the price of 
what is, IMHO, a considerably bigger maintenance problem. I haven't seen 
enough use cases yet for this to be a royally good idea, but I'm thinking of 
trying it out with xine-lib soon enough (without the second array, if the 
access can be done sequentially instead of randomly).
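
For readers who have not followed the blog posts, here is a rough sketch of
what such a scheme can look like. It assumes the method is the usual one of
packing all strings into a single NUL-separated character array plus a second
array of offsets; the blog posts are not quoted here, so treat that as an
assumption, and all names (names_blob, names_off, name_at, count_names,
check_offsets) as hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* First array: all strings packed back to back, each with its own NUL.
 * The compiler appends one more NUL, which acts as an end marker. */
static const char names_blob[] =
    "alpha\0"
    "beta\0"
    "gamma\0";

/* Second array: offset of each string inside the blob. These are plain
 * integers, so no relocation is needed, but they are maintained by hand. */
static const unsigned char names_off[] = { 0, 6, 11 };

/* Random access goes through the offset table. */
const char *name_at(size_t i) { return names_blob + names_off[i]; }

/* Sequential access needs no offset table: hop from NUL to NUL until the
 * empty string at the end of the blob is reached. */
size_t count_names(void)
{
    size_t n = 0;
    for (const char *p = names_blob; *p != '\0'; p += strlen(p) + 1)
        n++;
    return n;
}

/* Guard against the maintenance problem: verify the hand-written offsets
 * against a sequential walk of the blob. */
void check_offsets(void)
{
    size_t i = 0;
    for (const char *p = names_blob; *p != '\0'; p += strlen(p) + 1, i++)
        assert(i < sizeof names_off && name_at(i) == p);
    assert(i == sizeof names_off);
}
```

No space is wasted on padding and nothing needs relocating, but every edit to
the blob means recomputing the offsets, which is where the maintenance cost
comes from.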

-- 
Diego "Flameeyes" Pettenò
http://farragut.flameeyes.is-a-geek.org/
