On Wed, Jun 14, 2006 at 01:25:22 +1000, Erik de Castro Lopo wrote:

> The binary in question is a complete file system. As such it is 

Didn't James say it was a compressed file system?  If so, it's
simply a stream of bytes, not a mixture of different data types.
The compression algorithm doesn't know the meaning or size of each
bit of data in the filesystem, it just treats it as a stream of
bytes.  It might read those bytes as 32-bit words, or it might
read them as individual bytes, but it doesn't matter.  It just
processes what it sees as a chunk of random unstructured data.

Once it's uncompressed, nothing changes.  The kernel, when reading
a file system, doesn't read individual 8, 16 and 32 bit fields off
the disc, it reads a chunk of data (probably some number of sectors)
then tries to make sense of it.  By then, it's been copied at least
once (maybe via dma) by code which has no knowledge of the underlying
structure of the data it's processing.

The different fields don't have meaning until they're interpreted
by the application using them.

Now, I don't know whether byte swapping is needed before or after
uncompressing, both, or neither, but I don't believe any knowledge
of the underlying structure of the data is necessary to do that
byte swapping.

Let me give you an example which may help you understand why I think
this.

I write processor models (x86 little endian host) for a living, and
have worked on both big and little endian cores, including some that
can switch endianness at run time.  When we load a target binary, all
we know is that it's an ELF (or S record, or whatever) image and that
it's big or little endian, but not the meaning of each part of the
image.  For our ARM models, and most other big-endian cores, we simply
byte swap big-endian data as aligned 32-bit words and store it in
little-endian format in host memory.  In other words, the data always
sits in host-endian format, and any further conversion happens when
it's read, depending upon the current processor mode.  Unaligned
accesses complicate the process[0], but are irrelevant for the purpose
of this example.

When we do the initial byte swapping, we neither know nor care what
the meaning or size of each individual location in the binary is.  We
simply treat it as a collection of 32-bit numbers and everything just
works.
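
A minimal sketch of that kind of structure-blind swap, in Python for
brevity.  The function names are hypothetical and this is not the
actual model code.  The address XOR used for the narrower reads is one
common way to do the read-time conversion for aligned accesses; it is
assumed here, not taken from the models described above.

```python
import struct


def swap_words(image: bytes) -> bytes:
    """Byte swap a buffer as aligned 32-bit words, ignoring its structure."""
    if len(image) % 4:
        raise ValueError("image length must be a multiple of 4 bytes")
    count = len(image) // 4
    words = struct.unpack(f">{count}I", image)  # read as big-endian words
    return struct.pack(f"<{count}I", *words)    # store as little-endian words


# Aligned reads as a big-endian target would see them, served from the
# swapped (host-endian) copy.  Only at this point does field size matter.
def read_u32(mem: bytes, addr: int) -> int:
    return struct.unpack_from("<I", mem, addr)[0]


def read_u16(mem: bytes, addr: int) -> int:
    return struct.unpack_from("<H", mem, addr ^ 2)[0]


def read_u8(mem: bytes, addr: int) -> int:
    return mem[addr ^ 3]


# One 32-bit value followed by four single-byte characters.
image = bytes.fromhex("12345678") + b"ABCD"
host = swap_words(image)

print(hex(read_u32(host, 0)))   # the word, read natively by the host
print(hex(read_u16(host, 2)))   # the low half-word of that word
print(chr(read_u8(host, 4)))    # first character of the byte string
```

The swap itself never looks at what the bytes mean.  The field width
only shows up in the read helpers, which is where the target's view of
each location is reconstructed.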

I think this file system image can be treated the same way, at least
until you actually want to mount it and interpret the contents.  Then,
and only then, do you need to know how big each field is.


Cheers,

John

[0] Some cores raise an exception, some do an aligned access and rotate
    the data, some ignore the least significant address bits to force
    alignment and some do multiple bus cycles to handle the unaligned
    access.

-- 
> Hmph, whatever happened to *ethics*? - That's what I'd like to know!
It's still there, just to the north of Kent.  Nasty elisp you've
got there....
            -- Sean Purdy
-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
