pps wrote:
John Stanton wrote:

If you really want a data structure which is independent of processor architecture and compilers why not byte encode the numbers into what would be described in Pascal as a Packed Array of Char. Then byte ordering, word alignment etc are irrelevant.

It does require that the client have a function which transforms and recovers the numbers to the particular format needed by the client processor/compiler combination but has the advantage that the data may be shared by Big Endian/Little Endian/RISC and CISC machines.

The extreme case would be to store the numbers as ASCII decimal strings, rather inefficient but a common way of storing numeric data when it must be broadly accessible.
JS



What you are describing here is called serialization. See one of my previous posts in this thread about that. My favorite lib for serialization is boost::serialization, which allows serialization of C++ objects into binary, text (ASCII) or XML strings and more...
for example,
struct intDouble {
  int x;
  int y;
};

would be serialized in xml this way:

<intDouble class_id="0" tracking_level="0" version="0">
    <x>1</x>
    <y>2</y>
</intDouble>

http://www.google.ca/search?q=boost+serialization

I was describing no such thing.  An example of what I was describing is

union stored_word {
  struct {
#ifdef BIG_END
    unsigned char b0;
    unsigned char b1;
    unsigned char b2;
    unsigned char b3;
#else
    unsigned char b3;
    unsigned char b2;
    unsigned char b1;
    unsigned char b0;
#endif
  } a;
  int b;
};

struct stored_doublet {
  union stored_word x;
  union stored_word y;
};

Provided the compiler packs the unsigned chars into the free union this structure stores the (x, y) point independent of byte order and with no bloat. The only overhead is the extra move when storing and recovering the data. Note that this structure cannot be guaranteed to work for all architectures and compilers, and some special coding might be needed for quirky clients.
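
A minimal sketch of the store/recover step, assuming 32-bit values and a stored order of least significant byte first. The helper names (put_u32_le, get_u32_le, put_point) are hypothetical and not part of the code above. It uses shifts rather than the union, so it does not rely on the compiler's struct layout or on a BIG_END macro being set correctly.

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value in a fixed byte order
   (least significant byte first), whatever the host byte order is. */
static void put_u32_le(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v & 0xFF);
    out[1] = (unsigned char)((v >> 8) & 0xFF);
    out[2] = (unsigned char)((v >> 16) & 0xFF);
    out[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Hypothetical helper: recover the value from the stored bytes. */
static uint32_t get_u32_le(const unsigned char in[4])
{
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}

/* An (x, y) point as 8 stored bytes: x first, then y. */
static void put_point(unsigned char out[8], uint32_t x, uint32_t y)
{
    put_u32_le(out, x);
    put_u32_le(out + 4, y);
}
```

Signed values would need a cast to uint32_t before storing and back to a signed type after recovery; that part is left out of the sketch.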

From an SQLite perspective the data would be stored as a BLOB since it is still binary format. It could also be stored as a 64-bit integer. If it were to be stored as TEXT the transformation could be more involved and more expensive by using BASE64 or ASCII hex format. The added cost would be insignificant compared to the prospect of doing a radix transformation into ASCII decimal and embedding that in XML then having to parse it out on every access.
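
As a rough illustration of the ASCII hex option for TEXT storage, here is a sketch that turns stored bytes into hex digits and back. The function names (bytes_to_hex, hex_val, hex_to_bytes) are hypothetical, and nothing here touches the SQLite API; it only shows the transformation itself.

```c
#include <stddef.h>

/* Hypothetical helper: encode n bytes as 2*n uppercase hex digits plus
   a terminating NUL. The caller supplies at least 2*n + 1 chars. */
static void bytes_to_hex(const unsigned char *in, size_t n, char *out)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t i;
    for (i = 0; i < n; i++) {
        out[2 * i]     = digits[in[i] >> 4];
        out[2 * i + 1] = digits[in[i] & 0x0F];
    }
    out[2 * n] = '\0';
}

/* Value of one hex digit, or -1 if the character is not a hex digit. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical inverse: decode 2*n hex digits into n bytes.
   Returns 0 on success, -1 if a non-hex character is found. */
static int hex_to_bytes(const char *in, size_t n, unsigned char *out)
{
    size_t i;
    for (i = 0; i < n; i++) {
        int hi = hex_val(in[2 * i]);
        int lo = hex_val(in[2 * i + 1]);
        if (hi < 0 || lo < 0) return -1;
        out[i] = (unsigned char)((hi << 4) | lo);
    }
    return 0;
}
```

The text form is twice the size of the binary form, but each byte maps to two characters with a table lookup, with no radix conversion involved.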

In practical terms it would be best to choose the byte format used by Intel processors since that would most likely minimize re-ordering.
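
A sketch of why the Intel (little-endian) stored order keeps re-ordering to a minimum: on a little-endian host the bytes are already in stored order and can be copied straight across, and only other hosts need the swap. The names host_is_little_endian and store_u32_le are hypothetical, and the sketch assumes the host is either plain little-endian or needs explicit byte extraction.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical runtime probe: returns 1 if the lowest-addressed byte
   of a 32-bit integer holds its least significant byte. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Hypothetical helper: write v in little-endian stored order. */
static void store_u32_le(unsigned char out[4], uint32_t v)
{
    if (host_is_little_endian()) {
        memcpy(out, &v, 4);  /* already in stored order, plain copy */
    } else {
        /* other hosts extract the bytes one by one */
        out[0] = (unsigned char)(v & 0xFF);
        out[1] = (unsigned char)((v >> 8) & 0xFF);
        out[2] = (unsigned char)((v >> 16) & 0xFF);
        out[3] = (unsigned char)((v >> 24) & 0xFF);
    }
}
```

Either branch produces the same stored bytes, so data written on one kind of machine reads back correctly on the other.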

BTW, we used just that principle over the years to produce structures for data and B-Tree indices which could be accessed independent of the host processor. Databases could be downloaded regardless of the destination machine, a blessing for users.
JS
