At 07:07 06/07/2006, you wrote:
Hello there,
This is what I mean by repetitive data:
Tables:
E:\DirectX90c\
E:\DirectX90c\Feb2006_MDX1_x86_Archive.cab\
E:\DirectX90c\Feb2006_d3dx9_29_x64.cab\
E:\DirectX90c\Feb2006_xact_x64.cab\
E:\DirectX90c\Feb2006_MDX1_x86.cab\
E:\DirectX90c\Feb2006_xact_x86.cab\
And so on. As you can see, the string E:\DirectX90c\ repeats all the
time in this example (as does "Feb2006_" in almost every table).
It's just an example of the type of repetitive data I have to deal
with; they are normally paths. Since there are directories within
directories, the paths repeat.
What would be an ideal approach for this situation? I would like to
save space, but I wouldn't like to waste a large amount of processing
power to do so.
One must keep in mind that my system must perform "well" in various
situations (which I can't predict, at least not all of them); for
this reason I can't have a very elaborate database schema. Sometimes
saving a few KBs could mean wasting a few tons of cycles, and I
can't deal with that. I'd rather have those extra KBs and deal with
a responsive application than save a few KBs and fall asleep
at the keyboard (don't worry, it's a multi-threaded environment;
however, it's important to keep it optimized, I'm just exaggerating
the problem a little).
I'd like to take the right 'path' here...
Thanks.
.... SQLite does not come with a free compression system. Also, any
compression must be done at the page level, not the row (data) level,
because most compression algorithms use statistics gathered from
past data to compress what follows. If you try to do it at the row
level, then when a row is deleted every row after it becomes garbage.
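
To make the dependency on past data concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: it uses zlib (an LZ-based compressor from the Python standard library) as a stand-in for any adaptive compressor, the rows are sample paths from the question, and the helper name decode is made up for this example. Each row is compressed as a separate chunk of one shared stream, and then a middle chunk is dropped to simulate a deleted row.

```python
import zlib

# Sample rows taken from the paths in the question.
rows = [
    b"E:\\DirectX90c\\Feb2006_MDX1_x86_Archive.cab\\",
    b"E:\\DirectX90c\\Feb2006_d3dx9_29_x64.cab\\",
    b"E:\\DirectX90c\\Feb2006_xact_x64.cab\\",
]

# One shared compressor: every chunk may refer back to earlier rows.
comp = zlib.compressobj()
chunks = [comp.compress(r) + comp.flush(zlib.Z_SYNC_FLUSH) for r in rows]

def decode(pieces):
    """Hypothetical helper: decode chunks in order, None marks a hard failure."""
    d = zlib.decompressobj()
    out = []
    for p in pieces:
        try:
            out.append(d.decompress(p))
        except zlib.error:
            out.append(None)
            break
    return out

intact = decode(chunks)                    # all rows present
damaged = decode([chunks[0], chunks[2]])   # middle row "deleted"

print(intact == rows)           # every row comes back when nothing is missing
print(damaged[-1] == rows[2])   # the row after the gap no longer decodes correctly
```

Compressing a whole page as one unit avoids this, because the page is always read and rewritten in full.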
There are a lot of compression algorithms, but I think the best for
this is an arithmetic or range coder with an order-0 or order-1
model; the page size is too small for higher orders or LZ algorithms.
Both (arithmetic and range) are pretty fast, need no FPU code
(integer only, so they work on embedded devices), and I think they
will not slow file I/O too much. On text-only data you can expect
about 2.5 bpb (bits per byte), or roughly a 65 to 70% size reduction,
and more when the page size is larger.
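
As a rough sanity check on figures like these, you can estimate what an order-0 coder could reach by computing the order-0 entropy of your own data, since a good arithmetic or range coder gets close to that bound (plus a little overhead for the model). The Python sketch below is only an illustration: the function name order0_bits_per_byte is made up for this example, and the sample is just the six paths from the question, which is far too small to be representative of a real database page.

```python
import math
from collections import Counter

def order0_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per input byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Sample paths from the question; real pages will give different numbers.
paths = [
    "E:\\DirectX90c\\",
    "E:\\DirectX90c\\Feb2006_MDX1_x86_Archive.cab\\",
    "E:\\DirectX90c\\Feb2006_d3dx9_29_x64.cab\\",
    "E:\\DirectX90c\\Feb2006_xact_x64.cab\\",
    "E:\\DirectX90c\\Feb2006_MDX1_x86.cab\\",
    "E:\\DirectX90c\\Feb2006_xact_x86.cab\\",
]
sample = "\n".join(paths).encode("ascii")

bpb = order0_bits_per_byte(sample)
reduction = 100 * (1 - bpb / 8)
print(f"{bpb:.2f} bits per byte, about {reduction:.0f}% smaller than raw")
```

Running this over a dump of your actual path data would tell you whether the saving is worth the extra cycles before you write any coder.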
The code is about 10 KB, but I don't know where to plug it in ;)
HTH