Gussimulator wrote:
> Hello there,
>
> This is what I mean by repetitive data:
>
> Tables:
> E:\DirectX90c\
> E:\DirectX90c\Feb2006_MDX1_x86_Archive.cab\
> E:\DirectX90c\Feb2006_d3dx9_29_x64.cab\
> E:\DirectX90c\Feb2006_xact_x64.cab\
> E:\DirectX90c\Feb2006_MDX1_x86.cab\
> E:\DirectX90c\Feb2006_xact_x86.cab\
>
> And so on. As you can see, the string E:\DirectX90c\ repeats all the
> time in this example. (So does "Feb2006_", on almost every table.)
>
> It's just an example of the type of repetitive data I have to deal
> with; it's normally paths. Since there are directories within
> directories, the paths repeat.
>
> What would be an ideal approach for this situation? I would like to
> save space, but I wouldn't like to waste a large amount of processing
> power doing so.
>
> Keep in mind that my system must perform "well" in various
> situations (which I can't predict, at least not all of them), so I
> can't have a very elaborate database schema. Sometimes saving a few
> KBs could mean wasting tons of cycles, and I can't deal with that.
> I'd rather spend those extra KBs and have a responsive application
> than save a few KBs and fall asleep at the keyboard (don't worry,
> it's a multi-threaded environment, but it's important to keep it
> optimized; I'm just over-sizing the problem a little).
>
>
> I'd like to take the right 'path' here...
> Thanks.
>
> ----- Original Message ----- From: "Darren Duncan"
> <[EMAIL PROTECTED]>
> To: <sqlite-users@sqlite.org>
> Sent: Thursday, July 06, 2006 12:04 AM
> Subject: Re: [sqlite] Compressing the DBs?
>
>
>> At 6:04 PM -0300 7/5/06, Gussimulator wrote:
>>> Now, since there's a lot of repetitive data, I thought that
>>> compressing the database would be a good idea, since, as we all
>>> know, one of the first principles of data compression is getting
>>> rid of repetitive data. So I was wondering whether this is possible
>>> with SQLite, or whether it would be quite a pain to implement a
>>> compression scheme myself. I have worked with many compression
>>> libraries before, so that wouldn't be an issue; the issue would be
>>> integrating any of those libraries into SQLite...
>>
>> First things first, what do you mean by "repetitive"?
>>
>> Do you mean that there are many copies of the same data?
>>
>> Perhaps a better approach is to normalize the database and just store
>> single copies of things.
>>
>> If you have tables with duplicate rows, then add a 'quantity' column
>> and reduce to one copy of the actual data.
>>
>> If some columns are unique and some are repeated, perhaps try
>> splitting the tables into more tables that are related.
>>
>> This, really, is what you should be doing first, and may very well be
>> the only step you need.
>>
>> If you can't do that, then please explain in what way the data is
>> repetitive.
>>
>> -- Darren Duncan 
>
>
We came across this with our filesystem metainfo system. What we ended
up doing was creating a sub-table called "location". That location can
then be used, either via a join in the SQL statement or via a cached
internal structure, to recreate the path of a file (easy enough with a
map<> or a hash<>-style bucket class). No major overhead cost, but a
definite saving in space. (Each directory becomes an entry in the
location table, so a file is then stored as a location key value +
filename.)
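
Roughly, the idea looks like this; the table and column names below are
only illustrative, not our actual schema:

    -- Hypothetical sketch of the "location" approach; names are placeholders.
    CREATE TABLE location (
        id   INTEGER PRIMARY KEY,   -- key value that file rows point at
        path TEXT UNIQUE            -- e.g. 'E:\DirectX90c\'
    );

    CREATE TABLE file (
        location_id INTEGER REFERENCES location(id),
        filename    TEXT            -- e.g. 'Feb2006_d3dx9_29_x64.cab'
    );

    -- Recreate the full path with a join...
    SELECT l.path || f.filename AS full_path
    FROM file AS f JOIN location AS l ON l.id = f.location_id;

    -- ...or load the location table once into an in-memory map keyed
    -- by id and do the concatenation in the application.

Each distinct directory string is stored once, and every file row only
carries a small integer key plus the filename, so a repeated prefix
like "E:\DirectX90c\" never appears more than once in the database.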

-- 
Bill King, Software Engineer
Trolltech, Brisbane Technology Park
26 Brandl St, Eight Mile Plains, 
QLD, Australia, 4113
Tel + 61 7 3219 9906 (x137)
Fax + 61 7 3219 9938
mobile: 0423 532 733
