Gussimulator wrote:
> Hello there,
>
> This is what I mean by repetitive data:
>
> Tables:
> E:\DirectX90c\
> E:\DirectX90c\Feb2006_MDX1_x86_Archive.cab\
> E:\DirectX90c\Feb2006_d3dx9_29_x64.cab\
> E:\DirectX90c\Feb2006_xact_x64.cab\
> E:\DirectX90c\Feb2006_MDX1_x86.cab\
> E:\DirectX90c\Feb2006_xact_x86.cab\
>
> And so on. As you can see, the string E:\DirectX90c\ repeats all the
> time in this example. (So does "Feb2006_" on almost every table.)
>
> It's just an example of the type of repetitive data I have to deal
> with; they are normally paths. Since there are directories within
> directories, the paths repeat.
>
> What would be an ideal approach for this situation? I would like to
> save space, but I wouldn't like to waste a big amount of processing
> power to do so.
>
> One must keep in mind that my system must perform "well" in various
> situations (which I can't predict, at least not all of them); for this
> reason I can't have a very elaborate database scheme. Sometimes saving
> a few KBs could mean wasting a few tons of cycles, and I can't deal
> with that. I'd rather have those extra KBs and deal with a responsive
> application, than saving a few KBs and falling asleep at the keyboard
> (don't worry, it's a multi-threaded environment, however it's
> important to keep it optimized, I'm just over-sizing the problem a
> little).
>
> I'd like to take the right 'path' here...
> Thanks.
>
> ----- Original Message ----- From: "Darren Duncan"
> <[EMAIL PROTECTED]>
> To: <sqlite-users@sqlite.org>
> Sent: Thursday, July 06, 2006 12:04 AM
> Subject: Re: [sqlite] Compressing the DBs?
>
>> At 6:04 PM -0300 7/5/06, Gussimulator wrote:
>>> Now, since there's a lot of repetitive data, I thought that
>>> compressing the database would be a good idea, since, we all know..
>>> One of the first principles of data compression is getting rid of
>>> repetitive data, so... I was wondering if this is possible with
>>> SQLite or it would be quite a pain to implement a compression scheme
>>> by myself?.. I have worked with many compression libraries before so
>>> that wouldn't be an issue, the issue however, would be to implement
>>> any of the libraries into SQLite...
>>
>> First things first, what do you mean by "repetitive"?
>>
>> Do you mean that there are many copies of the same data?
>>
>> Perhaps a better approach is to normalize the database and just store
>> single copies of things.
>>
>> If you have tables with duplicate rows, then add a 'quantity' column
>> and reduce to one copy of the actual data.
>>
>> If some columns are unique and some are repeated, perhaps try
>> splitting the tables into more tables that are related.
>>
>> This, really, is what you should be doing first, and may very well be
>> the only step you need.
>>
>> If you can't do that, then please explain in what way the data is
>> repetitive?
>>
>> -- Darren Duncan
>

We came across this with our filesystem metainfo system. What we ended
up doing was creating a sub-table called "location". Each directory
becomes an entry in the location table, so a file is then stored as a
location key value + filename. The path of a file can be recreated
either via a join in the SQL statement, or via a cached internal
structure (easy enough with a map<> or a hash<> style bucket class).
No major overhead costs, but a definite saving in space.
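
For illustration, here is a minimal sketch of that layout using Python's
standard sqlite3 module. The table and column names (location, file,
path, name, location_id) and the helper functions are assumptions made
up for this example, not the schema we actually shipped; a plain dict
stands in for the map<>/hash<> cache.

```python
import sqlite3

def open_db():
    """Create an in-memory database with the two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE location (
            id   INTEGER PRIMARY KEY,
            path TEXT NOT NULL UNIQUE      -- each directory stored once
        );
        CREATE TABLE file (
            id          INTEGER PRIMARY KEY,
            location_id INTEGER NOT NULL REFERENCES location(id),
            name        TEXT NOT NULL      -- filename only, no directory
        );
    """)
    return conn

def add_file(conn, cache, directory, name):
    """Store a file as (location key, filename), reusing known directories."""
    loc_id = cache.get(directory)
    if loc_id is None:
        conn.execute("INSERT OR IGNORE INTO location (path) VALUES (?)",
                     (directory,))
        loc_id = conn.execute("SELECT id FROM location WHERE path = ?",
                              (directory,)).fetchone()[0]
        cache[directory] = loc_id          # avoid the lookup next time
    conn.execute("INSERT INTO file (location_id, name) VALUES (?, ?)",
                 (loc_id, name))

def full_paths(conn):
    """Recreate the full paths with a join, in insertion order."""
    rows = conn.execute("""
        SELECT l.path || f.name
        FROM file AS f JOIN location AS l ON l.id = f.location_id
        ORDER BY f.id
    """)
    return [row[0] for row in rows]

conn = open_db()
cache = {}
for filename in ("Feb2006_xact_x64.cab", "Feb2006_xact_x86.cab"):
    add_file(conn, cache, "E:\\DirectX90c\\", filename)
print(full_paths(conn))
```

The directory string is written to the database once however many files
sit under it; each file row carries only an integer key and its own
name. Whether the join or the in-memory cache is cheaper for rebuilding
paths depends on the access pattern, so measure both.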

-- 
Bill King, Software Engineer
Trolltech, Brisbane Technology Park
26 Brandl St, Eight Mile Plains,
QLD, Australia, 4113
Tel + 61 7 3219 9906 (x137)
Fax + 61 7 3219 9938
mobile: 0423 532 733