Michael, Your first-draft design is going to be easier to manage and more scalable. Stick with that, rather than the curious notion of having one table per keyword. Speaking as a frequent maintenance programmer, that's the kind of design that would force me to hunt somebody down. It probably will give very good performance, but it's unlikely that your current list of keywords will remain the final list of keywords.
The real solution of course it to try it both ways and see which is faster. My suspicion is that you won't find a gigantic performance improvement by having a separate table for each keyword. You will have a giant maintenance headache though. The money you spend on programming time could have been more profitably invested in better hardware. Even a dirt cheap programmer costs about $500 for a day of work once you factor in taxes and minimal benefits. A week of that and you've paid for a decent server. A month of extra programming over the life of the project and you've paid for a very good server, with fast processor and disks that could have netted you the improved performance without the extra coding. It's a cold way of looking at it, but I've had to deal with too many designs that saved a little time and gained a little speed up front, but that cost a lot of money down the road in longer development times. Clay Dowling Michael Sizaki wrote: > Hi, > > I'm thinking about using a non standard design for my database. > There are up to 1 million records in the database. Each record > has an integer UID. There is another table with keywords for > each of the records. Records typically have 0-15 keywords associated. > The number of keywords is small 100-1000. Currently I have > > CREATE TABLE data ( > dataId INTEGER PRIMARY KEY AUTOINCREMENT, > ... > ); > CREATE TABLE words ( > wordId INTEGER PRIMARY KEY AUTOINCREMENT, > word TEXT PRIMARY KEY ON CONFLICT IGNORE > ); > > CREATE TABLE keywords ( > wordId INTEGER, > dataId INTEGER, > PRIMARY KEY (wordId, dataId) ON CONFLICT IGNORE > ); > CREATE INDEX keywordsDataIdIndex ON keywords (dataId); > > This creates in 2 big indexes for keywords... > > When I display data items, I also display all associated keywords. > Therefore I created a cat function, that creates a comma separated > list of items: > > SELECT *,(SELECT cat(category) FROM keywords WHERE keywords.dataId = > data.dataId) FROM data ... > > I just wonder, if I should not forget about database normalization > and create one table per keyword: > > CREATE TABLE data ( > dataId INTEGER PRIMARY KEY AUTOINCREMENT, > keywords TEXT, -- a comma separated list of word ids > ... > ); > CREATE TABLE words ( > wordId INTEGER PRIMARY KEY AUTOINCREMENT, > word TEXT PRIMARY KEY ON CONFLICT IGNORE > ); > > -- for each word ID there's a table > CREATE TABLE keywords_0 ( > dataId INTEGER PRIMARY KEY, > ); > > Is there a limit in the number of tables? > > If I choose keyword 1, 7 and 42 I would > > SELECT * FROM keywords_1 UNION SELECT * FROM keywords_7' UNION SELECT * > FROM keywords_42' > > I expect the database to become significantly smaller... > > Has anybody tried something like this? > > Michael

