Tito Ciuro wrote:


I have no idea why you're so angry. Anyway, there are so many things I can think of saying, I'll just make it brief and to the point.

This thing won't scale. I'd like to see it when you have the 4.5 million records my database contains,
and that is still tiny, for all intents and purposes.


For the type of application I'm building, it doesn't need to scale. At most I'll have 25K records. Even with the 100K-record database I've used for testing, it simply flies.

I'm not building anything for the outside world. Just something that serves me well.

I'm storing variable-length data, where each record has a very different set of attributes: some may have one attribute, others tens of them, perhaps even a hundred. Using a column per attribute is not a good idea. A few days ago I asked this question and Dr. Hipp mentioned:

The more tables you have, the slower the first query will run
and the more memory SQLite will use.  For long-running applications
where the startup time is not a significant factor, 100s or
1000s of tables is fine.  For a CGI script that starts itself
up anew several times per second, then you should try to keep
the number of tables below a 100, I think.  Less than that if
you can. You should also try and keep down the number of tables
in low-memory embedded applications, in order to save on memory
usages.  Each table takes a few hundred bytes of memory - depending
on the number of columns and features.


Having two columns (one for the key and the other for the data itself) seems like a good balance between speed and ease of use. I don't care whether it scales, because the intended deployment is 25K records at most, as I said earlier. Even data sets 4x that size work fine.

There is absolutely no reason this could not be properly designed and STILL fit any possible need for that ugly packed record.


Tito,

While the tone of MGC's post may have been a little over the top, many of his points were valid.

All he was really suggesting is that a set of normalized tables would do the same thing, and probably perform better, especially when scaled to larger tables.

In your application you have 25K files each with 1 to 100 attributes. You can still store your data in two normalized tables.

The main idea is to use more than one row in the value table for each file. These values are all tied to the file name by a common file id value. This table schema is shown below.

   -- basic two table attribute setup

   create table File (
       Id          integer primary key,
       Name        text                         -- name of file including path
   );

   create table Attribute (
       FileId      integer references File(Id), -- foreign key
       Name        text,                        -- name of attribute
       Value       text,                        -- value of attribute
       primary key (FileId, Name)
   );

   create index AttributeValue on Attribute(Name, Value);

With these tables you will have 25K rows in the File table, one per file, and 250K rows in the Attribute table assuming an average of 10 attributes per file (your example had 14).

An index on the attribute name and value pairs will greatly speed up lookups of a particular attribute.
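
For example, a single file with a handful of attributes might be stored something like this (the file name and attribute names here are made up, just to show the one-row-per-attribute layout and the indexed lookup):

   -- hypothetical sample data: one Attribute row per attribute of the file
   insert into File (Id, Name) values (1, '/Users/tito/notes.txt');

   insert into Attribute (FileId, Name, Value) values (1, 'kSize', '1024');
   insert into Attribute (FileId, Name, Value) values (1, 'kModDate', '2006-06-15');
   insert into Attribute (FileId, Name, Value) values (1, 'kPermissions', 'rw-r--r--');

   -- the (Name, Value) index makes a lookup like this fast
   select FileId from Attribute where Name = 'kSize' and Value = '1024';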

There is still some redundancy in this set of tables since the names of the attributes are being repeated over and over in the Attribute table. This may not be a big problem, but you really only have 100 attributes, so there is no sense storing 25K copies of the name of a common attribute that applies to all files. This information can be factored out into a third table as shown below.


   -- Factor out the attribute definitions into a separate table

   create table File (
       Id          integer primary key,
       Name        text                    -- name of file including path
   );

   create table AttributeDefinition (
       Id          integer primary key,
       Name        text unique,            -- name of the attribute
       Type        text                    -- type of the attribute
   );

   create table Attribute (
       FileId      integer references File(Id),
       AttId       integer references AttributeDefinition(Id),
       Value       text,                   -- value of attribute
       primary key (FileId, AttId)
   );

   create index AttributeValue on Attribute(AttId, Value);

This will give you three tables (which will not cause a noticeable difference in the database open time): one with 25K rows of file names, one with 100 rows of attribute definition data (including the attribute names), and one with 250K rows of attribute value data.

The third table assigns each attribute an id number which will fit in a single byte, and that id is used to replace the 25K copies of the attribute name. This will be a sizable reduction in stored data for all but the shortest names. It also allows other definition data to be stored with each attribute name, such as the type column in the table above.
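
To make the id substitution concrete, here is a rough sketch of how the rows might be populated (the attribute names, types, and values are made up for illustration):

   -- hypothetical attribute definitions, created once up front
   insert into AttributeDefinition (Id, Name, Type) values (1, 'kSize', 'integer');
   insert into AttributeDefinition (Id, Name, Type) values (2, 'kModDate', 'date');

   -- each value row stores the small integer id instead of the attribute name
   insert into Attribute (FileId, AttId, Value)
   select 1, Id, '1024' from AttributeDefinition where Name = 'kSize';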

There are also several indexes (the primary keys, the unique constraint on AttributeDefinition.Name, and the AttributeValue index) which will speed up queries and enforce uniqueness constraints on the data.

Now you can quickly locate all the attributes of a file, or get all the files whose value for a certain attribute falls within a range of values, using SQL to join the tables.

   -- get all attributes of a file
   select AttributeDefinition.Name, Attribute.Value
   from File
   join Attribute on Attribute.FileId = File.Id
   join AttributeDefinition on Attribute.AttId = AttributeDefinition.Id
   where File.Name = :file;

   -- get all files with an attribute in a range of values
   select File.Name
   from AttributeDefinition
   join Attribute on Attribute.AttId = AttributeDefinition.Id
   join File on Attribute.FileId = File.Id
   where AttributeDefinition.Name = :attribute
   and Attribute.Value between :min and :max;

You now have the full power of SQL available to extract information from the data. You can find the min and max values of a certain attribute, for example the modification date. You can find all the files with a particular value of an attribute, such as a permission setting. You can count the files that have one attribute but not another, etc.
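
For instance, sketches of such queries against the three-table schema might look like this (again using the made-up attribute names from above):

   -- min and max values of one attribute across all files
   -- (Value is text, so ISO-style dates compare correctly)
   select min(Attribute.Value), max(Attribute.Value)
   from Attribute
   join AttributeDefinition on Attribute.AttId = AttributeDefinition.Id
   where AttributeDefinition.Name = 'kModDate';

   -- count files that have one attribute but not another
   select count(*)
   from File
   where Id in (select FileId from Attribute
                join AttributeDefinition on Attribute.AttId = AttributeDefinition.Id
                where AttributeDefinition.Name = 'kSize')
     and Id not in (select FileId from Attribute
                    join AttributeDefinition on Attribute.AttId = AttributeDefinition.Id
                    where AttributeDefinition.Name = 'kModDate');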

If you give this a try, you might be surprised at how flexible it is.

Just because something works doesn't mean that it can't be improved.

Dennis Cote
