Tito Ciuro wrote:


Due to application requirements, I must rely on LIKE and GLOB in order to match data, matching data that contains some value (sensitive or insensitive match).

Now, it seems to me that using either LIKE or GLOB will force a row scan anyhow, since it can't use the index, correct? So your solution would force me to perform the row scan for 250K rows when matching values. That is 10 times more data to scan through.


Tito,

I'm glad to see you are looking at this.

First, the attribute value table is storing the same string data in my proposal and your value table. In my case there is some overhead for the additional rowids (3 bytes for 250K rows), and the FileId column (2 bytes for 25K files), but in your case you have the overhead of the field separator commas (13 bytes in your example with 14 attributes). My table has 250K short rows, and yours has 25K long rows. There are 10 times as many rows in my table, but your rows are about 10 times a large. There may be a small difference in the total amount of data, but it is not going to be significant one way or the other.

The real heart of the problem is your requirement to match data using LIKE due to "application requirements". Can you expand on that statement so that I can understand the details of your requirements?

From your examples, what you are doing is using LIKE to do equality tests for a subfield in your large value column. With the values separated into separate rows there is no need to use LIKE to do that.

  WHERE CMValues LIKE '%2004-12-16 10:11:35  -0800%'

would become

   WHERE Attribute.Value =  '2004-12-16 10:11:35  -0800'

If you need case insensitive searches you simply convert the value string to the same case as your test string in the equality test. Or if you are unsure of the case of the test string, you can convert them both.

  WHERE upper(Attribute.Value) =  'ABIWORD.PROFILE'

or

  WHERE upper(Attribute.Value) =  upper(:match_value)

Now, these comparisons will scan all the values, since they don't specify which attribute you want to check. This may be what you want if you are searching for a date and you don't care if it matches the create date or the modification date, but generally that is not what you want to do (Why look at file size or album name data when looking for a date?). Usually you will want to check the value of a particular attribute as shown below.

   WHERE AttributeDefinition.Name =  ' Modification  Date'
    AND Attribute.Value = :mod_date

This query will use the AttributeValue index to reduce the rows scanned to only those that hold values of this attribute. In this case each file has a mod date so it will search 25K rows. For a query that is looking for a attribute like 'Album Name' which only exists for your 1K music files, it will only scan those 1K rows checking for album name matches. In some cases you may want to look in a set of attributes, which you can do as shown below.

WHERE AttributeDefinition.Name IN ( ' Modification Date', 'Create Date', 'Backup Date')
    AND Attribute.Value = :date

The possibilities are, quite literally, endless. If you have some sample queries that you need to make please tell me what they are so we can talk about more concrete examples.

HTH
Dennis Cote


Reply via email to