Re: [sqlite] a system for arbitrarily tagging rows in a table

P Kishor Sun, 26 Jul 2009 21:23:30 -0700

On Sun, Jul 26, 2009 at 9:23 PM, Jay A. Kreibich<j...@kreibi.ch> wrote:
> On Sat, Jul 25, 2009 at 09:26:16AM -0700, Jim Showalter scratched on the wall:
>> You can have the tags in a separate table that has a foreign-key to
>> the table with the rows in in you want to tag,
>
>  That's essentially what the OP is doing, except they've built a
>  many-to-many relationship using a bridge table, rather than a
>  one-to-many using a simple foreign key.  This allows any tag to be
>  associated with any row without duplicating any of the tags, allowing
>  for better data normalization.


Indeed. My query is working quite well for now. I do realize that it
will not be efficient for very large datasets, but it is quite good
for my needs.

>
>> If you need to search for
>> multiple tags at a time, each requires a join:
>
>  Right, and that's the problem.  If you've got an application that
>  allows you to search for tags with an arbitrary number of tags, you
>  need a different query for each situation (one tag, two tags, etc.).
>  Also, adding two more JOINs (since you need to join through the
>  bridge table) for each additional search target means your performance
>  degrades very quickly when you've got four or five search tags.
>
>  This is why I suggested looking at Relational division.  Even if SQL
>  doesn't support it directly, there are other ways of doing it to make
>  the query more generic.  With division, you can do something like (in the
>  terms of the OP's post):
>
>  (foo JOIN foo_tag JOIN tag) / (search_tags) = foo with search_tags
>
>  This basically builds a table of all foo:tag combinations and then
>  divides out our search tags, resulting in a list of foo.f_id values
>  that have an association with all of the tags found in search_tags.
>  You can then JOIN the results of this expression back to the tags.

I gave relational division a gander. The text was a bit thick so my
eyes started glazing after a while, but it piqued my curiosity
sufficiently that I am going to go back to it and try and understand
it better.


>  The OP asked to have the search tags removed from the results, which
>  is also a bit of a trick to do in a generic way.

Yes, this is a "view" problem that is fairly easily solved in the
application layer. Essentially, one wants to see all the tags for the
result returned for a given tag... for example, for a given
nationality, "US," show me all the ethnicities, but, since I already
know that all the rows in the result are for nationality = "US," there
is no point in displaying "US" along with the rows. Much easier to do
this in the application logic.



>
>  Anyways... The relational division makes any query search -- from
>  1 to 1000 tags -- have the same structure.  As long as the division
>  performance is good, the query performance shouldn't horribly
>  degrade as you add more search tags.
>
>> select t
>> from tagged as t
>> join t.tags as tag1
>> join t.tags as tag2
>> where tag1 = some value
>>   and tag2 = some value.
>
>  This points in the direction of a one-to-many structure, only the
>  syntax is not making sense to me.  JOINs work on tables, not columns,
>  and WHERE expressions work on columns, not tables.
>
>  You're also only doing the first half of the problem, since you need
>  to join this back to the tags to get the result the OP was looking for
>  (including pulling back out the search targets).  It's a trickier
>  problem than it first looks, especially if you want to do it in one
>  SQL statement with a minimal number of sub-SELECTs.
>
>   -j
>
>> ----- Original Message -----
>> From: "Jay A. Kreibich" <j...@kreibi.ch>
>> To: <punk...@eidesis.org>; "General Discussion of SQLite Database"
>> <sqlite-users@sqlite.org>
>> Sent: Friday, July 24, 2009 9:21 PM
>> Subject: Re: [sqlite] a system for arbitrarily tagging rows in a table
>>
>>
>> > On Fri, Jul 24, 2009 at 09:20:29PM -0500, P Kishor scratched on the
>> > wall:
>> >> I am trying to develop a "tagging" system, whereby each row in a
>> >> table
>> >> can be tagged with arbitrary number of tags.
>> >
>> >  This smells of a Relational division problem.  If you're dealing
>> > with
>> >  tags you might want to have a look at that (Celko has a few good
>> >  articles on it).  Since SQL lacks a native Relational division
>> >  operator, chances are a solution in that direction is going to be
>> >  more complex -- at least for this problem.  But any time I've done
>> >  tags or attributes, sooner or later I find myself needing to do a
>> >  division.  They come in handy any time you say "my data is vertical
>> >  but I need it horizontal."  You might want to read up on them just
>> > to
>> >  have that knowledge available.
>> >
>> >
>> >> TABLE foo (f_id INTEGER PRIMARY KEY, f_name TEXT);
>> >> TABLE tag (t_id INTEGER PRIMARY KEY, t_name TEXT);
>> >> TABLE foo_tag (f_id INTEGER, t_id INTEGER);
>> >
>> >
>> >> I have the following solution. Could I do better or differently?
>> >
>> >  I'm not sure about "better", but here's different:
>> >
>> > sqlite> SELECT foo.f_name, tag.t_name
>> >   ...> FROM tag AS target
>> >   ...>   NATURAL JOIN foo_tag AS target_ft
>> >   ...>   NATURAL JOIN foo
>> >   ...>   NATURAL JOIN foo_tag
>> >   ...>   NATURAL JOIN tag
>> >   ...> WHERE target.t_name = 'bar'
>> >   ...>   AND tag.t_id != target.t_id
>> >   ...> ORDER BY foo.f_name, tag.t_name;
>> >
>> >  This basically folds your IN sub-select back into the main query.
>> >  We join "foo" to the tag table in two directions... one to find
>> >  the search target tag id and the other to produce the output.
>> >
>> >  "tag AS target" with the first WHERE clause should return one row.
>> >  We join that through "foo_tag AS target_ft" to get a list of foo
>> > ids
>> >  that have the search target tag.  We then build the normal output
>> >  list by joining that back through the foo_tag bridge table to the
>> >  tags, and throw out any rows with an output tag id that matches the
>> >  search target tag id.
>> >
>> >  Simple!
>> >
>> >  Best of all, the target only appears in the query once, and your
>> > name
>> >  convention means we can use NATURAL JOINs to keep things clean.  It
>> >  also totally falls apart if you need to search on more than one
>> > tag.
>> >  That's where Relational division comes in.
>> >
>> >   -j (who had to draw a picture to get it right)
>
> --
> Jay A. Kreibich < J A Y  @  K R E I B I.C H >
>
> "Our opponent is an alien starship packed with atomic bombs.  We have
>  a protractor."   "I'll go home and see if I can scrounge up a ruler
>  and a piece of string."  --from Anathem by Neal Stephenson
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
Puneet Kishor http://www.punkish.org
Carbon Model http://carbonmodel.org
Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
Nelson Institute, UW-Madison http://www.nelson.wisc.edu
-----------------------------------------------------------------------
Assertions are politics; backing up assertions with evidence is science
=======================================================================
Sent from Madison, WI, United States
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] a system for arbitrarily tagging rows in a table

Reply via email to