On 23/1/2014 7:12 PM, Drake Wilson wrote:
Quoth Eleytherios Stamatogiannakis <est...@gmail.com>, on 2014-01-23 14:37:23 +0200:
Let me describe a use case where a non-unique key and WITHOUT ROWID
are most welcome. We have a distributed big data system here which
uses SQLite for the partitions. To be able to efficiently execute
join queries on split partitions, we need to assemble the
partitions of one side of the query to create an index on them.
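A minimal sketch of that assemble-then-index step, using Python's built-in sqlite3 module. The table names, the (k, payload) schema and the in-memory lists standing in for partition files are all hypothetical; in practice each partition would be its own SQLite database:

```python
import sqlite3


def assemble_and_join(left_partitions, right_rows):
    """Assemble one side's partitions into a single table, index it, then join.

    left_partitions: list of partitions, each a list of (k, payload) tuples.
    right_rows: list of (k, extra) tuples for the other side of the join.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE left_side(k INTEGER, payload TEXT)")
    for part in left_partitions:
        # Hypothetical stand-in: real partitions would be separate SQLite
        # files (e.g. read via ATTACH); plain lists are used here instead.
        con.executemany("INSERT INTO left_side VALUES (?, ?)", part)
    # Build the index only once every partition is in; k is deliberately
    # non-unique, since several partitions may carry the same key.
    con.execute("CREATE INDEX left_k ON left_side(k, payload)")

    con.execute("CREATE TABLE right_side(k INTEGER, extra TEXT)")
    con.executemany("INSERT INTO right_side VALUES (?, ?)", right_rows)
    rows = con.execute(
        "SELECT r.k, l.payload, r.extra "
        "FROM right_side AS r JOIN left_side AS l ON l.k = r.k "
        "ORDER BY r.k, l.payload"
    ).fetchall()
    con.close()
    return rows


parts = [[(1, "a"), (2, "b")], [(1, "c"), (3, "d")]]
print(assemble_and_join(parts, [(1, "x"), (3, "y")]))
# [(1, 'a', 'x'), (1, 'c', 'x'), (3, 'd', 'y')]
```

Key 1 appears in two partitions, so the assembled index has to tolerate duplicate keys.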

Do you really need bag rather than set semantics?  That is, can there
be a case where rows that are identical in _all_ columns need to be
treated as separate and (e.g.) have both copies show up in queries?


As we need to emulate (non-table-backed) indexes, yes. In an index you can have the same key with multiple "covering values" accompanying it. Consider the case where you want a covering index that "covers" the whole table, and you know that you'll only ever hit the index (e.g. for joins) and never the table that backs it. In that case, the only way to store the data only once is to use something like what I described in my previous email.
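To illustrate the "store it twice" problem with stock SQLite: a rowid table plus an index over every column lets the planner answer queries from the index alone, yet every row is kept both in the table and in the index. A sketch using Python's sqlite3 module; the edges schema is made up for the example, and the exact wording of the EXPLAIN QUERY PLAN detail text varies between SQLite versions:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Ordinary rowid table: duplicate (src, dst, w) rows are allowed.
con.execute("CREATE TABLE edges(src INTEGER, dst INTEGER, w REAL)")
# Index over every column, so it "covers" the whole table.
con.execute("CREATE INDEX edges_all ON edges(src, dst, w)")
con.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [(1, 2, 0.5), (1, 2, 0.5), (1, 3, 1.0)],
)

plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT dst, w FROM edges WHERE src = ?", (1,)
).fetchall()
# The last column of each plan row is the human-readable detail string,
# which should mention a covering index here.
details = [row[-1] for row in plan]
print(details)

# The query never needs the table, but the rows still live there too:
# every edge is stored once in the table and once in the index.
print(con.execute("SELECT COUNT(*) FROM edges").fetchone()[0])  # 3
```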

Also, using the whole row as a primary key isn't a viable solution. There are many kinds of data that may have duplicate rows in the index, such as pre-graph data (co-occurrence lists), on which you may, for example, want to calculate the frequency of the links before grouping by them.
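A small sketch of why the whole row can't serve as the key for such data, using Python's sqlite3 module and a hypothetical co-occurrence list: a plain table keeps the duplicate links so their frequencies can be counted, while a WITHOUT ROWID table keyed on the whole row rejects the repeated link outright:

```python
import sqlite3

# Hypothetical co-occurrence list: each tuple is one observed link (a, b).
links = [("n1", "n2"), ("n1", "n2"), ("n1", "n3")]

con = sqlite3.connect(":memory:")

# Bag semantics: a plain table keeps duplicate rows, so frequencies survive.
con.execute("CREATE TABLE cooc(a TEXT, b TEXT)")
con.executemany("INSERT INTO cooc VALUES (?, ?)", links)
freq = con.execute(
    "SELECT a, b, COUNT(*) FROM cooc GROUP BY a, b ORDER BY a, b"
).fetchall()
print(freq)  # [('n1', 'n2', 2), ('n1', 'n3', 1)]

# Whole row as PRIMARY KEY of a WITHOUT ROWID table: the second identical
# link violates the key, so the duplicate (and its frequency) is lost.
con.execute(
    "CREATE TABLE cooc_set(a TEXT, b TEXT, PRIMARY KEY (a, b)) WITHOUT ROWID"
)
try:
    con.executemany("INSERT INTO cooc_set VALUES (?, ?)", links)
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```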

Our data is mainly scientific data (from digital libraries), on which we do article text mining (finding citations, funders, classification, protein references, ...). We also deal with graph data (graph isomorphisms, graph mining, ...).

All of the above processes are done using madIS [*], which is essentially SQLite + extensions (multivalued row and aggregate functions, virtual table composition, ...).

l.

[*] http://madis.googlecode.com

Most of the time, given the way data is represented in relational databases,
this winds up requiring an arbitrary identity key anyway to be
practical (so one can manipulate a specific instance of an otherwise
identical row), or else it's equivalent to adding a count column to
turn {(x, y, z), (x, y, z)} into {(x, y, z, 2)}, though the latter has
a slight complexity hitch in the merge case, similar to what you were
doing.
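Both workarounds sketched with Python's sqlite3 module. The tables, the seq column and the UNION ALL / SUM merge are illustrative assumptions, not anyone's actual schema. With the count column, merging two bags becomes a sum of counts per key rather than a plain concatenation:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Option 1 (identity key): a hypothetical seq column keeps the PRIMARY KEY
# of a WITHOUT ROWID table unique even when (x, y, z) repeats.
con.execute(
    "CREATE TABLE bag_id(x, y, z, seq INTEGER, "
    "PRIMARY KEY (x, y, z, seq)) WITHOUT ROWID"
)
con.executemany("INSERT INTO bag_id VALUES (?, ?, ?, ?)",
                [(1, 2, 3, 0), (1, 2, 3, 1)])

# Option 2 (count column): {(x, y, z), (x, y, z)} becomes {(x, y, z, 2)}.
for name in ("part_a", "part_b", "merged"):
    con.execute(f"CREATE TABLE {name}(x, y, z, n INTEGER NOT NULL, "
                "PRIMARY KEY (x, y, z)) WITHOUT ROWID")
con.executemany("INSERT INTO part_a VALUES (?, ?, ?, ?)",
                [(1, 2, 3, 2), (4, 5, 6, 1)])
con.executemany("INSERT INTO part_b VALUES (?, ?, ?, ?)", [(1, 2, 3, 1)])

# The merge hitch: a row present in both inputs must have its counts
# summed; inserting both copies would violate the primary key.
con.execute(
    "INSERT INTO merged "
    "SELECT x, y, z, SUM(n) FROM "
    "(SELECT * FROM part_a UNION ALL SELECT * FROM part_b) "
    "GROUP BY x, y, z"
)
merged_rows = con.execute("SELECT * FROM merged ORDER BY x, y, z").fetchall()
print(merged_rows)  # [(1, 2, 3, 3), (4, 5, 6, 1)]
```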

If you do require the above, I'm curious what data is being handled
here, since it's a rare case (but I understand if you don't wish to
say).  If not, then you may actually have a primary key of the whole
row, in which case I'm not sure why inventing a rowid is needed.

    ---> Drake Wilson
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
