Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?

Elefterios Stamatogiannakis Thu, 23 Jan 2014 14:26:10 -0800

On 23/1/2014 7:12 PM, Drake Wilson wrote:

Quoth Eleytherios Stamatogiannakis <est...@gmail.com>, on 2014-01-23 14:37:23 +0200:

Let me describe a use case where a non-unique key combined with
WITHOUT ROWID would be most welcome. We have a distributed big data
system here which uses SQLite for the partitions. To be able to
efficiently execute join queries on split partitions, we need to
assemble the partitions of one side of the query to create an index
on them.


Do you really need bag rather than set semantics?  That is, can there
be a case where rows that are identical in _all_ columns need to be
treated as separate and (e.g.) have both copies show up in queries?

As we need to emulate (non table backed) indexes, yes. In an index you
can have the same key with multiple "covering values" accompanying it.
Consider the case where you want a covering index that "covers" the
whole table, and you know that you'll only ever hit the index (e.g. for
joins), never the table that backs the index. In that case, the only
way to store the data only once is using something like what I've
described in my previous email.
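
To make the covering-index case concrete, here is a minimal sketch using
Python's sqlite3 module. The table name "part", its columns and the
sample rows are made up for illustration; they are not our real schema.
It only shows that an index over every column answers the query on its
own, while the same rows also sit in the backing table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical partition table: a join key plus one payload column.
conn.execute("CREATE TABLE part (k INTEGER, v TEXT)")
conn.executemany(
    "INSERT INTO part VALUES (?, ?)",
    [(1, "a"), (1, "a"), (1, "b"), (2, "c")],  # (1, 'a') appears twice
)

# An index over every column: lookups by k can be answered from the
# index alone, but each row is now stored twice (table + index).
conn.execute("CREATE INDEX part_cover ON part (k, v)")

# Ask the planner how it would run a lookup by key.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT v FROM part WHERE k = ?", (1,)
).fetchall()
plan_detail = " ".join(str(row[-1]) for row in plan)
print(plan_detail)

# Duplicate (key, value) pairs are kept, as an index would keep them.
rows = conn.execute(
    "SELECT k, v FROM part WHERE k = ? ORDER BY v", (1,)
).fetchall()
print(rows)
```

The plan text should mention a covering index, meaning the base table
is never read for this query, yet it still occupies space.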

Also, using the whole row as a primary key isn't a viable solution.
There are many kinds of data that may have duplicate rows in the index,
like pre-graph data (co-occurrence lists), on which, for example, you
want to calculate the frequency of the links, so the duplicates have to
survive until you group by them.
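
A small sketch of the co-occurrence case, again with Python's sqlite3
module. The table names "cooc" and "cooc_set" and the sample pairs are
hypothetical. It shows that the link frequency comes from the duplicate
rows, and that a whole-row primary key rejects exactly those rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical co-occurrence list: one row per observed (src, dst) pair.
pairs = [("p1", "p2"), ("p1", "p2"), ("p1", "p3"), ("p2", "p3"), ("p1", "p2")]

# Bag semantics: an ordinary rowid table keeps every duplicate.
conn.execute("CREATE TABLE cooc (src TEXT, dst TEXT)")
conn.executemany("INSERT INTO cooc VALUES (?, ?)", pairs)

# The frequency of each link is just the number of duplicate rows.
link_freq = conn.execute(
    "SELECT src, dst, COUNT(*) FROM cooc GROUP BY src, dst ORDER BY src, dst"
).fetchall()
print(link_freq)

# Set semantics: a WITHOUT ROWID table keyed on the whole row
# refuses the second ('p1', 'p2').
conn.execute(
    "CREATE TABLE cooc_set (src TEXT, dst TEXT, PRIMARY KEY (src, dst)) "
    "WITHOUT ROWID"
)
try:
    conn.executemany("INSERT INTO cooc_set VALUES (?, ?)", pairs)
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```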

Our data is mainly scientific data (from digital libraries), on which
we do article text mining (finding citations, funders, classification,
protein references, ...). We also deal with graph data (graph
isomorphisms, graph mining, ...).

All of the above processes are done using madIS [*], which is
essentially SQLite + extensions (multivalued row and aggregate
functions, virtual table composition, ...).


l.

[*] http://madis.googlecode.com

Most of the time, the way data is represented in relational databases,
this winds up requiring an arbitrary identity key anyway to be
practical (so one can manipulate a specific instance of an otherwise
identical row), or else it's equivalent to adding a count column to
turn {(x, y, z), (x, y, z)} into {(x, y, z, 2)}, though the latter has
a similar slight complexity hitch in the merge case to what you were
doing.
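
The count-column encoding described above could look roughly like the
following sketch in Python's sqlite3 module. The table name "bag", the
column "n" and the helper add_row are assumptions made for illustration.
The update-or-insert step is the merge hitch: every incoming row has to
be checked against the rows already stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the whole row (x, y, z) is the primary key and
# n records how many identical copies the row stands for.
conn.execute(
    "CREATE TABLE bag (x INTEGER, y INTEGER, z INTEGER, "
    "n INTEGER NOT NULL, PRIMARY KEY (x, y, z)) WITHOUT ROWID"
)


def add_row(conn, x, y, z, n=1):
    """Merge n copies of (x, y, z): bump the count or insert a new row."""
    cur = conn.execute(
        "UPDATE bag SET n = n + ? WHERE x = ? AND y = ? AND z = ?",
        (n, x, y, z),
    )
    if cur.rowcount == 0:  # row not present yet
        conn.execute("INSERT INTO bag VALUES (?, ?, ?, ?)", (x, y, z, n))


for row in [(1, 2, 3), (1, 2, 3), (4, 5, 6)]:
    add_row(conn, *row)

bag_rows = conn.execute("SELECT x, y, z, n FROM bag ORDER BY x, y, z").fetchall()
print(bag_rows)  # [(1, 2, 3, 2), (4, 5, 6, 1)]
```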

If you do require the above, I'm curious what data is being handled
here, since it's a rare case (but I understand if you don't wish to
say).  If not, then you may actually have a primary key of the whole
row, in which case I'm not sure why inventing a rowid is needed.

    ---> Drake Wilson
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


