On 7/1/07, Mov GP 0 <[EMAIL PROTECTED]> wrote:
2007/7/1, Trevor Talbot <[EMAIL PROTECTED]>:
> On 7/1/07, Mov GP 0 <[EMAIL PROTECTED]> wrote:
>
> > I'm searching for a database backend for a .NET project that involves a
> > semantic database backend. The problem is that I want to store any generic
> > kind of custom .NET datatype and not only SQL primitives.
> >
> >     CREATE TABLE @TableName
> >     (
> >         Object INT,
> >         Data   @Datatype
> >     )
> >
> > The @Datatype parameter is a special object that implements something like
> > an "ISQLiteDatatype" interface, which provides the information about the
> > class that is needed to work with the datatype. This includes:
> >
> > - how to serialize it (see also ISerializable)
> > - how to sort it (see also IComparable)
> > - whether byte ordering of the serialized data is possible
> > - minimal and maximal data size and length
> > - etc.
> >
> > SQL is not capable of handling this, but I think it should be possible
> > using the API. At least in MS SQL Server it is possible, but you can't
> > create new datatypes on the fly; you have to upload and register the
> > datatype's class with the server as a DLL.
> I'm not sure I understand what you're looking for here. The SQL
> interface (for any DBMS) is not going to act like a .NET object; that
> behavior is contained within your application. What exactly are you
> trying to gain with custom data types within the DB? Are you trying
> to standardize an interface for other applications to use when talking
> to the same DB?
The implementation of this is also possible with C++; I just see it from the .NET perspective because I'm a .NET developer. Currently I have no knowledge of how SQLite works, because I've never used it. I'm looking at several possible database candidates, including Oracle and MS SQL Server, and want to learn about the strengths and weaknesses of each.
It's the same with any language. An RDBMS is relational storage; individual pieces of data are not intelligent. Object behavior is an application thing, not a database thing. A typical ORM (object-relational mapper) treats a class as a table and vice versa.
> SQLite is an embedded database engine, so when you need to make
> customizations like this, nobody other than your application is going
> to be using the DB. Since you're not modifying fundamental DBMS
> functionality, you don't need to do anything to SQLite; just build a
> wrapper in your application that uses BLOBs to store everything, and
> build the rest of your application on top of that wrapper.
>
> The same approach will work with pretty much any other DBMS as well.
What I want is more efficient sorting and searching. Therefore I want tables that are optimized for sorting, searching, and storing a specific datatype. Using only BLOBs is a possible solution, but searching for, e.g., a number within a table of unsorted, variable-size (from 1 byte to 2 GiB) BLOBs is not as efficient as searching in a specialized fixed-size integer table.
I don't think we're on the same page here. You originally asked to store arbitrary data. If you need to store something that doesn't map to any other primitive, then you store it as a BLOB (or closest reasonable equivalent for the specific DBMS). What other option is there?
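Trevor's store-everything-as-BLOBs suggestion might be sketched like this in Python, using the standard `sqlite3` module and `pickle` for serialization (the table and function names are illustrative, not from the thread):

```python
import pickle
import sqlite3

# One generic table: every value is pickled into a BLOB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, data BLOB)")

def store(obj):
    """Serialize any Python object and store it as a BLOB."""
    cur = conn.execute("INSERT INTO objects (data) VALUES (?)",
                       (pickle.dumps(obj),))
    return cur.lastrowid

def load(rowid):
    """Fetch and deserialize an object by its row id."""
    (blob,) = conn.execute("SELECT data FROM objects WHERE id = ?",
                           (rowid,)).fetchone()
    return pickle.loads(blob)

rid = store({"lat": 48.2, "long": 16.4})
print(load(rid))   # the original dict comes back intact
```

The application layer does all interpretation of the data; as far as the database is concerned, everything is an opaque BLOB.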
As an example, think about sorting geographical coordinates. In a classical RDBMS this would look like:

    CREATE TABLE Pos
    (
        Pos_ID INT,
        lat    FLOAT,
        long   FLOAT
    )
Okay.
It is very easy and effective to sort this table by latitude and longitude. You can also create such a table automatically from an XML Schema. But when you have an OWL ontology, you can't create the table automatically anymore; you would have to write specialized proxy classes that run SPARQL queries against the data to produce tables, and those queries would be specialized for specific schemas.

Now consider storing everything as BLOBs. Because RDF can use any datatype, you would have to store everything from booleans to images, music, and videos. You can't make any good sorting and searching algorithms for that mess: searching for a specific number would require scanning the whole database serially, resulting in O(N) search time, where N is the size of the whole database. The real advantage of using BLOBs is insertion speed, because no lookup is needed to determine where to store the data.

So my idea is to partition the data into separate tables, one per datatype. This would result in, among others, a float table. Because you can sort that table, you can do a binary search, giving a search time of about O(log N), where N is the number of values in that table rather than the size of the whole database.
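The per-type partitioning described above might be sketched in SQLite as one narrow table per primitive type, each with an index so value lookups cost O(log N) instead of a full scan (the table layout and names here are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One narrow table per primitive datatype; (subject, predicate) identify
# the RDF statement the value belongs to.
conn.executescript("""
    CREATE TABLE val_int   (subject INT, predicate INT, value INTEGER);
    CREATE TABLE val_float (subject INT, predicate INT, value REAL);
    CREATE TABLE val_blob  (subject INT, predicate INT, value BLOB);
    -- Indexes give O(log N) value lookups instead of an O(N) scan.
    CREATE INDEX idx_int   ON val_int(value);
    CREATE INDEX idx_float ON val_float(value);
""")

conn.executemany("INSERT INTO val_float VALUES (?, ?, ?)",
                 [(1, 10, 48.2), (2, 10, 16.4), (3, 10, 41.0)])

# The index satisfies this range query without scanning every row.
rows = conn.execute("SELECT subject FROM val_float "
                    "WHERE value BETWEEN 40 AND 50").fetchall()
print(rows)   # subjects whose float value lies in [40, 50]
```

A query spanning arbitrary datatypes would still have to consult several tables and merge the results, which is where the single-table RDBMS layout keeps its edge.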
You've lost me in terminology. I'll leave this to someone who knows what you're talking about.
You can realize such a thing by creating a table for each primitive datatype by hand and serializing the objects into these elementary datatypes plus the semantics of the object. But because RDF has a very extensible datatype system, an approach that also handles user-defined datatypes would boost the performance of the serialization and deserialization process.
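For what it's worth, Python's `sqlite3` module offers a small version of this idea: you can register a per-type adapter (object to primitive) and converter (stored bytes back to object), so user-defined types round-trip automatically. A sketch, with a hypothetical `Point` class:

```python
import sqlite3

class Point:
    def __init__(self, lat, long):
        self.lat, self.long = lat, long
    def __eq__(self, other):
        return (self.lat, self.long) == (other.lat, other.long)

# Adapter: how to serialize a Point into an SQLite primitive (text here).
sqlite3.register_adapter(Point, lambda p: f"{p.lat};{p.long}")
# Converter: how to rebuild a Point from the stored bytes.
sqlite3.register_converter("POINT",
    lambda b: Point(*map(float, b.split(b";"))))

conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE Pos (p POINT)")
conn.execute("INSERT INTO Pos VALUES (?)", (Point(48.2, 16.4),))
(p,) = conn.execute("SELECT p FROM Pos").fetchone()
print(p.lat, p.long)   # 48.2 16.4
```

Note that this all happens on the application side: SQLite itself still only sees text, and cannot sort or index on anything it doesn't understand.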
At this point it sounds like you're talking about application concerns. What does this have to do with the DBMS?

Specifically, serialization is something the application has to deal with regardless, since you have to transport the data to the DBMS in a form it understands anyway. For instance, suppose you want to insert a 64-bit integer that you already have in machine-native form. If you construct a straight SQL query to do that, you've just serialized it to ASCII for transport to the DB. If you use a DBMS-specific parametric binding API that accepts a native machine integer, then for a portable DB (either client/server style over a network, or portable-file style like SQLite) it has to be serialized into the DB's preferred form, such as MSB-first with a length attribute.

For an object you have to go through that same process. Even with a "native format" API, the result you end up with is equivalent to a BLOB, and there's no performance gain from treating it as anything else. Trying to make the DB aware of it can make things worse, since you end up with another neutral serialization in an effort to make it application-agnostic. When you get to an embedded database like SQLite, there's not even any performance tradeoff to be had: serialization costs the same no matter which set of code does it.

The reason some DBMSs support custom data types is for situations where the database needs to know some things about the data in order to guarantee consistency when dealing with multiple applications, or for particular kinds of searches. For instance, IP addresses can be stored as plain integers, but implementing a custom datatype would let you teach the DBMS about CIDR prefix matching and searching.

FWIW, SQLite does not support custom datatypes at all. It understands a handful of primitives and nothing else.
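That last point is easy to see directly: SQLite knows only five storage classes (NULL, INTEGER, REAL, TEXT, BLOB), and anything bound through the parametric API is stored as one of them. A minimal demonstration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v)")   # no declared type: any value accepted

# Bind a few values through the parametric API; SQLite stores each one
# in whichever of its five storage classes fits.
for v in (None, 42, 3.14, "hello", b"\x00\x01"):
    conn.execute("INSERT INTO t VALUES (?)", (v,))

for (storage_class,) in conn.execute("SELECT typeof(v) FROM t"):
    print(storage_class)   # null, integer, real, text, blob
```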
Further speedup would be possible by partitioning the tables of the primitive types even further. E.g., there might be one partition that holds floats for latitude and another partition holding floats for longitude. This approach might reduce the search time almost to that of a search in a specialized relational table. In the example of coordinates, the RDBMS will still be faster, because you get longitude and latitude with only one search rather than having to make multiple searches.

As an example I will use the search for points located in a specific area. In the RDBMS you would have to (1) simply access the points table, (2) [presort by longitude and] find all points whose longitude is between two values and save the result in a temporary table, and (3) [presort by latitude and] find all of the resulting points having their latitude in a given range.
Here it almost sounds like you've gone overboard on theory, and left practical concerns behind somewhere. I'm not quite sure what your search example was meant to represent, exactly. If you mean what an application does, then it's misusing the RDBMS. If you meant what the RDBMS does internally, it's also wrong for any modern system.

You use the DBMS by giving it a query that says what you want: both latitude and longitude between two values. When the DBMS knows you like to perform these kinds of searches (usually because you created a compound index), what it does may be logically equivalent to, but is a little more efficient than, creating a temporary table.

(As a side note, if coordinate handling is an actual need of yours, look for info on GIS-friendly systems. Multidimensional searching has specialized tools.)
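Letting the DBMS do the combined search might look like this sketch, reusing the earlier Pos table (the index name and sample data are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Pos (Pos_ID INT, lat FLOAT, long FLOAT)")
# A compound index tells the DBMS we search on both columns together.
conn.execute("CREATE INDEX idx_pos ON Pos(lat, long)")

conn.executemany("INSERT INTO Pos VALUES (?, ?, ?)",
                 [(1, 48.2, 16.4), (2, 52.5, 13.4), (3, 47.1, 15.4)])

# One declarative query instead of two searches and a temporary table;
# the query planner decides how to use the index.
rows = conn.execute("""SELECT Pos_ID FROM Pos
                       WHERE lat  BETWEEN 46 AND 49
                         AND long BETWEEN 15 AND 17""").fetchall()
print(rows)
```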
The question is whether I can use SQLite to realize such a database layout.
I don't think we're talking about the same things yet, so I don't know!

-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------