On 7/1/07, Mov GP 0 <[EMAIL PROTECTED]> wrote:
2007/7/1, Trevor Talbot <[EMAIL PROTECTED]>:
> On 7/1/07, Mov GP 0 <[EMAIL PROTECTED]> wrote:
>
> > I'm searching for a database backend for a .NET project that involves a
> > semantic database backend. The problem is that I want to store any generic
> > kind of custom .NET datatype and not only SQL primitives.
> >
> >     CREATE TABLE @TableName
> >     (
> >         Object INT,
> >         Data   @Datatype
> >     )
> >
> > The @Datatype parameter is a special object that implements something like
> > an "ISQLiteDatatype" interface, which provides the information about the
> > class that is needed to work with the datatype. This includes:
> >
> > - how to serialize it (see also ISerializable)
> > - how to sort it (see also IComparable)
> > - whether byte ordering of the serialized data is possible
> > - minimal and maximal data size and length
> > - etc.
> >
> > SQL is not capable of handling this, but I think it should be possible
> > using the API. At least in MS SQL Server it is possible, but you can't
> > create new datatypes on the fly; you have to upload and register the
> > datatype's class with the server as a DLL.
> I'm not sure I understand what you're looking for here. The SQL
> interface (for any DBMS) is not going to act like a .NET object; that
> behavior is contained within your application. What exactly are you
> trying to gain with custom data types within the DB? Are you trying
> to standardize an interface for other applications to use when talking
> to the same DB?
The implementation of this is also possible with C++; I just see it from the .NET perspective because I'm a .NET developer. Currently I have no knowledge of how SQLite works, because I've never used it. I'm looking at several possible database candidates, including Oracle and MS SQL Server, and want to learn about the strengths and weaknesses of each.
It's the same with any language. An RDBMS is relational storage; individual pieces of data are not intelligent. Object behavior is an application thing, not a database thing. A typical ORM (object-relational mapper) treats a class as a table and vice versa.
> SQLite is an embedded database engine, so when you need to make
> customizations like this, nobody other than your application is going
> to be using the DB. Since you're not modifying fundamental DBMS
> functionality, you don't need to do anything to SQLite; just build a
> wrapper in your application that uses BLOBs to store everything, and
> build the rest of your application on top of that wrapper.
>
> The same approach will work with pretty much any other DBMS as well.
What I want is more efficient sorting and searching. Therefore I want tables that are optimized for sorting, searching, and storing a specific datatype. Using only BLOBs is a possible solution, but searching for, e.g., a number within a table of unsorted, variable-size (from 1 byte to 2 GiB) BLOBs is not as efficient as searching in a specialized fixed-size integer table.
I don't think we're on the same page here. You originally asked to store arbitrary data. If you need to store something that doesn't map to any other primitive, then you store it as a BLOB (or closest reasonable equivalent for the specific DBMS). What other option is there?
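Trevor's store-everything-as-BLOBs suggestion might be sketched like this in Python, using the standard `sqlite3` module and `pickle` for serialization (the table and function names are illustrative, not from the thread):

```python
import pickle
import sqlite3

# One generic table: every value is pickled into a BLOB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, data BLOB)")

def store(obj):
    """Serialize any Python object and store it as a BLOB."""
    cur = conn.execute("INSERT INTO objects (data) VALUES (?)",
                       (pickle.dumps(obj),))
    return cur.lastrowid

def load(rowid):
    """Fetch and deserialize an object by its row id."""
    (blob,) = conn.execute("SELECT data FROM objects WHERE id = ?",
                           (rowid,)).fetchone()
    return pickle.loads(blob)

rid = store({"lat": 48.2, "long": 16.4})
print(load(rid))   # the original dict comes back intact
```

The application layer does all interpretation of the data; as far as the database is concerned, everything is an opaque BLOB.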
As an example, think about sorting geographical coordinates. In a classical RDBMS this would look like:

    CREATE TABLE Pos
    (
        Pos_ID INT,
        lat    FLOAT,
        long   FLOAT
    )
Okay.
It is very easy and effective to sort this table by latitude and longitude. You can also create such a table automatically from an XML Schema. But when you have an OWL ontology, you can't create the table automatically anymore; you would have to write specialized proxy classes that run SPARQL queries against the data to produce tables, and those queries would be specialized for specific schemas.

Now consider storing everything as BLOBs. Because RDF can use any datatype, you would have to store everything from booleans to images, music, and videos. You can't make any good sorting and searching algorithms for that mess: searching for a specific number would require scanning the whole database serially, resulting in O(N) search time, where N is the size of the whole database. The real advantage of using BLOBs is insertion speed, because no lookup is needed to determine where to store the data.

So my idea is to partition the data into separate tables, one per datatype. This would result in, among others, a float table. Because you can sort that table, you can do a binary search, giving a search time of about O(log N), where N is the number of values in that table rather than the size of the whole database.
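The per-type partitioning described above might be sketched in SQLite as one narrow table per primitive type, each with an index so value lookups cost O(log N) instead of a full scan (the table layout and names here are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One narrow table per primitive datatype; (subject, predicate) identify
# the RDF statement the value belongs to.
conn.executescript("""
    CREATE TABLE val_int   (subject INT, predicate INT, value INTEGER);
    CREATE TABLE val_float (subject INT, predicate INT, value REAL);
    CREATE TABLE val_blob  (subject INT, predicate INT, value BLOB);
    -- Indexes give O(log N) value lookups instead of an O(N) scan.
    CREATE INDEX idx_int   ON val_int(value);
    CREATE INDEX idx_float ON val_float(value);
""")

conn.executemany("INSERT INTO val_float VALUES (?, ?, ?)",
                 [(1, 10, 48.2), (2, 10, 16.4), (3, 10, 41.0)])

# The index satisfies this range query without scanning every row.
rows = conn.execute("SELECT subject FROM val_float "
                    "WHERE value BETWEEN 40 AND 50").fetchall()
print(rows)   # subjects whose float value lies in [40, 50]
```

A query spanning arbitrary datatypes would still have to consult several tables and merge the results, which is where the single-table RDBMS layout keeps its edge.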
You've lost me in terminology. I'll leave this to someone who knows what you're talking about.
You can realize such a thing by creating a table for each primitive datatype by hand and serializing the objects into these elementary datatypes plus the semantics of the object. But because RDF has a very extensible datatype system, an approach that also handles user-defined datatypes would boost the performance of the serialization and deserialization process.
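For what it's worth, Python's `sqlite3` module offers a small version of this idea: you can register a per-type adapter (object to primitive) and converter (stored bytes back to object), so user-defined types round-trip automatically. A sketch, with a hypothetical `Point` class:

```python
import sqlite3

class Point:
    def __init__(self, lat, long):
        self.lat, self.long = lat, long
    def __eq__(self, other):
        return (self.lat, self.long) == (other.lat, other.long)

# Adapter: how to serialize a Point into an SQLite primitive (text here).
sqlite3.register_adapter(Point, lambda p: f"{p.lat};{p.long}")
# Converter: how to rebuild a Point from the stored bytes.
sqlite3.register_converter("POINT",
    lambda b: Point(*map(float, b.split(b";"))))

conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE Pos (p POINT)")
conn.execute("INSERT INTO Pos VALUES (?)", (Point(48.2, 16.4),))
(p,) = conn.execute("SELECT p FROM Pos").fetchone()
print(p.lat, p.long)   # 48.2 16.4
```

Note that this all happens on the application side: SQLite itself still only sees text, and cannot sort or index on anything it doesn't understand.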
At this point it sounds like you're talking about application concerns. What does this have to do with the DBMS?

Specifically, serialization is something the application has to deal with regardless, since you have to transport the data to the DBMS in a form it understands anyway. For instance, suppose you want to insert a 64-bit integer that you already have in machine-native form. If you construct a straight SQL query to do that, you've just serialized it to ASCII for transport to the DB. If you use a DBMS-specific parametric binding API that accepts a native machine integer, then for a portable DB (either client/server style over a network, or portable-file style like SQLite) it has to be serialized into the DB's preferred form, such as MSB-first with a length attribute.

For an object you have to go through that same process. Even with a "native format" API, the result you end up with is equivalent to a BLOB, and there's no performance gain from treating it as anything else. Trying to make the DB aware of it can make things worse, since you end up with another neutral serialization in an effort to make it application-agnostic. When you get to an embedded database like SQLite, there's not even any performance tradeoff to be had: serialization costs the same no matter which set of code does it.

The reason some DBMSs support custom data types is for situations where the database needs to know some things about the data in order to guarantee consistency when dealing with multiple applications, or for particular kinds of searches. For instance, IP addresses can be stored as plain integers, but implementing a custom datatype would let you teach the DBMS about CIDR prefix matching and searching.

FWIW, SQLite does not support custom datatypes at all. It understands a handful of primitives and nothing else.
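That last point is easy to see directly: SQLite knows only five storage classes (NULL, INTEGER, REAL, TEXT, BLOB), and anything bound through the parametric API is stored as one of them. A minimal demonstration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v)")   # no declared type: any value accepted

# Bind a few values through the parametric API; SQLite stores each one
# in whichever of its five storage classes fits.
for v in (None, 42, 3.14, "hello", b"\x00\x01"):
    conn.execute("INSERT INTO t VALUES (?)", (v,))

for (storage_class,) in conn.execute("SELECT typeof(v) FROM t"):
    print(storage_class)   # null, integer, real, text, blob
```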
Further speedup would be possible by partitioning the tables of the primitive types even further. E.g., there might be one partition that holds floats for latitude and another partition holding floats for longitude. This approach might reduce the search time almost to that of a search in a specialized relational table. In the example of coordinates, the RDBMS will still be faster, because you get longitude and latitude with only one search rather than having to make multiple searches.

As an example I will use the search for points located in a specific area. In the RDBMS you would have to (1) simply access the points table, (2) [presort by longitude and] find all points whose longitude is between two values and save the result in a temporary table, and (3) [presort by latitude and] find all of the resulting points having their latitude in a given range.
Here it almost sounds like you've gone overboard on theory, and left practical concerns behind somewhere. I'm not quite sure what your search example was meant to represent, exactly. If you mean what an application does, then it's misusing the RDBMS. If you meant what the RDBMS does internally, it's also wrong for any modern system.

You use the DBMS by giving it a query that says what you want: both latitude and longitude between two values. When the DBMS knows you like to perform these kinds of searches (usually because you created a compound index), what it does may be logically equivalent to, but is a little more efficient than, creating a temporary table.

(As a side note, if coordinate handling is an actual need of yours, look for info on GIS-friendly systems. Multidimensional searching has specialized tools.)
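Letting the DBMS do the combined search might look like this sketch, reusing the earlier Pos table (the index name and sample data are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Pos (Pos_ID INT, lat FLOAT, long FLOAT)")
# A compound index tells the DBMS we search on both columns together.
conn.execute("CREATE INDEX idx_pos ON Pos(lat, long)")

conn.executemany("INSERT INTO Pos VALUES (?, ?, ?)",
                 [(1, 48.2, 16.4), (2, 52.5, 13.4), (3, 47.1, 15.4)])

# One declarative query instead of two searches and a temporary table;
# the query planner decides how to use the index.
rows = conn.execute("""SELECT Pos_ID FROM Pos
                       WHERE lat  BETWEEN 46 AND 49
                         AND long BETWEEN 15 AND 17""").fetchall()
print(rows)
```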
The question is whether I can use SQLite to realize such a database layout.
I don't think we're talking about the same things yet, so I don't know!

-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------