> On Jun 15, 2019, at 6:42 AM, Dan Kaminsky <dan.kamin...@medal.com> wrote:
> 
> One of the more useful and usable packages for Natural Language
> Processing, Magnitude[1], leverages SQLite to efficiently handle the real
> valued but entirely abstract collections of numbers -- vector spaces --
> that modern machine learning depends on.

I'm somewhat familiar with this, having recently written some code that stores 
ML results in SQLite databases.

As far as I know, there is no benefit to storing each element of such a vector 
as a separate column in SQLite. Instead, the entire vector should be stored as 
a single blob — for example, as a concatenation of 3072 IEEE floats in some 
fixed byte-order.
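
A minimal sketch of that layout, using only Python's standard library. The 
table name "vectors", the column name "vec", the helper names, and the choice 
of little-endian 32-bit floats are assumptions made for illustration; any 
fixed width and byte order works as long as readers and writers agree.

```python
import sqlite3
import struct

DIM = 3072  # vector length from the example above


def pack_vector(values):
    """Serialize floats as little-endian 32-bit IEEE 754 values (assumed format)."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_vector(blob):
    """Inverse of pack_vector; the element count is inferred from the blob size."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


conn = sqlite3.connect(":memory:")
# One row per vector: the whole vector lives in a single BLOB column.
conn.execute("CREATE TABLE vectors (id INTEGER PRIMARY KEY, vec BLOB NOT NULL)")

vec = [i / DIM for i in range(DIM)]
conn.execute("INSERT INTO vectors (id, vec) VALUES (?, ?)", (1, pack_vector(vec)))

(blob,) = conn.execute("SELECT vec FROM vectors WHERE id = 1").fetchone()
restored = unpack_vector(blob)
```

Each row then carries 4 * DIM bytes of payload instead of DIM separate 
columns.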

I say this because I don't know of any reason why a SQL query would need to 
access a specific vector coordinate, e.g. "SELECT * FROM vectors WHERE 
item_1722 > 0.5". The interesting operations on these vectors apply to all the 
coordinates in aggregate, like dot products or various distance metrics, and 
writing these out in SQL would be extremely verbose and extremely slow (because 
the query engine is an interpreter). Instead, you'd implement a native function 
that used CPU vector instructions to do the math quickly, and register that 
function with SQLite, passing it the vector as a blob.
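
To illustrate the registration step, here is a rough sketch using Python's 
sqlite3 module. It is not the native, SIMD-accelerated function described 
above: the dot product is computed in plain Python, purely to show how a 
blob-taking function plugs into a query. The SQL function name "dot", the 
table layout, and the little-endian 32-bit float encoding are all assumptions.

```python
import sqlite3
import struct


def pack(values):
    """Encode floats as little-endian 32-bit IEEE 754 values (assumed format)."""
    return struct.pack(f"<{len(values)}f", *values)


def dot(blob_a, blob_b):
    """Dot product of two vectors stored as blobs of packed 32-bit floats."""
    if len(blob_a) != len(blob_b):
        raise ValueError("vectors must have the same length")
    n = len(blob_a) // 4
    a = struct.unpack(f"<{n}f", blob_a)
    b = struct.unpack(f"<{n}f", blob_b)
    return sum(x * y for x, y in zip(a, b))


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL as a two-argument function named "dot".
conn.create_function("dot", 2, dot)

conn.execute("CREATE TABLE vectors (id INTEGER PRIMARY KEY, vec BLOB NOT NULL)")
rows = [
    (1, pack([1.0, 0.0, 0.0])),
    (2, pack([0.0, 1.0, 0.0])),
    (3, pack([0.5, 0.5, 0.0])),
]
conn.executemany("INSERT INTO vectors (id, vec) VALUES (?, ?)", rows)

# Find the stored vector with the largest dot product against a query vector.
query = pack([0.0, 2.0, 0.0])
(best_id,) = conn.execute(
    "SELECT id FROM vectors ORDER BY dot(vec, ?) DESC LIMIT 1", (query,)
).fetchone()
```

A C implementation would register its function through 
sqlite3_create_function() in the same spirit and read the arguments with 
sqlite3_value_blob() and sqlite3_value_bytes().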

—Jens
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
