> On Jun 15, 2019, at 6:42 AM, Dan Kaminsky <dan.kamin...@medal.com> wrote:
>
> One of the more useful and usable packages for Natural Language
> Processing, Magnitude[1], leverages SQLite to efficiently handle the real
> valued but entirely abstract collections of numbers -- vector spaces --
> that modern machine learning depends on.

I'm somewhat familiar with this, having recently written some code that stores ML results in SQLite databases.

As far as I know, there is no benefit to storing each element of such a vector as a separate column in SQLite. Instead, the entire vector should be stored as a single blob: for example, a concatenation of 3072 IEEE floats in some fixed byte order. I say this because I don't know of any reason why a SQL query would need to access a specific vector coordinate, e.g. "SELECT * FROM vectors WHERE item_1722 > 0.5".

The interesting operations on these vectors apply to all the coordinates in aggregate, like dot products or various distance metrics. Writing these out in SQL would be extremely verbose and extremely slow (because the query engine is an interpreter). Instead you'd implement a native function that uses CPU vector instructions to do the math quickly, and register that function with SQLite, passing it the vector as a blob.

--Jens
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
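
A minimal sketch of the blob approach described above, using Python's standard sqlite3 and struct modules. The table name, column names, the `dot` function name, and the helpers `pack_vector` and `unpack_vector` are all assumptions made for illustration; they are not taken from Magnitude or from the code mentioned in the message. The vectors here have 4 elements rather than 3072 to keep the example readable. Note that this pure-Python function does not use CPU vector instructions; a real implementation would more likely be a native extension (or use a numeric library), but the registration pattern is the same.

```python
import sqlite3
import struct


def pack_vector(values):
    """Serialize floats as a blob of little-endian IEEE 754 float32 values."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_vector(blob):
    """Inverse of pack_vector: 4 bytes per element."""
    return struct.unpack(f"<{len(blob) // 4}f", blob)


def dot(blob_a, blob_b):
    """Dot product of two packed vectors (hypothetical helper)."""
    a = unpack_vector(blob_a)
    b = unpack_vector(blob_b)
    if len(a) != len(b):
        raise ValueError("vector length mismatch")
    return sum(x * y for x, y in zip(a, b))


conn = sqlite3.connect(":memory:")
# Register the Python callable as a two-argument SQL function named "dot".
conn.create_function("dot", 2, dot)

# One blob column holds the whole vector; no per-coordinate columns.
conn.execute("CREATE TABLE vectors (id INTEGER PRIMARY KEY, vec BLOB NOT NULL)")
conn.executemany(
    "INSERT INTO vectors (id, vec) VALUES (?, ?)",
    [
        (1, pack_vector([1.0, 0.0, 0.0, 0.0])),
        (2, pack_vector([0.5, 0.5, 0.0, 0.0])),
        (3, pack_vector([0.0, 0.0, 1.0, 0.0])),
    ],
)

# Rank stored vectors by dot product against a query vector.
query = pack_vector([1.0, 1.0, 0.0, 0.0])
rows = conn.execute(
    "SELECT id, dot(vec, ?) AS score FROM vectors ORDER BY score DESC, id",
    (query,),
).fetchall()
print(rows)
```

The SQL stays short because the per-coordinate arithmetic happens inside the registered function, not in the query text. Fixing the byte order in the format string ("<" for little-endian) keeps the blobs portable across machines.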