@jean-marc I am not disagreeing with your results, with pure XQuery (without vendor extensions) I believe you will not come close to code written in C or even Java for things like maps and vectors. But you yourself say that the map from BaseX is 100X faster than one developed in pure XQuery. Yet you *can use this map in XQuery* ... That's the whole point of vendor extensions is to overcome limitations in languages where appropriate. This is not unique to XQuery. I have not come across a language yet that doesn't have either vendor extensions, or libraries written in "native" code (lower level code then the language itself). Even in C many standard library functions are actually optimized either by hand written assembly or compiler insertion to improve efficiencies of the pure language. Furthermore, you have not tried all vendors, I discourage you from extrapolating from some vendors to all vendor implementations. I am not suggesting you try all vendors (that would take a long time) only that generalizing statements about performance cannot be entirely accurate without doing so.
But I do agree if your goal is to avoid any vendor extensions and provide a "100% Pure XQuery" module for functions that need to make great use of maps and vectors, XQuery is likey to not be as performant as languages which have the data types you need built in. ( which in essence is saying that those languages native types are the equivalent to XQuery vendor extensions except they are defined by the language spec instead of the vendor spec. ). Also as a tiny sub-diversion, I do disagree with this generalized statement " This is true for any implementation, any language : if you use a map instead of a vector, the overhead for each insertion or reading is log(n), where n is the size of the vector" This is entirely dependent on the implementation of the map and vector. I assert you cannot make such generalized assentation's. For example, a map based on a hash instead of a tree can be linear time until it reaches some internal sizing. Performance in real life is never as simple as on paper. In your case you created a map based on a tree. That will be roughly log(n), true. But that is not the only possible implementation of maps. In the end though ... for the cases you are trying for, I suggest 100% pure XQuery will not provide the performance you need. So your choice is to use a different language, or to find an implementation of XQuery which has vendor extensions customized for the data types and operations not in "100% pure XQuery". From: jean-marc Mercier [mailto:[email protected]] Sent: Wednesday, January 01, 2014 4:51 AM To: David Lee Cc: Michael Sokolov; xquery-discuss; ihe onwuka; Andrew Welch Subject: Re: [xquery-talk] Matrix Multiplication @David Maybe my statement was not clear. Please let me do it again : I am just saying that maps are slow in insertion, compared to vectors. This is true for any implementation, any language : if you use a map instead of a vector, the overhead for each insertion or reading is log(n), where n is the size of the vector, that is huge when you look primarily for performances. Furthermore, please check this thread: 3 of us, including me, have tried to develop an external modulus implementing a map. The best result, that is the one of Leo from BaseX, consists in a 100X overhead factor compared to a vendor solution, the map from BaseX. So far we tried, so far we failed: this failure is just telling us that we can't use XQUERY to develop basic types as pairs, maps, vectors through external XQUERY modulus. In this context, I would not try, and will discourage anyone else, to develop an external linear Algebra modulus with this version of XQUERY, because it will be too slow compared to imperative language already existing linear algebra modulus. David, I really try to have a scientific approach, I am not bound to any organization, have no particular private interest in the xml industry. I am just a xquery user. And I wish I were wrong. Please do not hesitate to point me out any failure in my reasoning. Meanwhile, I wish you a happy new year !!! 2013/12/31 David Lee <[email protected]<mailto:[email protected]>> One tiny correction Your statement " You can't use Marklogic map, or any other vendor map to store vectors for performance issues (a map is really slow)." So far on this thread this statement has not been substantiated. MarkLogic maps and arrays are quite efficient, they are not the same thing as XQuery rb trees or anything else. They are highly efficient C++ implementations of maps and arrays, implemented under the hood with the same data structures as you would in C++ stdlib. Not suggesting your other line of reasoning is incorrect, only that so far on this thread I have not seen any evidence of testing vendor extensions for XQuery types. I suspect you don't want to do that for good reasons (for example its not portable, or its not a free product), but I do suggest you cannot extrapolate performance from one vendors implementation to another. As is usual in performance "it depends". From: jean-marc Mercier [mailto:[email protected]<mailto:[email protected]>] Sent: Tuesday, December 31, 2013 12:02 PM To: David Lee Cc: Michael Sokolov; xquery-discuss; ihe onwuka; Andrew Welch Subject: Re: [xquery-talk] Matrix Multiplication @David pairs are also basically needed to write a linear algebra modulus, the topic of this thread. And XQUERY don't provide any efficient pair. You can't use Marklogic map, or any other vendor map to store vectors for performance issues (a map is really slow). Note that there are a lot of workaround : I am using direct JAVA binding or C++ binding from XQUERY for linear algebra not to pay a too heavy tribute to these issue performances. The point is simply to notice that XQUERY could be really good to write an efficient linear algebra modulus. But, due to these performance issues, nobody can write it. I just hope that the next XQUERY version will give the necessary container to write it. Meanwhile, nobody can contribute to XQUERY through external modulus using heavy algorithms, and that's just too bad. 2013/12/31 David Lee <[email protected]<mailto:[email protected]>> Indeed ... pair( (1,2,3) ) is just a function that returns a function as per my example. But to the point, you can either use vendor extensions (such as marklogic's json:array and map:map types) which have good support for efficient operations, or look to another language (<sigh>) You may find Scala more amenable to this type of programming. From: [email protected]<mailto:[email protected]> [mailto:[email protected]<mailto:[email protected]>] On Behalf Of jean-marc Mercier Sent: Tuesday, December 31, 2013 11:49 AM To: Michael Sokolov Cc: xquery-discuss; ihe onwuka; Andrew Welch Subject: Re: [xquery-talk] Matrix Multiplication @David, there are tricks to overcome sequence concatenations now. See for instance definition of a pair by Leo, John Snelson, or me : you can write pair( ( 1 , 2 , 3 ) , (4 5 , 6) ) to avoid sequence concatenation. I ve written also constant sized vectors using this trick : for instance NUPLE(1,(),<toto>,5), withh associated getters. The bad new is that these operations takes too much time with all the interpreters I have tried, and can't be used in heavy algorithms. 2013/12/31 jean-marc Mercier <[email protected]<mailto:[email protected]>> Hi @Michael Concerning your remark over the discussions I quoted, maps are the basic block for sparse linear algebra. Without performent "maps of nodes" (equivalent to std::map in C++) you will not be able to write any performant linear algebra modulus for sparse matrix. However, before even thinking to sparse matrix, operations on sequences as concatenations are the first "show-stop" to write a linear algebra modulus, since sequences are vectors. Another one is the lack of constant sized vectors (needed for basic dense matrix operations). @Ryan thx for these links, they are very interesting. Well, I am going to party ! Have a happy new year ! 2013/12/31 Michael Sokolov <[email protected]<mailto:[email protected]>> I would love to see some tests of pure XQuery implementations of both sparse and dense operations. I suspect performance of matrix multiply, inversion, etc, will be poorer than in C++ or Java, but I would expect performance comparable to Perl or Python (w/o its numpy extension) - just a wild guess. I'd also expect it might be easier to get good sparse performance. I don't know why, just a hunch. But the more interesting question for me is whether language changes are really needed to support this. I would have thought that proper optimization of operations on sequences would be enough? So for example, an extension module using sequences as matrix datatypes could possibly optimize performance using a lower-level implementation. Does anyone see any reason why that wouldn't be possible? -Mike PS: I reviewed the discussion you referred to, jean-marc, but it seems to have more to do with using functions as map keys, and I don't see any direct connection to linear algebra? On 12/31/2013 9:55 AM, jean-marc Mercier wrote: It is not due to the spec. It is rather due to the common usage of XQUERY, forcing vendor solutions (as BaseX) to focus primarily on XML Data Base requests more than algorithmic performances. There are actually some threads that are discussing these performance issues in the context of maps (maps are for instance used for sparse matrix representations) : look for instance to ""map module for XQUERY ?" that I initiated or "Higher-order XQuery Modules" by Leo from BaseX, on [email protected]<http://x-query.com> mailing list. Anyhow, to write a serious linear algebra modulus, the basic need is to have a vector containers. Unfortunately, XQUERY does not provide any performant vector containers at present time, and it is not possible to code them in pure XQUERY : I have tried, and more experienced xquery developpers than me have also tried, without success. We have to wait for the XQUERY version that will give us these containers, a decision to be taken by the W3C. 2013/12/31 Andrew Welch <[email protected]<mailto:[email protected]>> Are you saying the spec as it stands somehow forces all implementations to be 1000x slower, or just what you have observed in some particular implementation? On 31 Dec 2013 14:27, "jean-marc Mercier" <[email protected]<mailto:[email protected]>> wrote: > > As far as I understand, you want to write a linear algebra module using > XQUERY ? > If so, I opened a thread some months ago about this idea. My opinion today is > that this is a false good idea at present time. > > 1) XQUERY would be really good for writing concise, efficient linear algebra > modulus. > 2) However, I strongly recommend to wait a little bit for starting coding : > the current version of XQUERY (3.0) suffers from performance issues. A linear > algebra modulus written in XQUERY is expected to have performances > performances 1000 X slower than its corresponding C++ or JAVA (you can > measure it precisely). Any mathematician linear algebra modulus would > probably trashed your modulus after the first test. > > Hope this helps > > > > 2013/12/31 Ihe Onwuka <[email protected]<mailto:[email protected]>> >> >> Assuming a sparse representation it is about 4 lines of SQL. This is known >> not least because you can read enough articles and papers that discuss it >> and it optimises well. The obvious google search does not reveal any >> corresponding XQuery discussion, neither does it appear to have surfaced on >> this or the eXist mailing list (allowing for my deficient search skills). >> For something so "trivial" I thought that was rather strange. Hence I >> thought it would be prudent to ask before naively embarking on a 600k X >> 40k matrix multiplication. >> >> >> >> On Tue, Dec 31, 2013 at 11:31 AM, Andrew Welch >> <[email protected]<mailto:[email protected]>> wrote: >>> >>> >>> It should be pretty trivial... >>> >>> On 31 Dec 2013 11:07, "Ihe Onwuka" >>> <[email protected]<mailto:[email protected]>> wrote: >>> > >>> > Has anybody tried this in XQuery or if I am so foolish (not yet but give >>> > me time) would I be the courageous >>> > <culturalReference>http://www.youtube.com/watch?v=ik8JT2S-kBE</culturalReference> >>> > early adopter. >>> > >>> > >>> > _______________________________________________ >>> > [email protected]<mailto:[email protected]> >>> > http://x-query.com/mailman/listinfo/talk >> >> >> >> _______________________________________________ >> [email protected]<mailto:[email protected]> >> http://x-query.com/mailman/listinfo/talk > > _______________________________________________ [email protected]<mailto:[email protected]> http://x-query.com/mailman/listinfo/talk
_______________________________________________ [email protected] http://x-query.com/mailman/listinfo/talk
