Ups sorry for pasting in that link to the MLUG ... that was an unintended mixup as I was writing 2 emails at once :( But since the cat's out of the bag ... anyone in the area is welcome to come !
From: [email protected] [mailto:[email protected]] On Behalf Of David Lee Sent: Wednesday, January 01, 2014 12:40 PM To: jean-marc Mercier Cc: [email protected]; ihe onwuka; Michael Sokolov; Adam Retter; Andrew Welch Subject: Re: [xquery-talk] Matrix Multiplication A big performance difference you will see between XQuery and say C++ is that the interpretation of the FLOW statements can overwhelm the performance of individual operations. Differences between XQuery implementations can be vast. This is for many reasons, one being that it is often optimized not for calculations of this sort but for XPath expressions or database indexed retrieval. Another issue is that different XQuery implementations have different infrastructure goals or requirements. MarkLogic, for example, has a lot of requirements around transactional processing, performance measuring, lazy evaluations of huge data structures etc. This adds overhead to every statement that may not be in other XQuery implementations which don't have those goals or requirements. Some XQuery engines are designed to work 100% In memory ... that can lead to optimizations which are impossible for implementations designed to work on datasets largely out of memory ... There are just a vast amount of variables that affect things. http://www.meetup.com/baymug/events/150734492/m In any case it would be very surprising to me if ANY XQuery implementation was as fast as hand coded C++ for things like looping over an array. Its simply not the use case XQuery was designed for or typically optimized for. Although the same used to be true of Java. Now java compilers have become so good that they can run at nearly "native" speeds for some things ... Saxon XSLT EE (And I believe XQuery) has some similar technologies to generate JVM bytecode on demand which can speed things up significantly (or slow them down !). In the end though ... if your looking for fast linear algebra libraries I suspect no matter what language you choose you will find that nothing will beat a native implementation ... This is true even with C++ or Fortran where library vendors optimize things like matrix multiplication using assembly or even GPUs ... Eventually someone puts the effort into wrapping those up into nice high level functions that you can call from a higher level language - but are rarely *written entirely* in that language. Simple example from XQuery world. fn:sum() is not likely to be written in XQuery itself. From: jean-marc Mercier [mailto:[email protected]] Sent: Wednesday, January 01, 2014 12:22 PM To: David Lee Cc: Adam Retter; [email protected]<mailto:[email protected]>; Andrew Welch; ihe onwuka; Michael Sokolov Subject: Re: [xquery-talk] Matrix Multiplication @David, Thx for the links. Indeed I thought that Marklogic DB was not free to use ! I will give it a try. You are right: precising maps implementation is important, since it can change a lot performances issues. Concerning your profile tests for maps, a quick answer. If your numbers are correct, the BaseX map seems to be 10x faster than MarkLogic one for 1000000 insertions, but behave as n log(n) (it is based over Phil Bagwell's hash tries AFAIK), I will try to profile MarkLogic map more precisely. This might be due to the hash function ? Note that it is interesting to have imperative language performances in mind. A standard C++ std::map takes about 0.5 sec to insert 2^22 (i.e. 4 000 000) on my PC, single thread. As a rule of thumb, a C++ std::map (that is a basic red/black tree AFAIK) should be around 25 x faster for insertion than BaseX one. 2014/1/1 David Lee <[email protected]<mailto:[email protected]>> @jean-marc "@David, indeed, I am a little bit reluctant to use non standardized tools. Anyhow, I am quite surprised by your sentence : there exists map containers working in linear access ? Do you have any references to point out ? " Just for reference, and to disclose my goal is not to push a particular vendor but to make the point that to make generalized statements about performance characteristics of generic data types without respect to the implementation is not wise. Having done performance work professionally for decades I can attest this is a general rule - "It Depends". But for reference here is one data point demonstrating linear access to both maps and arrays in XQuery (using a vendor extension). Using the developer (free for development use) version of MarkLogic 7.0.1 Downloaded from here: http://developer.marklogic.com/products On my windows desktop (a fairly high end machine, but not unusual by any means). map test: using map:map object http://docs.marklogic.com/map:map?q=map:map I did not have the code return any results so that the timing was affected by the result size. You are welcome to try for yourself to validate that the optimizer was not so incredible as to actually optimize away the code (or it would have taken constant time if it realized the result was always () ). This was just quickly done in "Query Console" using the built in profiler. Slightly better tests would be to invoke this as a stored XQuery module to avoid the interaction of the profiler and javascript console, but you can see within reason the results are linear. Note this also incurs a number to string conversion because map:map keys are strings. also note in this case the map was not pre-sized, it grows on demand. tested by incrementing $max from 100 to 10000000 in factors of 10 ------- XQuery code declare variable $map := map:map(); declare variable $max := 10000000; for $i in 1 to $max return map:put( $map , ""||$i , $i ), for $i in 1 to $max let $_ := map:get( $map , ""||$i ) return () Results 100 .000/.001 sec - limits of the time precision used 1000 .005 sec 10000 .047 sec 100000 .46 sec 1000000 4.8 sec 10000000 49.49 sec Array test using json:array http://docs.marklogic.com/json:array?q=json:array Note in this case I do pre-size the array to $max to avoid a resize. Its a minor optimization but useful. ------ XQuery Code declare variable $max := 10000000; declare variable $array := json:to-array( () , $max ); for $i in 1 to $max return $array[$i] = $i , for $i in 1 to $max let $_ := $array[$i] return () --------- Results 100 - under timer percision 1000 .002 sec 10000 .02 sec 100000 .23 sec 1000000 2.06 sec 10000000 20.65 sec From: jean-marc Mercier [mailto:[email protected]<mailto:[email protected]>] Sent: Wednesday, January 01, 2014 11:05 AM To: David Lee Cc: Adam Retter; [email protected]<mailto:[email protected]>; Andrew Welch; ihe onwuka; Michael Sokolov Subject: Re: [xquery-talk] Matrix Multiplication @David, indeed, I am a little bit reluctant to use non standardized tools. Anyhow, I am quite surprised by your sentence : there exists map containers working in linear access ? Do you have any references to point out ? 2014/1/1 David Lee <[email protected]<mailto:[email protected]>> On 31 Dec 2013 17:03, "jean-marc Mercier" <[email protected]<mailto:[email protected]>> wrote: @David pairs are also basically needed to write a linear algebra modulus, the topic of this thread. And XQUERY don't provide any efficient pair. You can't use Marklogic map, or any other vendor map to store vectors for performance issues (a map is really slow). http://x-query.com/mailman/listinfo/talk [DAL:] Just an FYI but MarkLogic also have native arrays (in addition to maps) which are extremely efficient (they are stored internally as C arrays). But if you have to do a lot of iterating through the arrays - even though the accessors are very efficient , the surrounding FLOWR code is still XQuery which slows things down a bit. Maybe or maybe not enough to make them not useful for you. Also the marklogic maps are not the same as XQuery 3 implemented maps, they are a hash map under the hood and typically linear access. But back to your original issue, these are vendor extensions and hence not portable to other implementations (Until XQuery itself standardizes on arrays and maps - then it is likely that vendor extension implementations will be used to expose the standard interface). If what you want is pure XQuery ... it doesnt matter how fast these are if you cant use them because they dont exist in all implementations. -David
_______________________________________________ [email protected] http://x-query.com/mailman/listinfo/talk
