Re: Mahout vectors/matrices/solvers on spark

2013-07-09 Thread Ted Dunning
It is common for double serialization to creep into the systems as well. My guess, however, is that the primitive serialization is just much faster than the vector serialization. On Jul 8, 2013, at 22:55, Dmitriy Lyubimov dlie...@gmail.com wrote: yes, but it is just a

Re: Mahout vectors/matrices/solvers on spark

2013-07-09 Thread Dmitriy Lyubimov
yes that's my working hypothesis. Serializing and combining RandomAccessSparseVectors is slower than elementwise messages. On Mon, Jul 8, 2013 at 11:00 PM, Ted Dunning ted.dunn...@gmail.com wrote: It is common for double serialization to creep into the systems as well. My guess however is

Re: Mahout vectors/matrices/solvers on spark

2013-07-09 Thread Ted Dunning
Also, it is likely that the combiner has little effect. This means that you are essentially using a vector to serialize single elements. On Jul 8, 2013, at 23:13, Dmitriy Lyubimov dlie...@gmail.com wrote: yes that's my working hypothesis. Serializing and combining

Re: Mahout vectors/matrices/solvers on spark

2013-07-09 Thread Dmitriy Lyubimov
that has occurred to me too. we are not inferring any aggregations really here. it may turn out that its use is beneficial with bigger volumes and real I/O, though. hard to tell. anyway i will probably keep both as options. On Tue, Jul 9, 2013 at 7:51 AM, Ted Dunning ted.dunn...@gmail.com wrote:
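A local sketch of the two messaging styles discussed above, in plain Scala with no Spark or Bagel dependency (all names here are hypothetical): each non-zero A(i, j) becomes its own element-wise message addressed to row j of the transpose, and grouping by destination plays the role that a combiner would. Counting messages per destination shows the most a combiner could ever merge.

```scala
object MessageTranspose {
  // Sparse matrix as rowIndex -> (colIndex -> value); a stand-in for a DRM, not Mahout's type.
  type Sparse = Map[Int, Map[Int, Double]]

  // One element-wise message: value A(srcRow, dstRow) sent to row dstRow of the transpose.
  final case class Msg(dstRow: Int, srcRow: Int, value: Double)

  // "Send" step: every non-zero becomes its own small message (no vector serialization).
  def emit(a: Sparse): Seq[Msg] =
    for {
      (i, row) <- a.toSeq
      (j, v)   <- row.toSeq
    } yield Msg(j, i, v)

  // Messages per destination, i.e. the most a combiner could merge into one payload.
  def messagesPerDestination(a: Sparse): Map[Int, Int] =
    emit(a).groupBy(_.dstRow).map { case (dst, ms) => dst -> ms.size }

  // "Receive" step: row j of A^T collects all messages addressed to j into one sparse row.
  def transpose(a: Sparse): Sparse =
    emit(a).groupBy(_.dstRow).map { case (dst, ms) =>
      dst -> ms.map(m => m.srcRow -> m.value).toMap
    }
}
```

The sketch does not model partitions. In a distributed job a combiner can presumably only merge messages for the same destination that meet in the same partition, so its payoff depends on data volume and layout, which is consistent with keeping both options around.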

Re: Mahout vectors/matrices/solvers on spark

2013-07-08 Thread Dmitriy Lyubimov
Does anybody know how good (or bad) our performance on matrix transpose is? How long would it take to transpose a matrix with 10M non-zeros with Mahout (if i wanted to set up a fully distributed but single-node MR cluster)? Trying to figure out whether the numbers i see with Bagel-based Mahout matrix transposition are any good.

Re: Mahout vectors/matrices/solvers on spark

2013-07-08 Thread Ted Dunning
Transpose of that small a matrix should happen in memory. On Jul 8, 2013, at 17:26, Dmitriy Lyubimov dlie...@gmail.com wrote: Anybody knows how good (or bad) our performance on matrix transpose? how long will it take to transpose a 10M non-zeros with Mahout (if i wanted
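Ted's point can be made concrete: 10M non-zeros stored as double values plus int column indices take roughly 120 MB, which fits in a single JVM heap. A minimal in-memory sketch, assuming a compressed sparse row (CSR) layout in plain arrays (a hypothetical stand-in, not Mahout's own sparse matrix classes), transposes in O(nnz + ncol) time with a counting sort on the column indices.

```scala
object CsrTranspose {
  // CSR layout: row i's non-zeros live at positions rowPtr(i) until rowPtr(i + 1).
  final case class Csr(nrow: Int, ncol: Int, rowPtr: Array[Int], colIdx: Array[Int], values: Array[Double]) {
    // Linear scan of one row; fine for a sketch.
    def get(i: Int, j: Int): Double = {
      var k = rowPtr(i)
      while (k < rowPtr(i + 1)) {
        if (colIdx(k) == j) return values(k)
        k += 1
      }
      0.0
    }
  }

  def transpose(a: Csr): Csr = {
    val nnz = a.rowPtr(a.nrow)
    // 1. Count non-zeros per column of A (= per row of A^T), shifted by one slot.
    val tPtr = new Array[Int](a.ncol + 1)
    for (k <- 0 until nnz) tPtr(a.colIdx(k) + 1) += 1
    // 2. Prefix sums turn the counts into row offsets of A^T.
    for (j <- 0 until a.ncol) tPtr(j + 1) += tPtr(j)
    // 3. Scatter each entry into its slot; `next` tracks the fill position per row of A^T.
    val next = tPtr.clone()
    val tCol = new Array[Int](nnz)
    val tVal = new Array[Double](nnz)
    for (i <- 0 until a.nrow; k <- a.rowPtr(i) until a.rowPtr(i + 1)) {
      val j = a.colIdx(k)
      val dst = next(j)
      tCol(dst) = i
      tVal(dst) = a.values(k)
      next(j) += 1
    }
    Csr(a.ncol, a.nrow, tPtr, tCol, tVal)
  }
}
```

Because rows of A are visited in order, the column indices within each row of the transpose come out sorted, with no extra sort pass.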

Re: Mahout vectors/matrices/solvers on spark

2013-07-08 Thread Dmitriy Lyubimov
yes, but it is just a test and I am trying to extrapolate the results i see to bigger volumes, sort of, to get some taste of the programming model's performance. I do get cpu-bound behavior and i hit the Spark cache 100% of the time. so in theory, since i am not having spills and i am not doing sorts,

Re: Mahout vectors/matrices/solvers on spark

2013-07-05 Thread Dmitriy Lyubimov
Ted, would it make sense to port parts of the in-core row-wise Givens QR solver out of SSVD so that it works on any Matrix? I know the Givens method is advertised as stable, but i'm not sure whether it is the fastest accepted one. I guess they are all about the same. If yes, i will also need to port the UpperTriangular

Re: Mahout vectors/matrices/solvers on spark

2013-07-05 Thread Dmitriy Lyubimov
FWIW, Givens streaming QR will be a bit more economical on memory than Householder's, since it doesn't need the full buffer to compute R and doesn't need to keep the entire original matrix around. On Thu, Jul 4, 2013 at 11:15 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Ted, would it make sense
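The memory argument can be sketched as follows: rows of a tall m × n matrix A are streamed one at a time and rotated into an n × n upper-triangular R, so memory stays O(n²) regardless of m and A is never retained. This is a plain-array illustration under that assumption, not the SSVD code itself; it accumulates only R (Q is not formed), and afterwards RᵀR = AᵀA up to rounding.

```scala
object GivensQr {
  // Streams rows of a tall m x n matrix A into an n x n upper-triangular R with R^T R = A^T A.
  // Only R is kept in memory: O(n^2), independent of the number of rows m.
  final class Accumulator(n: Int) {
    val r: Array[Array[Double]] = Array.ofDim[Double](n, n)

    def add(row: Array[Double]): Unit = {
      require(row.length == n, s"expected $n columns")
      val x = row.clone()
      var j = 0
      while (j < n) {
        if (x(j) != 0.0) {
          // Givens rotation (c, s) that zeroes x(j) against the diagonal entry r(j)(j).
          val h = math.hypot(r(j)(j), x(j))
          val c = r(j)(j) / h
          val s = x(j) / h
          var k = j
          while (k < n) {
            val rjk = r(j)(k)
            val xk = x(k)
            r(j)(k) = c * rjk + s * xk
            x(k) = -s * rjk + c * xk
            k += 1
          }
        }
        j += 1
      }
    }
  }
}
```

Each incoming row costs O(n²) work and is discarded afterwards, whereas a Householder QR of the same block would operate on all m rows at once.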

Re: Mahout vectors/matrices/solvers on spark

2013-07-05 Thread Dmitriy Lyubimov
For anyone good at Scala DSLs, the following is a puzzle i can't seem to figure out at the moment. I mentioned before that I implemented assignment notation for a row or a block, e.g. for a row vector: A(5,::) := (1,2,3) what it really translates into in this particular case is

Re: Mahout vectors/matrices/solvers on spark

2013-07-05 Thread Jake Mannix
On Fri, Jul 5, 2013 at 1:15 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: For anyone good at scala DSLs, the following is the puzzle i can't seem to figure at the moment. I mentioned before that I implemented assignment notations to a row or a block, e.g. for a row vector : A(5,::) :=

Re: Mahout vectors/matrices/solvers on spark

2013-07-05 Thread Ted Dunning
On Fri, Jul 5, 2013 at 1:25 AM, Jake Mannix jake.man...@gmail.com wrote: at this point i have only a very obvious apply(Double,Double):Double = m.getQuick(...), i.e. only element reads are supported with that syntax. I am guessing Jake, if anyone, might have an idea here... thanks.

Re: Mahout vectors/matrices/solvers on spark

2013-07-05 Thread Nick Pentreath
Hi Dmitry, you can take a look at using the update magic method, which is similar to apply but handles assignment. If you want to keep the := as assignment, I think you could do def :=(value: Double) = update ... (I don't have my laptop around at the moment, so I can't check that this works).

Re: Mahout vectors/matrices/solvers on spark

2013-07-05 Thread Dmitriy Lyubimov
On Fri, Jul 5, 2013 at 1:40 AM, Nick Pentreath nick.pentre...@gmail.com wrote: Hi Dmitry You can take a look at using the update magic method which is similar to apply but handles assignment. If you want to keep the := as assignment I think you could do def :=(value: Double) = update
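Nick's suggestion relies on a Scala language rule: an assignment `a(i, j) = v` is rewritten by the compiler to `a.update(i, j, v)`, while `:=` is an ordinary method, so `A(5, ::) := (1, 2, 3)` works as long as `A(5, ::)` returns an object that defines `:=`. A minimal sketch with hypothetical class names over plain arrays (not the DSL from this thread), using the standard-library `::` object as an "all columns" marker:

```scala
object UpdateDsl {
  final class Mtx(val nrow: Int, val ncol: Int) {
    private val data = Array.ofDim[Double](nrow, ncol)

    // Element read: a(i, j)
    def apply(i: Int, j: Int): Double = data(i)(j)

    // Element write: the compiler rewrites `a(i, j) = v` into `a.update(i, j, v)`.
    def update(i: Int, j: Int, v: Double): Unit = data(i)(j) = v

    // Row view: `a(i, ::)` returns an object that knows how to assign into row i.
    def apply(i: Int, all: ::.type): RowView = new RowView(i)

    final class RowView(i: Int) {
      // `a(i, ::) := (1, 2, 3)` calls this method with three arguments.
      def :=(values: Double*): Unit = {
        require(values.length == ncol, s"expected $ncol values")
        for (j <- values.indices) data(i)(j) = values(j)
      }
      def toList: List[Double] = data(i).toList
    }
  }
}
```

The design choice is that element writes go through `update`, so plain `=` works, while row and block writes return a small view object, so `:=` can be an ordinary method rather than compiler magic.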

Re: Mahout vectors/matrices/solvers on spark

2013-07-04 Thread Ted Dunning
This is pretty exciting! Thanks Dmitriy. On Wed, Jul 3, 2013 at 10:12 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Excellent! so I guess SSVD can be divorced from apache-math solver then. Actually it all shaping up surprisingly well, with scala DSL for both in-core and mahout DRMS and

Re: Mahout vectors/matrices/solvers on spark

2013-07-03 Thread Ted Dunning
On Wed, Jul 3, 2013 at 6:25 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: On Wed, Jun 19, 2013 at 12:20 AM, Ted Dunning ted.dunn...@gmail.com wrote: As far as in-memory solvers, we have: 1) LR decomposition (tested and kinda fast) 2) Cholesky decomposition (tested) 3) SVD
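"LR decomposition" is another name for LU decomposition. For readers who want to see what such an in-core solver does, here is a minimal sketch of LU factorization with partial pivoting followed by the two triangular solves. It works on plain arrays, assumes a square non-singular A, and is an illustration rather than Mahout's implementation.

```scala
object LuSolve {
  // LU factorization with partial pivoting: P A = L U.
  // Returns packed factors (unit-diagonal L below the diagonal, U on and above it) and the row permutation.
  def factor(a0: Array[Array[Double]]): (Array[Array[Double]], Array[Int]) = {
    val n = a0.length
    val a = a0.map(_.clone())
    val perm = Array.tabulate(n)(i => i)
    for (k <- 0 until n) {
      // Pick the largest remaining pivot in column k for numerical stability.
      val p = (k until n).maxBy(i => math.abs(a(i)(k)))
      require(a(p)(k) != 0.0, "matrix is singular")
      if (p != k) {
        val tmpRow = a(k); a(k) = a(p); a(p) = tmpRow
        val tmpIdx = perm(k); perm(k) = perm(p); perm(p) = tmpIdx
      }
      for (i <- k + 1 until n) {
        val m = a(i)(k) / a(k)(k)
        a(i)(k) = m // store the L multiplier in place
        for (j <- k + 1 until n) a(i)(j) -= m * a(k)(j)
      }
    }
    (a, perm)
  }

  // Solves A x = b using the factors from `factor`.
  def solve(a: Array[Array[Double]], b: Array[Double]): Array[Double] = {
    val (lu, perm) = factor(a)
    val n = b.length
    // Forward substitution with unit-diagonal L on the permuted right-hand side.
    val y = Array.tabulate(n)(i => b(perm(i)))
    for (i <- 0 until n; j <- 0 until i) y(i) -= lu(i)(j) * y(j)
    // Back substitution with U.
    val x = y.clone()
    for (i <- n - 1 to 0 by -1) {
      for (j <- i + 1 until n) x(i) -= lu(i)(j) * x(j)
      x(i) /= lu(i)(i)
    }
    x
  }
}
```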

Re: Mahout vectors/matrices/solvers on spark

2013-07-03 Thread Dmitriy Lyubimov
Excellent! so I guess SSVD can be divorced from the apache-math solver then. Actually it's all shaping up surprisingly well, with a Scala DSL for both in-core and Mahout DRMs and Spark solvers. I haven't been able to pay as much attention to this as i hoped due to being pretty sick last month. But even

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Dmitriy Lyubimov
Ok, so i was fairly easily able to build some DSL for our matrix manipulation (similar to Breeze) in Scala: inline matrix or vector: val a = dense((1, 2, 3), (3, 4, 5)) val b:Vector = (1,2,3) block views and assignments (element/row/vector/block/block of row or vector) a(::, 0) a(1, ::) a(0
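One way to get the `dense((1, 2, 3), (3, 4, 5))` and `a(::, 0)` syntax in plain Scala is sketched below: tuples are accepted as `Product`s and converted row by row, and the standard-library `::` object serves as the "all rows/columns" marker in overloaded `apply` methods. All names are hypothetical, and unlike the views described above, these slices are copies.

```scala
object MiniDsl {
  final class Mtx(private val data: Array[Array[Double]]) {
    def nrow: Int = data.length
    def ncol: Int = if (data.isEmpty) 0 else data(0).length
    // a(i, j): single element
    def apply(i: Int, j: Int): Double = data(i)(j)
    // a(i, ::): copy of row i
    def apply(i: Int, all: ::.type): Array[Double] = data(i).clone()
    // a(::, j): copy of column j
    def apply(all: ::.type, j: Int): Array[Double] = data.map(row => row(j))
  }

  // Accepts Int or Double tuple elements.
  private def toDouble(x: Any): Double = x match {
    case n: Int    => n.toDouble
    case d: Double => d
    case other     => throw new IllegalArgumentException(s"not a number: $other")
  }

  // dense((1, 2, 3), (3, 4, 5)): each tuple becomes one row.
  def dense(rows: Product*): Mtx = {
    val data = rows.map(_.productIterator.map(toDouble).toArray).toArray
    require(data.forall(_.length == data.head.length), "ragged rows")
    new Mtx(data)
  }
}
```

Returning copies keeps the sketch short; a view-based version would instead return a small object holding the row or column index, as in an assignment-capable DSL.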

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Ted Dunning
Dmitriy, This is very pretty. On Mon, Jun 24, 2013 at 6:48 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Ok, so i was fairly easily able to build some DSL for our matrix manipulation (similar to breeze) in scala: inline matrix or vector: val a = dense((1, 2, 3), (3, 4, 5)) val

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Jake Mannix
Yeah, I'm totally on board with a pretty Scala DSL on top of some of our stuff. In particular, I've been experimenting with wrapping the DistributedRowMatrix in a Scalding wrapper, so we can do things like val matrixAsTypedPipe = DistributedRowMatrixPipe(new DistributedRowMatrix(numRows,

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Nick Pentreath
That looks great Dmitry! The thing about Breeze that drives the complexity in it is partly specialization for Float, Double and Int matrices, and partly getting the syntax to just work for all combinations of matrix types and operands, etc. Mostly it does just work, but occasionally not.

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Dmitriy Lyubimov
On Mon, Jun 24, 2013 at 1:46 PM, Nick Pentreath nick.pentre...@gmail.com wrote: That looks great Dmitry! The thing about Breeze that drives the complexity in it is partly specialization for Float, Double and Int matrices, and partly getting the syntax to just work for all combinations of

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Ted Dunning
I think that contrib modules would be very interesting. Specifically, a good Scala DSL, Pig integration and so on. On Mon, Jun 24, 2013 at 9:55 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: On Mon, Jun 24, 2013 at 1:46 PM, Nick Pentreath nick.pentre...@gmail.com wrote: That looks great

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Nick Pentreath
You're right on that; so far doubles are all I've needed and all I can currently see needing. I'll take a look at your project and see how easy it is to integrate with my Spark ALS and other code. Syntax-wise it looks almost the same, so swapping out the linear algebra backend would be

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Dmitriy Lyubimov
Well, one fundamental step to get there in the Mahout realm, the way i see it, is to create DSLs for Mahout's DRMs in Spark. That's actually one of the other reasons i chose not to follow Breeze. When we unwind Mahout DRMs, we may see sparse or dense slices there with named vectors. To translate that

Re: Mahout vectors/matrices/solvers on spark

2013-06-23 Thread Dmitriy Lyubimov
What does Matrix.iterateAll() contractually do? In practice it seems to be row-wise iteration for some implementations, but the javadoc doesn't seem to state so contractually. What is a MatrixSlice if it is neither a row nor a column? How can i tell what exactly it is i am iterating over? On Jun

Re: Mahout vectors/matrices/solvers on spark

2013-06-23 Thread Ted Dunning
I think that this contract has migrated a bit from the first starting point. My feeling is that there is a de facto contract now that the matrix slice is a single row. On Jun 23, 2013, at 16:32, Dmitriy Lyubimov dlie...@gmail.com wrote: What does Matrix. iterateAll()

Re: Mahout vectors/matrices/solvers on spark

2013-06-23 Thread Dmitriy Lyubimov
Thank you. On Jun 23, 2013 6:16 PM, Ted Dunning ted.dunn...@gmail.com wrote: I think that this contract has migrated a bit from the first starting point. My feeling is that there is a de facto contract now that the matrix slice is a single row. On Jun 23, 2013, at

Re: Mahout vectors/matrices/solvers on spark

2013-06-19 Thread Sebastian Schelter
Let us know how it goes, I'm pretty interested to see how well our stuff integrates with Spark, especially since Spark is in the process of joining Apache. -sebastian On 19.06.2013 03:14, Dmitriy Lyubimov wrote: Hello, so i finally got around to actually do it. I want to get Mahout sparse

Re: Mahout vectors/matrices/solvers on spark

2013-06-19 Thread Ted Dunning
On Wed, Jun 19, 2013 at 5:29 AM, Jake Mannix jake.man...@gmail.com wrote: Question #2: which in-core solvers are available for Mahout matrices? I know there's SSVD, probably Cholesky, is there something else? In particular, i need to be solving linear systems, I guess Cholesky should be
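For the linear systems asked about here, Cholesky applies when the system matrix is symmetric positive definite (for example AᵀA + λI with λ > 0, as in ALS). A minimal plain-array sketch of factoring A = LLᵀ and solving by forward and back substitution follows; it is illustrative only and is not Mahout's Cholesky implementation.

```scala
object CholeskySolve {
  // Factors a symmetric positive-definite A as L L^T, with L lower-triangular.
  def factor(a: Array[Array[Double]]): Array[Array[Double]] = {
    val n = a.length
    val l = Array.ofDim[Double](n, n)
    for (j <- 0 until n) {
      var d = a(j)(j)
      for (k <- 0 until j) d -= l(j)(k) * l(j)(k)
      require(d > 0.0, "matrix is not positive definite")
      l(j)(j) = math.sqrt(d)
      for (i <- j + 1 until n) {
        var s = a(i)(j)
        for (k <- 0 until j) s -= l(i)(k) * l(j)(k)
        l(i)(j) = s / l(j)(j)
      }
    }
    l
  }

  // Solves A x = b via L y = b (forward substitution), then L^T x = y (back substitution).
  def solve(a: Array[Array[Double]], b: Array[Double]): Array[Double] = {
    val l = factor(a)
    val n = b.length
    val y = b.clone()
    for (i <- 0 until n) {
      for (k <- 0 until i) y(i) -= l(i)(k) * y(k)
      y(i) /= l(i)(i)
    }
    val x = y.clone()
    for (i <- n - 1 to 0 by -1) {
      for (k <- i + 1 until n) x(i) -= l(k)(i) * x(k)
      x(i) /= l(i)(i)
    }
    x
  }
}
```

The `require` doubles as a cheap positive-definiteness check, which is one reason Cholesky is a natural first choice over a general LU solve for normal equations.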

Re: Mahout vectors/matrices/solvers on spark

2013-06-19 Thread Nick Pentreath
Hi Dmitriy I'd be interested to look at helping with this potentially (time permitting). I've recently been working on a port of Mahout's ALS implementation to Spark. I spent a bit of time thinking about how much of mahout-math to use. For now I found that using the Breeze linear algebra

Re: Mahout vectors/matrices/solvers on spark

2013-06-19 Thread Sebastian Schelter
I have a JBlas version of our ALS solving code lying around [1], feel free to use it. Would also be interested to see the Spark port. -sebastian [1] https://github.com/sscdotopen/mahout-als/blob/jblas/math/src/main/java/org/apache/mahout/math/als/JBlasAlternatingLeastSquaresSolver.java On

Re: Mahout vectors/matrices/solvers on spark

2013-06-19 Thread Dmitriy Lyubimov
Thank you, Ted. On Wed, Jun 19, 2013 at 12:20 AM, Ted Dunning ted.dunn...@gmail.com wrote: On Wed, Jun 19, 2013 at 5:29 AM, Jake Mannix jake.man...@gmail.com wrote: Question #2: which in-core solvers are available for Mahout matrices? I know there's SSVD, probably Cholesky, is there

Re: Mahout vectors/matrices/solvers on spark

2013-06-19 Thread Dmitriy Lyubimov
Thank you, Sebastian. Actually ALS flavours are indeed one of my first pragmatic goals -- i have also done a few customizations for my employer -- so i probably will pragmatically pursue those customizations first. In particular, i do use Koren-Volinsky confidence weighting, but assume we still
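"Koren-Volinsky confidence weighting" usually refers to the implicit-feedback ALS formulation of Hu, Koren and Volinsky (2008). The baseline form is written out below for reference; the customizations mentioned above are not described in the thread, so this is only the standard version. Here r_{ui} is the raw interaction count, α and λ are tuning constants, and C^u = diag(c_{u1}, c_{u2}, …).

```latex
\begin{aligned}
p_{ui} &= \begin{cases} 1 & r_{ui} > 0 \\ 0 & r_{ui} = 0 \end{cases},
  \qquad c_{ui} = 1 + \alpha\, r_{ui} \\
\min_{X,Y}\; & \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i\bigr)^2
  + \lambda\Bigl(\sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2\Bigr) \\
x_u &= \bigl(Y^{\top} C^{u} Y + \lambda I\bigr)^{-1} Y^{\top} C^{u} p(u),
  \qquad Y^{\top} C^{u} Y = Y^{\top} Y + Y^{\top}\bigl(C^{u} - I\bigr) Y
\end{aligned}
```

The last identity is what keeps each user update cheap: YᵀY is computed once per sweep, and C^u − I is non-zero only for items the user actually interacted with. For λ > 0 the system matrix is symmetric positive definite, so a Cholesky solve applies.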

Re: Mahout vectors/matrices/solvers on spark

2013-06-19 Thread Dmitriy Lyubimov
Nick, thank you for the hints and pointers! I will check out Breeze. Let me take a look. As far as collaboration, unfortunately i think the only way to go for me and my employer is to cut it, test it and then (after long negotiations with the CEO) donate if accepted. They are ok with my small

Mahout vectors/matrices/solvers on spark

2013-06-18 Thread Dmitriy Lyubimov
Hello, so i finally got around to actually doing it. I want to get Mahout sparse vectors and matrices (DRMs) and rebuild some solvers using Spark and Bagel/Scala. I also want to use in-core solvers that run directly on Mahout. Question #1: which Mahout artifacts are best imported if I don't

Re: Mahout vectors/matrices/solvers on spark

2013-06-18 Thread Jake Mannix
On Tue, Jun 18, 2013 at 6:14 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Hello, so i finally got around to actually do it. I want to get Mahout sparse vectors and matrices (DRMs) and rebuild some solvers using spark and Bagel /scala. I also want to use in-core solvers that run directly

Re: Mahout vectors/matrices/solvers on spark

2013-06-18 Thread Dmitriy Lyubimov
Thank you, Jake. I suspected as much about Colt. On Jun 18, 2013 8:30 PM, Jake Mannix jake.man...@gmail.com wrote: On Tue, Jun 18, 2013 at 6:14 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Hello, so i finally got around to actually do it. I want to get Mahout sparse vectors and