Hi guys,
Are your changes/bug fixes reflected in the Spark 2.1 release?
Iman
On Dec 2, 2016 3:03 PM, "Iman Mohtashemi" <iman.mohtash...@gmail.com> wrote:
> Thanks again! This is very helpful!
> Best regards,
> Iman
>
> On Dec 2, 2016 2:49 PM, "Huamin Li" <3eri...@gmail.com> wrote:
>
>> Hi Iman,
>>
>> You can get my code from https://github.com/hl475/svd/tree/testSVD. In
>> addition to fixing the index issue for IndexedRowMatrix
>> (https://issues.apache.org/jira/browse/SPARK-8614), I have made the
>> following changes as well:
>>
>> (1) Add tallSkinnySVD and computeSVDbyGram to IndexedRowMatrix.
>> (2) Add shuffle.scala to
>> mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/ (you
>> need this if you want to use tallSkinnySVD). There was a bug in the
>> shuffle method in breeze, and I sent a pull request to
>> https://github.com/scalanlp/breeze/pull/571. However, the pull request
>> was merged into breeze 0.13, whereas the version of breeze used by
>> current Spark is 0.12.
>> (3) Add partialSVD to BlockMatrix, which computes the randomized singular
>> value decomposition of a given BlockMatrix.
>>
>> The new SVD methods (tallSkinnySVD, computeSVDbyGram, and partialSVD) are
>> in beta right now. You are welcome to test them and share your
>> feedback with me!
>>
>> I implemented this code for my summer intern project with Mark Tygert,
>> and we are currently testing the performance of the new code.
>>
>> Best,
>> Huamin
>>
>> On Fri, Dec 2, 2016 at 2:07 PM, Iman Mohtashemi <
>> iman.mohtash...@gmail.com> wrote:
>>
>>> Great thanks! Where can I get the latest with the bug fixes?
>>> best regards,
>>> Iman
>>>
>>> On Fri, Dec 2, 2016 at 10:54 AM Huamin Li <3eri...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> There seems to be a bug in the section of code that converts the
>>>> RowMatrix format back into IndexedRowMatrix format.
>>>>
>>>> For RowMatrix, I think the singular values and right singular vectors
>>>> (not the left singular vectors U) that computeSVD computes are correct
>>>> when using multiple executors/machines; only the R (not the Q) in
>>>> tallSkinnyQR is correct when using multiple executors/machines. U and Q
>>>> were being stored in RowMatrix format. A RowMatrix carries no row index
>>>> information, so storing U and Q in it does not make sense.
>>>>
>>>> Others have run into this same problem
>>>> (https://issues.apache.org/jira/browse/SPARK-8614).
>>>>
>>>> I think the quick solution for this problem is to copy and paste the
>>>> multiply, computeSVD, and tallSkinnyQR code from RowMatrix to
>>>> IndexedRowMatrix and make the corresponding changes, although this would
>>>> result in code duplication.
>>>>
>>>> I have fixed the problem in the way I described above. Now, multiply,
>>>> computeSVD, and tallSkinnyQR give the correct results for
>>>> IndexedRowMatrix when using multiple executors or workers. Let me know
>>>> if I should do a pull request for this.
>>>>
>>>> Best,
>>>> Huamin
>>>>
>>>> On Fri, Dec 2, 2016 at 11:23 AM, Iman Mohtashemi <
>>>> iman.mohtash...@gmail.com> wrote:
>>>>
>>>> Ok thanks.
>>>>
>>>> On Fri, Dec 2, 2016 at 8:19 AM Sean Owen <so...@cloudera.com> wrote:
>>>>
>>>> I tried, but enforcing the ordering changed a fair bit of behavior and
>>>> I gave up. I think the way to think of it is: a RowMatrix has whatever
>>>> ordering you made it with, so you need to give it ordered rows if you're
>>>> going to use a method like the QR decomposition. That works. I don't think
>>>> the QR method should ever have been on this class though, for this reason.
>>>>
>>>> On Fri, Dec 2, 2016 at 4:13 PM Iman Mohtashemi <
>>>> iman.mohtash...@gmail.com> wrote:
>>>>
>>>> Hi guys,
>>>> Was this bug ever resolved?
>>>> Iman
>>>>
>>>> On Fri, Nov 11, 2016 at 9:59 AM Iman Mohtashemi <
>>>> iman.mohtash...@gmail.com> wrote:
>>>>
>>>> Yes this would be helpful, otherwise the Q part of the decomposition is
>>>> useless. One can use Q to solve the system Ax = b: transpose Q, multiply
>>>> it with b, and solve Rx = Qt*b for x, since the upper triangular matrix
>>>> R is correctly available.
>>>>
>>>> On Fri, Nov 11, 2016 at 3:56 AM Sean Owen <so...@cloudera.com> wrote:
>>>>
>>>> @Xiangrui / @Joseph, do you think it would be reasonable to have
>>>> CoordinateMatrix sort the rows it creates to make an IndexedRowMatrix,
>>>> in order to make the ultimate output of toRowMatrix less surprising when
>>>> it's not ordered?
>>>>
>>>>
>>>> On Tue, Nov 8, 2016 at 3:29 PM Sean Owen <so...@cloudera.com> wrote:
>>>>
>>>> I think the problem here is that IndexedRowMatrix.toRowMatrix does
>>>> *not* result in a RowMatrix with rows in order of their indices,
>>>> necessarily:
>>>>
>>>>
>>>> // Drop its row indices.
>>>> RowMatrix rowMat = indexedRowMatrix.toRowMatrix();
>>>>
>>>> What you get is a matrix where the rows are arranged in whatever order
>>>> they were passed to IndexedRowMatrix. RowMatrix says it's for rows where
>>>> the ordering doesn't matter, but then it's maybe surprising it has a QR
>>>> decomposition method, because clearly the result depends on the order of
>>>> rows in the input. (CC Yuhao Yang for a comment?)
>>>>
>>>> You could say, well, why doesn't IndexedRowMatrix.toRowMatrix return at
>>>> least something with sorted rows? That would not be hard. It also won't
>>>> return "missing" rows (all zeroes), so it would not in any event result in
>>>> a RowMatrix whose implicit rows and ordering represented the same matrix.
>>>> That, at least, strikes me as something to be better documented.
>>>>
>>>> Maybe it would be nicer still to at least sort the rows, given the
>>>> existence of use cases like yours.
>>>> For example, at least CoordinateMatrix.toIndexedRowMatrix could sort?
>>>> That is less surprising.
>>>>
>>>> In any event you should be able to make it work by manually getting the
>>>> RDD[IndexedRow] out of IndexedRowMatrix, sorting by index, then mapping it
>>>> to Vectors and making a RowMatrix from it.
>>>>
>>>>
>>>>
>>>> On Tue, Nov 8, 2016 at 2:41 PM Iman Mohtashemi <
>>>> iman.mohtash...@gmail.com> wrote:
>>>>
>>>> Hi Sean,
>>>> Here you go:
>>>>
>>>> sparsematrix.txt =
>>>>
>>>> row, col ,val
>>>> 0,0,.42
>>>> 0,1,.28
>>>> 0,2,.89
>>>> 1,0,.83
>>>> 1,1,.34
>>>> 1,2,.42
>>>> 2,0,.23
>>>> 3,0,.42
>>>> 3,1,.98
>>>> 3,2,.88
>>>> 4,0,.23
>>>> 4,1,.36
>>>> 4,2,.97
>>>>
>>>> The vector is just the third column of the matrix, which should give the
>>>> trivial solution of [0,0,1].
>>>>
>>>> This translates to the following matrix, which is correct. There are
>>>> zeros in the matrix (not really sparse, just an example):
>>>>
>>>> 0.42 0.28 0.89
>>>> 0.83 0.34 0.42
>>>> 0.23 0.0 0.0
>>>> 0.42 0.98 0.88
>>>> 0.23 0.36 0.97
>>>>
>>>>
>>>> Here is what I get for the Q and R:
>>>>
>>>> Q:
>>>> -0.21470961288429483 0.23590615093828807 0.6784910613691661
>>>> -0.3920784235278427 -0.06171221388256143 0.5847874866876442
>>>> -0.7748216464954987 -0.4003560542230838 -0.29392323671555354
>>>> -0.3920784235278427 0.8517909521421976 -0.31435038559403217
>>>> -0.21470961288429483 -0.23389547730301666 -0.11165321782745863
>>>> R:
>>>> -1.0712142642814275 -0.8347536340918976 -1.227672225670157
>>>> 0.0 0.7662808691141717 0.7553315911660984
>>>> 0.0 0.0 0.7785210939368136
>>>>
>>>> When running this in MATLAB the numbers are the same, but row 1 is the
>>>> last row and the last row is interchanged with row 3.
>>>>
>>>>
>>>>
>>>> On Mon, Nov 7, 2016 at 11:35 PM Sean Owen <so...@cloudera.com> wrote:
>>>>
>>>> Rather than post a large section of code, please post a small example
>>>> of the input matrix and its decomposition, to illustrate what you're saying
>>>> is out of order.
>>>>
>>>> On Tue, Nov 8, 2016 at 3:50 AM im281 <iman.mohtash...@gmail.com> wrote:
>>>>
>>>> I am getting the correct rows but they are out of order. Is this a bug
>>>> or am I doing something wrong?
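
For readers following the thread, here is a minimal sketch of the workaround Sean describes (take the indexed rows, sort by index, then drop the indices), with zero rows padded in for any missing index so that row position equals row index. It is plain Java with no Spark dependency: the class OrderedRows, the IndexedEntry type, and the dropIndicesInOrder method are hypothetical names made up for this illustration, not part of the Spark API, and the sketch assumes row indices are unique and non-negative. In Spark the same steps would be applied to the RDD[IndexedRow] before building a RowMatrix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class OrderedRows {
    /** Hypothetical stand-in for an indexed row: a row index plus its values. */
    static final class IndexedEntry {
        final long index;
        final double[] values;

        IndexedEntry(long index, double[] values) {
            this.index = index;
            this.values = values;
        }
    }

    /**
     * Sorts the rows by index and drops the indices, inserting all-zero rows
     * for missing indices so that position i in the result holds row i.
     * Assumes indices are unique and non-negative.
     */
    static List<double[]> dropIndicesInOrder(List<IndexedEntry> rows, int numCols) {
        List<IndexedEntry> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingLong((IndexedEntry r) -> r.index));

        List<double[]> out = new ArrayList<>();
        long next = 0; // index the next emitted row must have
        for (IndexedEntry r : sorted) {
            while (next < r.index) {
                out.add(new double[numCols]); // pad a missing row with zeros
                next++;
            }
            out.add(r.values.clone());
            next = r.index + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        // Rows of the 5x3 example matrix from this thread, deliberately shuffled.
        List<IndexedEntry> rows = Arrays.asList(
            new IndexedEntry(3, new double[] {0.42, 0.98, 0.88}),
            new IndexedEntry(0, new double[] {0.42, 0.28, 0.89}),
            new IndexedEntry(4, new double[] {0.23, 0.36, 0.97}),
            new IndexedEntry(1, new double[] {0.83, 0.34, 0.42}),
            new IndexedEntry(2, new double[] {0.23, 0.0, 0.0}));

        for (double[] row : dropIndicesInOrder(rows, 3)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Padding the gaps matters for the same reason sorting does: a QR or SVD result can only be matched back to the original rows if position and index agree, which is the point raised above about "missing" rows.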