Hi guys,
Are your changes/bug fixes reflected in the Spark 2.1 release?
Iman

On Dec 2, 2016 3:03 PM, "Iman Mohtashemi" <iman.mohtash...@gmail.com> wrote:

> Thanks again! This is very helpful!
> Best regards,
> Iman
>
> On Dec 2, 2016 2:49 PM, "Huamin Li" <3eri...@gmail.com> wrote:
>
>> Hi Iman,
>>
>> You can get my code from https://github.com/hl475/svd/tree/testSVD. In
>> addition to fixing the index issue for IndexedRowMatrix (
>> https://issues.apache.org/jira/browse/SPARK-8614), I have made the
>> following changes as well:
>>
>> (1) Add tallSkinnySVD and computeSVDbyGram to IndexedRowMatrix.
>> (2) Add shuffle.scala to
>> mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/ (you need
>> this if you want to use tallSkinnySVD). There was a bug in the shuffle
>> method in breeze, and I sent a pull request for it:
>> https://github.com/scalanlp/breeze/pull/571. However, that pull request
>> was merged into breeze 0.13, whereas the version of breeze that Spark
>> currently uses is 0.12.
>> (3) Add partialSVD to BlockMatrix, which computes the randomized singular
>> value decomposition of a given BlockMatrix.
>>
>> The new SVD methods (tallSkinnySVD, computeSVDbyGram, and partialSVD) are
>> in beta right now. You are welcome to test them and share your
>> feedback with me!
>>
>> I implemented this code for my summer intern project with Mark Tygert,
>> and we are currently testing its performance.
>>
>> Best,
>> Huamin
>>
>> On Fri, Dec 2, 2016 at 2:07 PM, Iman Mohtashemi <
>> iman.mohtash...@gmail.com> wrote:
>>
>>> Great, thanks! Where can I get the latest version with the bug fixes?
>>> best regards,
>>> Iman
>>>
>>> On Fri, Dec 2, 2016 at 10:54 AM Huamin Li <3eri...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> There seems to be a bug in the section of code that converts the
>>>> RowMatrix format back into IndexedRowMatrix format.
>>>>
>>>> For RowMatrix, I think the singular values and right singular vectors
>>>> (but not the left singular vectors U) that computeSVD computes are correct
>>>> when using multiple executors/machines; likewise, only the R (not the Q)
>>>> from tallSkinnyQR is correct when using multiple executors/machines. U and
>>>> Q are stored in RowMatrix format, and a RowMatrix carries no row index
>>>> information, so the rows of U and Q cannot be matched back to the rows of
>>>> the input.
>>>>
>>>> Others have run into this same problem
>>>> (https://issues.apache.org/jira/browse/SPARK-8614).
>>>>
>>>> I think the quick solution for this problem is to copy the multiply,
>>>> computeSVD, and tallSkinnyQR code from RowMatrix to IndexedRowMatrix
>>>> and make the corresponding changes, although this would result in code
>>>> duplication.
>>>>
>>>> I have fixed the problem in the way described above. Now multiply,
>>>> computeSVD, and tallSkinnyQR give the correct results for
>>>> IndexedRowMatrix when using multiple executors or workers. Let me know
>>>> if I should open a pull request for this.
>>>>
>>>> Best,
>>>> Huamin
>>>>
>>>> On Fri, Dec 2, 2016 at 11:23 AM, Iman Mohtashemi <
>>>> iman.mohtash...@gmail.com> wrote:
>>>>
>>>> Ok thanks.
>>>>
>>>> On Fri, Dec 2, 2016 at 8:19 AM Sean Owen <so...@cloudera.com> wrote:
>>>>
>>>> I tried, but enforcing the ordering changed a fair bit of behavior and
>>>> I gave up. I think the way to think of it is: a RowMatrix has whatever
>>>> ordering you made it with, so you need to give it ordered rows if you're
>>>> going to use a method like the QR decomposition. That works. I don't think
>>>> the QR method should ever have been on this class though, for this reason.
>>>>
>>>> On Fri, Dec 2, 2016 at 4:13 PM Iman Mohtashemi <
>>>> iman.mohtash...@gmail.com> wrote:
>>>>
>>>> Hi guys,
>>>> Was this bug ever resolved?
>>>> Iman
>>>>
>>>> On Fri, Nov 11, 2016 at 9:59 AM Iman Mohtashemi <
>>>> iman.mohtash...@gmail.com> wrote:
>>>>
>>>> Yes, this would be helpful; otherwise the Q part of the decomposition is
>>>> useless. One can use Q to solve the system Ax = b: transpose Q, multiply
>>>> it by b, and solve Rx = Qt*b for x, since the upper triangular matrix R
>>>> is available and correct.
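
The solve described above (form Qt*b, then solve the triangular system with R) can be sketched in plain Java as follows. This is only an illustration of the arithmetic, not Spark code; the class and method names are hypothetical, and it assumes the rows of Q are in the same order as the entries of b, which is exactly what goes wrong in this thread.

```java
final class QrSolve {
    // Computes y = Q^T b for an m-by-n matrix Q stored row by row.
    // Assumes row i of Q corresponds to entry i of b.
    static double[] transposeTimes(double[][] q, double[] b) {
        int n = q[0].length;
        double[] y = new double[n];
        for (int i = 0; i < q.length; i++) {
            for (int j = 0; j < n; j++) {
                y[j] += q[i][j] * b[i];
            }
        }
        return y;
    }

    // Solves R x = y by back substitution.
    // Assumes R is square, upper triangular, with a nonzero diagonal.
    static double[] backSubstitute(double[][] r, double[] y) {
        int n = y.length;
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double sum = y[i];
            for (int j = i + 1; j < n; j++) {
                sum -= r[i][j] * x[j];
            }
            x[i] = sum / r[i][i];
        }
        return x;
    }
}
```

If the rows of Q come back in a different order than the rows of the input, Qt*b mixes up the entries of b and the solution is wrong even though R is correct.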
>>>>
>>>> On Fri, Nov 11, 2016 at 3:56 AM Sean Owen <so...@cloudera.com> wrote:
>>>>
>>>> @Xiangrui / @Joseph, do you think it would be reasonable to have
>>>> CoordinateMatrix sort the rows it creates when making an
>>>> IndexedRowMatrix, in order to make the ultimate output of toRowMatrix
>>>> less surprising when it's not ordered?
>>>>
>>>>
>>>> On Tue, Nov 8, 2016 at 3:29 PM Sean Owen <so...@cloudera.com> wrote:
>>>>
>>>> I think the problem here is that IndexedRowMatrix.toRowMatrix does
>>>> *not* result in a RowMatrix with rows in order of their indices,
>>>> necessarily:
>>>>
>>>>
>>>> // Drop its row indices.
>>>> RowMatrix rowMat = indexedRowMatrix.toRowMatrix();
>>>>
>>>> What you get is a matrix where the rows are arranged in whatever order
>>>> they were passed to IndexedRowMatrix. RowMatrix says it's for rows where
>>>> the ordering doesn't matter, but then it's maybe surprising it has a QR
>>>> decomposition method, because clearly the result depends on the order of
>>>> rows in the input. (CC Yuhao Yang for a comment?)
>>>>
>>>> You could say, well, why doesn't IndexedRowMatrix.toRowMatrix return at
>>>> least something with sorted rows? That would not be hard. It also won't
>>>> return "missing" rows (all zeroes), so it would not in any event result in
>>>> a RowMatrix whose implicit rows and ordering represented the same matrix.
>>>> That, at least, strikes me as something to be better documented.
>>>>
>>>> Maybe it would be nicer still to at least sort the rows, given the
>>>> existence of use cases like yours. For example, at least
>>>> CoordinateMatrix.toIndexedRowMatrix could sort? That would be less
>>>> surprising.
>>>>
>>>> In any event you should be able to make it work by manually getting the
>>>> RDD[IndexedRow] out of IndexedRowMatrix, sorting by index, then mapping it
>>>> to Vectors and making a RowMatrix from it.
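
The workaround described above can be modeled in plain Java, without any Spark dependency, to show the logic: sort the indexed rows by index, then emit the row vectors in that order. The sketch also fills in all-zero rows for missing indices, which goes one step beyond the suggestion and addresses the "missing" rows point made earlier. The `Row` class and `toOrderedRows` method are hypothetical stand-ins, not Spark types, and the code assumes each index occurs at most once.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class OrderedRows {
    // Minimal stand-in for an indexed row: a row index plus its values.
    static final class Row {
        final long index;
        final double[] values;

        Row(long index, double[] values) {
            this.index = index;
            this.values = values;
        }
    }

    // Returns the rows in index order, inserting zero rows for absent indices.
    static List<double[]> toOrderedRows(List<Row> rows, long numRows, int numCols) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingLong((Row r) -> r.index));

        List<double[]> out = new ArrayList<>();
        long next = 0;
        for (Row r : sorted) {
            // Fill the gap before this row with all-zero rows.
            while (next < r.index) {
                out.add(new double[numCols]);
                next++;
            }
            out.add(r.values.clone());
            next++;
        }
        // Fill any trailing missing rows.
        while (next < numRows) {
            out.add(new double[numCols]);
            next++;
        }
        return out;
    }
}
```

In Spark itself the same steps would be applied to the RDD[IndexedRow] taken from the IndexedRowMatrix, as the message says; the list here only stands in for that RDD.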
>>>>
>>>>
>>>>
>>>> On Tue, Nov 8, 2016 at 2:41 PM Iman Mohtashemi <
>>>> iman.mohtash...@gmail.com> wrote:
>>>>
>>>> Hi Sean,
>>>> Here you go:
>>>>
>>>> sparsematrix.txt =
>>>>
>>>> row, col ,val
>>>> 0,0,.42
>>>> 0,1,.28
>>>> 0,2,.89
>>>> 1,0,.83
>>>> 1,1,.34
>>>> 1,2,.42
>>>> 2,0,.23
>>>> 3,0,.42
>>>> 3,1,.98
>>>> 3,2,.88
>>>> 4,0,.23
>>>> 4,1,.36
>>>> 4,2,.97
>>>>
>>>> The vector is just the third column of the matrix, which should give the
>>>> trivial solution of [0,0,1].
>>>>
>>>> This translates to the following dense matrix, which is correct. There
>>>> are zeros in the matrix (it is not really sparse; this is just an example):
>>>> 0.42  0.28  0.89
>>>> 0.83  0.34  0.42
>>>> 0.23  0.0   0.0
>>>> 0.42  0.98  0.88
>>>> 0.23  0.36  0.97
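
The translation from (row, col, val) entries to the dense matrix above can be reproduced with a few lines of plain Java. This is a sketch for checking the example by hand, not the Spark CoordinateMatrix code path; the class and method names are hypothetical. Entries that are absent from the input, such as (2,1) and (2,2), stay at zero.

```java
final class CooToDense {
    // Builds a dense numRows-by-numCols matrix from coordinate entries.
    // Positions with no entry keep Java's default value of 0.0.
    static double[][] toDense(int[] rows, int[] cols, double[] vals,
                              int numRows, int numCols) {
        double[][] dense = new double[numRows][numCols];
        for (int k = 0; k < vals.length; k++) {
            dense[rows[k]][cols[k]] = vals[k];
        }
        return dense;
    }
}
```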
>>>>
>>>>
>>>> Here is what I get for Q and R:
>>>>
>>>> Q:
>>>> -0.21470961288429483  0.23590615093828807   0.6784910613691661
>>>> -0.3920784235278427   -0.06171221388256143  0.5847874866876442
>>>> -0.7748216464954987   -0.4003560542230838   -0.29392323671555354
>>>> -0.3920784235278427   0.8517909521421976    -0.31435038559403217
>>>> -0.21470961288429483  -0.23389547730301666  -0.11165321782745863
>>>>
>>>> R:
>>>> -1.0712142642814275  -0.8347536340918976  -1.227672225670157
>>>> 0.0                  0.7662808691141717   0.7553315911660984
>>>> 0.0                  0.0                  0.7785210939368136
>>>>
>>>> When I run this in MATLAB, the numbers are the same, but row 1 is the
>>>> last row and the last row is interchanged with row 3.
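
One way to see exactly how the rows of Q were reordered is to multiply Q by R and match each row of the product against the rows of the input matrix. The plain-Java sketch below does this with the numbers posted in this message; the class and method names are hypothetical, and the tolerance of 1e-6 is an arbitrary choice.

```java
import java.util.Arrays;

final class RowOrderCheck {
    // Input matrix as posted (rows 0..4).
    static final double[][] A = {
        {0.42, 0.28, 0.89},
        {0.83, 0.34, 0.42},
        {0.23, 0.0, 0.0},
        {0.42, 0.98, 0.88},
        {0.23, 0.36, 0.97}
    };

    // Q and R as posted.
    static final double[][] Q = {
        {-0.21470961288429483, 0.23590615093828807, 0.6784910613691661},
        {-0.3920784235278427, -0.06171221388256143, 0.5847874866876442},
        {-0.7748216464954987, -0.4003560542230838, -0.29392323671555354},
        {-0.3920784235278427, 0.8517909521421976, -0.31435038559403217},
        {-0.21470961288429483, -0.23389547730301666, -0.11165321782745863}
    };
    static final double[][] R = {
        {-1.0712142642814275, -0.8347536340918976, -1.227672225670157},
        {0.0, 0.7662808691141717, 0.7553315911660984},
        {0.0, 0.0, 0.7785210939368136}
    };

    // Multiplies an m-by-k matrix by a k-by-n matrix.
    static double[][] multiply(double[][] a, double[][] b) {
        int m = a.length, k = b.length, n = b[0].length;
        double[][] c = new double[m][n];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                for (int p = 0; p < k; p++)
                    c[i][j] += a[i][p] * b[p][j];
        return c;
    }

    // For each row of product, finds the index of the first row of original
    // that agrees with it within tol, or -1 if no row matches.
    static int[] matchRows(double[][] product, double[][] original, double tol) {
        int[] match = new int[product.length];
        for (int i = 0; i < product.length; i++) {
            match[i] = -1;
            for (int r = 0; r < original.length && match[i] < 0; r++) {
                boolean same = true;
                for (int j = 0; j < original[r].length; j++)
                    same &= Math.abs(product[i][j] - original[r][j]) <= tol;
                if (same) match[i] = r;
            }
        }
        return match;
    }

    public static void main(String[] args) {
        // Entry i of the printed array is the input row that row i of Q belongs to.
        System.out.println(Arrays.toString(matchRows(multiply(Q, R), A, 1e-6)));
    }
}
```

The printed array can then be used to reorder either Q or the right-hand side b so that the two line up before solving.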
>>>>
>>>>
>>>>
>>>> On Mon, Nov 7, 2016 at 11:35 PM Sean Owen <so...@cloudera.com> wrote:
>>>>
>>>> Rather than post a large section of code, please post a small example
>>>> of the input matrix and its decomposition, to illustrate what you're saying
>>>> is out of order.
>>>>
>>>> On Tue, Nov 8, 2016 at 3:50 AM im281 <iman.mohtash...@gmail.com> wrote:
>>>>
>>>> I am getting the correct rows but they are out of order. Is this a bug
>>>> or am I doing something wrong?
>>>>
>>>>
>>>>
>>>>
>>
