Re: [theano-users] Re: Cholesky decomposition slow

Wong Hang Fri, 07 Feb 2020 05:49:28 -0800

Hi all,

I found that the Cholesky factorization unit tests no longer pass.
The values returned are completely wrong; it looks like a memory error.
When I skip the tril call, the values returned by cuSOLVER are correct,
so something is probably wrong in libgpuarray.
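
To make the role of the tril step concrete, here is a minimal NumPy-only sketch. It assumes the usual LAPACK-style convention that an in-place lower factorization writes the factor into the lower triangle and leaves the strict upper triangle holding the input's entries. The matrix and all variable names are illustrative and are not taken from the Theano test suite.

```python
import numpy as np

# Small symmetric positive-definite matrix (illustrative values only).
rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
A = M @ M.T + 3.0 * np.eye(3)

# Reference lower Cholesky factor computed on the CPU.
L_ref = np.linalg.cholesky(A)

# Emulate a raw in-place factorization result (assumption: lower triangle
# holds the factor, strict upper triangle still holds the input's entries).
raw = A.copy()
lower = np.tril_indices(3)
raw[lower] = L_ref[lower]

# A correct tril step only zeroes the strict upper triangle.
L_clean = np.tril(raw)

print("raw result matches reference:  ", np.allclose(raw, L_ref))
print("tril(raw) matches reference:   ", np.allclose(L_clean, L_ref))
```

If the raw factor is correct but the result after tril is not, as in the report above, the fault lies in the triangle-zeroing step rather than in the factorization.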


======================================================================
ERROR: test_dense_chol_lower (theano.gpuarray.tests.test_linalg.TestGpuCholesky64)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/wonghang/github/Theano/theano/gpuarray/tests/test_linalg.py", line 327, in test_dense_chol_lower
    self.compare_gpu_cholesky_to_np(A_val, lower=lower, inplace=inplace)
  File "/home/wonghang/github/Theano/theano/gpuarray/tests/test_linalg.py", line 280, in compare_gpu_cholesky_to_np
    utt.assert_allclose(chol_A_res, chol_A_val)
  File "/home/wonghang/github/Theano/theano/tests/unittest_tools.py", line 358, in assert_allclose
    raise WrongValue(expected, value, rtol, atol)
theano.tests.unittest_tools.WrongValue: WrongValue
           : shape, dtype, strides, min, max, n_inf, n_nan:
  Expected : (3, 3) float64 (24, 8) 1.078578362e-314 1.0548793676823098 0 0
  Value    : (3, 3) float64 (24, 8) 0.0 1.5121774155893968 0 0
  expected    : [[2.00683310e-314 3.46328020e-001 1.07857836e-314]
 [2.29026158e-001 1.05487937e+000 4.86725043e-001]
 [2.07913268e-001 4.16263205e-001 1.04157477e+000]]
  value    : [[1.51217742 0.         0.        ]
 [0.22902616 1.05487937 0.        ]
 [0.20791327 0.41626321 1.04157477]]
  Max Abs Diff:  1.5121774155893968
  Mean Abs Diff:  0.2605811643516005
  Median Abs Diff:  1.078578362e-314
  Std Abs Diff:  0.4752077922970366
  Max Rel Diff:  inf
  Mean Rel Diff:  inf
  Median Rel Diff:  1.3335589252099037e-16
  Std Rel Diff:  nan

  rtol, atol: 1e-05 1e-08


======================================================================
ERROR: test_diag_chol (theano.gpuarray.tests.test_linalg.TestGpuCholesky64)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/wonghang/github/Theano/theano/gpuarray/tests/test_linalg.py", line 317, in test_diag_chol
    self.compare_gpu_cholesky_to_np(A_val, lower=lower, inplace=inplace)
  File "/home/wonghang/github/Theano/theano/gpuarray/tests/test_linalg.py", line 280, in compare_gpu_cholesky_to_np
    utt.assert_allclose(chol_A_res, chol_A_val)
  File "/home/wonghang/github/Theano/theano/tests/unittest_tools.py", line 358, in assert_allclose
    raise WrongValue(expected, value, rtol, atol)
theano.tests.unittest_tools.WrongValue: WrongValue
           : shape, dtype, strides, min, max, n_inf, n_nan:
  Expected : (5, 5) float64 (40, 8) 0.0 1.3969459393428005 0 0
  Value    : (5, 5) float64 (40, 8) 0.0 1.3969459393428005 0 0
  expected    : [[1.26525335e-314 0.00000000e+000 0.00000000e+000 0.00000000e+000
  0.00000000e+000]
 [0.00000000e+000 2.01543086e-314 0.00000000e+000 0.00000000e+000
  0.00000000e+000]
 [0.00000000e+000 0.00000000e+000 1.29480282e+000 0.00000000e+000
  0.00000000e+000]
 [0.00000000e+000 0.00000000e+000 0.00000000e+000 1.31448015e+000
  0.00000000e+000]
 [0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
  1.39694594e+000]]
  value    : [[1.3040081  0.         0.         0.         0.        ]
 [0.         1.35800834 0.         0.         0.        ]
 [0.         0.         1.29480282 0.         0.        ]
 [0.         0.         0.         1.31448015 0.        ]
 [0.         0.         0.         0.         1.39694594]]
  Max Abs Diff:  1.3580083368118308
  Mean Abs Diff:  0.106480657426342
  Median Abs Diff:  0.0
  Std Abs Diff:  0.361174224138967
  Max Rel Diff:  nan
  Mean Rel Diff:  nan
  Median Rel Diff:  nan
  Std Rel Diff:  nan

  rtol, atol: 1e-05 1e-08


----------------------------------------------------------------------
Ran 40 tests in 12.218s

FAILED (errors=2, skipped=16)
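
The entries near 1e-314 in the output above are subnormal float64 values. That is what small integer bit patterns look like when read as doubles, which is consistent with (though not proof of) uninitialized or overwritten memory. A minimal sketch for flagging such entries, using the matrix printed under "expected" in the first failure; the variable names are illustrative:

```python
import numpy as np

# Matrix printed under "expected" in the test_dense_chol_lower failure above.
reported = np.array([
    [2.00683310e-314, 3.46328020e-001, 1.07857836e-314],
    [2.29026158e-001, 1.05487937e+000, 4.86725043e-001],
    [2.07913268e-001, 4.16263205e-001, 1.04157477e+000],
])

# Smallest positive normal float64; nonzero magnitudes below it are subnormal.
tiny = np.finfo(np.float64).tiny

# Flag entries that are nonzero yet smaller than any normal double.
subnormal = (reported != 0) & (np.abs(reported) < tiny)
print("subnormal positions (row, col):")
print(np.argwhere(subnormal))

# Reinterpret the same 8 bytes as integers to see the underlying bit patterns.
bits = reported[subnormal].view(np.int64)
print("bit patterns as integers:", bits)
```

A correct Cholesky factor of a well-scaled matrix should contain no such entries, so a check like this is a quick way to spot garbage in a result.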

Please use revision 07cd4ad56054c279442ee28413b26939f4c03632 of
libgpuarray.

Use the following commands to install that older version of libgpuarray:

$ git clone https://github.com/Theano/libgpuarray.git
$ cd libgpuarray
$ git checkout 07cd4ad56054c279442ee28413b26939f4c03632 .
$ mkdir cmake
$ cd cmake
$ cmake ..
$ make
$ sudo make install
$ sudo ldconfig
$ cd ..
$ python3 setup.py install

Then run your Theano code again; I think it should work now.
I will check the code in libgpuarray later, but let me raise an issue first.

Best,
wonghang
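
For the condition-number check suggested earlier in this thread (quoted below), here is a minimal NumPy-only sketch. The test matrix is synthetic, built with eigenvalues from 1 to 1e4, and is not the Cll matrix from the thread. The helper name rel_error is hypothetical. It runs on the CPU only, so it illustrates the effect of precision, not the behaviour of cuSOLVER or MAGMA.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Synthetic SPD matrix with a known eigenvalue spread (assumption: 1 .. 1e4).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigvals = np.logspace(0, 4, n)
A64 = (Q * eigvals) @ Q.T
A64 = (A64 + A64.T) / 2  # symmetrize against rounding

# Condition number in the 2-norm; large values mean precision is lost.
cond = np.linalg.cond(A64)

def rel_error(A):
    """Relative reconstruction error of the Cholesky factor, measured in float64."""
    L = np.linalg.cholesky(A).astype(np.float64)
    return np.linalg.norm(L @ L.T - A64) / np.linalg.norm(A64)

err64 = rel_error(A64)
err32 = rel_error(A64.astype(np.float32))

print("condition number:", cond)
print("float64 relative error:", err64)
print("float32 relative error:", err32)
```

Running the same comparison on the actual matrix shows whether a single-precision discrepancy is of the size rounding alone would explain, or large enough to suggest a bug.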

On Fri, Feb 7, 2020 at 9:49 PM, Paul Baggenstoss <p.m.baggenst...@ieee.org> wrote:

> Hi wonghang, sorry to pester you with emails, but I have some interesting
> timing information.
> I ran a process using different processors and ways of computing Cholesky().
> The results are surprising:
>
> GpuMagmaCholesky()                 9.0 sec
> slinalg.Cholesky (uses cuSOLVER)   2.9 sec
> CPU                                1.9 sec
>
> It looks like it pays to just use the CPU!
>
> Doesn't make any sense!
> Paul
>
>
> On Thursday, February 6, 2020 at 2:53:55 PM UTC+1, Paul Baggenstoss wrote:
>>
>>
>> Hello again.
>>      So I added 64-bit support to theano/gpuarray/c_code/magma_cholesky.c
>> and to theano/gpuarray/linalg.py in the function GpuMagmaCholesky(). I
>> attached the files.
>> It now works for both 32-bit and 64-bit and has a gradient. The numerical
>> problem is gone.
>>   But (and this is a big BUT) it seems to be at least a factor of 2
>> slower than the CPU. Any thoughts on this?
>> Paul
>>
>>
>> On Thursday, February 6, 2020 at 10:28:08 AM UTC+1, Paul Baggenstoss
>> wrote:
>>>
>>> Simon,
>>> I did more digging and have some more information. I tested
>>> theano.gpuarray.linalg.GpuMagmaCholesky() on float32 and it looks good.
>>> The result is exactly the same as for the CPU.
>>> So the problem seems to be in cuSOLVER. The catch is that
>>> theano.gpuarray.linalg.GpuMagmaCholesky()(Cll) does not define a gradient
>>> and works only for float32.
>>> I installed the latest magma-2.5.2 and it has support for
>>> double-precision Cholesky (dpotrf), but Theano seems to use its own copy of
>>> the MAGMA source.
>>> Not sure how that works. Can I force Theano to use magma-2.5.2? If
>>> not, it seems feasible to borrow the gradient from
>>> theano.gpuarray.linalg.GpuCholesky()
>>> and add support for float64 as well. Thoughts?
>>> Paul
>>>
>>>
>>> On Wednesday, February 5, 2020 at 5:32:43 PM UTC+1, Paul Baggenstoss
>>> wrote:
>>>>
>>>> Hi Simon, I forgot to mention that I use the gradient of Cholesky, and
>>>> this has even more error than the Cholesky decomposition, but I assume
>>>> that this is because of a bug in Cholesky itself.
>>>> Paul
>>>>
>>>>
>>>> On Wednesday, February 5, 2020 at 5:30:10 PM UTC+1, Paul Baggenstoss
>>>> wrote:
>>>>>
>>>>> Hi Simon, I have uploaded the MATLAB-format file with the matrix Cll,
>>>>> which is the original matrix, R_cpu, which was produced on the CPU by
>>>>> slinalg.Cholesky(), and R_cuda, which was produced by the same function
>>>>> but on the GPU (I think it uses theano.gpuarray.linalg.GpuCholesky()).
>>>>> Both used the same precision (float32), so they should give the same
>>>>> results.
>>>>> But you can see that at the end of the diagonal, the values go wild.
>>>>> It appears to be numerical error.
>>>>> Thanks in advance!
>>>>> Paul
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wednesday, February 5, 2020 at 4:56:14 PM UTC+1, Wong Hang wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> The GPU Cholesky decomposition relies on cuSOLVER or MAGMA. I believe
>>>>>> NVIDIA knows its hardware well, so cuSOLVER should give the most
>>>>>> efficient result.
>>>>>>
>>>>>> Although Cholesky decomposition is numerically very stable, when I
>>>>>> wrote the test cases I found that I ran into trouble even with
>>>>>> relatively small matrices if I used single precision.
>>>>>>
>>>>>> Are you using single precision on a big matrix?
>>>>>> If not, try computing the condition number of the matrix to see if
>>>>>> it is too big.
>>>>>>
>>>>>> If it is not too big, then it may be a bug. I also need to use the
>>>>>> Cholesky operator, so please send me the matrix and I will be glad to
>>>>>> fix it.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> On Thu, Feb 6, 2020 at 0:34, Paul Baggenstoss <p.m.ba...@ieee.org> wrote:
>>>>>>
>>>>>>> Hi Simon, I was wondering if you got anywhere with the faster
>>>>>>> Cholesky for Theano. I also use it a lot and have found it to be
>>>>>>> unstable on the GPU.
>>>>>>> Paul
>>>>>>>
>>>>>>> On Saturday, March 7, 2015 at 11:45:36 AM UTC+1, Simon Ebner wrote:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I want to do computations where I rely heavily on the Cholesky
>>>>>>>> decomposition. Writing a small benchmark for tensor.slinalg.Cholesky, I
>>>>>>>> noticed that the implementation is not as fast as I hoped. As far as I
>>>>>>>> can tell, it is not optimized for GPUs yet but relies on the SciPy
>>>>>>>> implementation?
>>>>>>>> Doing a bit of Google searching, I found several CUDA implementations
>>>>>>>> of fast Cholesky decompositions on the GPU. Before I try to include
>>>>>>>> that code in my Theano environment, could you let me know whether you
>>>>>>>> decided not to implement fast Cholesky decomposition on the GPU on
>>>>>>>> purpose?
>>>>>>>> Furthermore, since I'm fairly new to Theano, I'm not completely
>>>>>>>> confident how best to incorporate CUDA code into my existing Theano
>>>>>>>> code. Is it sensible to create a custom Op with optimized C code?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Simon
>>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> ---
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "theano-users" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to theano...@googlegroups.com.
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/theano-users/aca41c35-ec36-4055-bac7-e53aced30ea7%40googlegroups.com
>>>>>>> <https://groups.google.com/d/msgid/theano-users/aca41c35-ec36-4055-bac7-e53aced30ea7%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>> .
>>>>>>>
>>>>>> --

