Hi all,

I found that the Cholesky factorization unit test no longer works. The values returned are completely wrong, and it looks like a memory error. When I skip the tril call, the values returned by cuSOLVER are correct, so the problem is probably in libgpuarray.
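For readers unfamiliar with why the tril call is there at all: as far as I understand, a LAPACK-style potrf routine (which cuSOLVER follows) writes the factor only into the requested triangle and leaves the other triangle holding the input values, so the result must be masked afterwards. The sketch below imitates that on the CPU with NumPy only. The helper name potrf_like_lower and the test matrix are hypothetical, and this is not the Theano or libgpuarray code path; the tolerances are the ones shown in the test output.

```python
import numpy as np


def potrf_like_lower(a):
    """Imitate a LAPACK-style potrf (hypothetical helper, CPU only).

    The Cholesky factor is written into the lower triangle, and the
    strict upper triangle keeps the original entries of `a`.
    """
    out = a.copy()
    factor = np.linalg.cholesky(a)       # lower-triangular factor L
    idx = np.tril_indices_from(a)        # indices of the lower triangle
    out[idx] = factor[idx]
    return out


# Build a small symmetric positive-definite matrix (assumed test data).
rng = np.random.RandomState(0)
M = rng.rand(3, 3)
A = M.dot(M.T) + 3 * np.eye(3)

raw = potrf_like_lower(A)          # upper triangle still holds entries of A
L = np.tril(raw)                   # the masking step that tril performs
reference = np.linalg.cholesky(A)  # NumPy reference, already lower-triangular

print(np.allclose(L, reference, rtol=1e-5, atol=1e-8))
```

If the masking step itself corrupts memory, the raw output can be correct while the masked result is not, which matches what is described above.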
======================================================================
ERROR: test_dense_chol_lower (theano.gpuarray.tests.test_linalg.TestGpuCholesky64)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/wonghang/github/Theano/theano/gpuarray/tests/test_linalg.py", line 327, in test_dense_chol_lower
    self.compare_gpu_cholesky_to_np(A_val, lower=lower, inplace=inplace)
  File "/home/wonghang/github/Theano/theano/gpuarray/tests/test_linalg.py", line 280, in compare_gpu_cholesky_to_np
    utt.assert_allclose(chol_A_res, chol_A_val)
  File "/home/wonghang/github/Theano/theano/tests/unittest_tools.py", line 358, in assert_allclose
    raise WrongValue(expected, value, rtol, atol)
theano.tests.unittest_tools.WrongValue: WrongValue
  : shape, dtype, strides, min, max, n_inf, n_nan:
  Expected : (3, 3) float64 (24, 8) 1.078578362e-314 1.0548793676823098 0 0
  Value    : (3, 3) float64 (24, 8) 0.0 1.5121774155893968 0 0
  expected : [[2.00683310e-314 3.46328020e-001 1.07857836e-314]
              [2.29026158e-001 1.05487937e+000 4.86725043e-001]
              [2.07913268e-001 4.16263205e-001 1.04157477e+000]]
  value    : [[1.51217742 0.         0.        ]
              [0.22902616 1.05487937 0.        ]
              [0.20791327 0.41626321 1.04157477]]
  Max Abs Diff:    1.5121774155893968
  Mean Abs Diff:   0.2605811643516005
  Median Abs Diff: 1.078578362e-314
  Std Abs Diff:    0.4752077922970366
  Max Rel Diff:    inf
  Mean Rel Diff:   inf
  Median Rel Diff: 1.3335589252099037e-16
  Std Rel Diff:    nan
  rtol, atol: 1e-05 1e-08

======================================================================
ERROR: test_diag_chol (theano.gpuarray.tests.test_linalg.TestGpuCholesky64)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/wonghang/github/Theano/theano/gpuarray/tests/test_linalg.py", line 317, in test_diag_chol
    self.compare_gpu_cholesky_to_np(A_val, lower=lower, inplace=inplace)
  File "/home/wonghang/github/Theano/theano/gpuarray/tests/test_linalg.py", line 280, in compare_gpu_cholesky_to_np
    utt.assert_allclose(chol_A_res, chol_A_val)
  File "/home/wonghang/github/Theano/theano/tests/unittest_tools.py", line 358, in assert_allclose
    raise WrongValue(expected, value, rtol, atol)
theano.tests.unittest_tools.WrongValue: WrongValue
  : shape, dtype, strides, min, max, n_inf, n_nan:
  Expected : (5, 5) float64 (40, 8) 0.0 1.3969459393428005 0 0
  Value    : (5, 5) float64 (40, 8) 0.0 1.3969459393428005 0 0
  expected : [[1.26525335e-314 0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000]
              [0.00000000e+000 2.01543086e-314 0.00000000e+000 0.00000000e+000 0.00000000e+000]
              [0.00000000e+000 0.00000000e+000 1.29480282e+000 0.00000000e+000 0.00000000e+000]
              [0.00000000e+000 0.00000000e+000 0.00000000e+000 1.31448015e+000 0.00000000e+000]
              [0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000 1.39694594e+000]]
  value    : [[1.3040081  0.         0.         0.         0.        ]
              [0.         1.35800834 0.         0.         0.        ]
              [0.         0.         1.29480282 0.         0.        ]
              [0.         0.         0.         1.31448015 0.        ]
              [0.         0.         0.         0.         1.39694594]]
  Max Abs Diff:    1.3580083368118308
  Mean Abs Diff:   0.106480657426342
  Median Abs Diff: 0.0
  Std Abs Diff:    0.361174224138967
  Max Rel Diff:    nan
  Mean Rel Diff:   nan
  Median Rel Diff: nan
  Std Rel Diff:    nan
  rtol, atol: 1e-05 1e-08

----------------------------------------------------------------------
Ran 40 tests in 12.218s

FAILED (errors=2, skipped=16)

Please use revision 07cd4ad56054c279442ee28413b26939f4c03632 of libgpuarray.
Use the following commands to install that older version of libgpuarray:

$ git clone https://github.com/Theano/libgpuarray.git
$ cd libgpuarray
$ git checkout 07cd4ad56054c279442ee28413b26939f4c03632 .
$ mkdir cmake
$ cd cmake
$ cmake ..
$ make
$ sudo make install
$ sudo ldconfig
$ cd ..
$ python3 setup.py install

and then run your Theano code again. I think it should work now.

I will check the code in libgpuarray later. Let me raise an issue first.

Best,
wonghang

On Friday, February 7, 2020 at 9:49 PM, Paul Baggenstoss <p.m.baggenst...@ieee.org> wrote:

> Hi wonghang, Sorry to pester you with emails, but I have some interesting
> timing information.
> I ran a process using different processors and ways of computing Cholesky().
> The results are surprising.
>
> GpuMagmaCholesky()                9.0 sec
> slinalg.Cholesky (uses cuSOLVER)  2.9 sec
> CPU                               1.9 sec
>
> It looks like it pays to just use the CPU!
>
> Doesn't make any sense!
> Paul
>
>
> On Thursday, February 6, 2020 at 2:53:55 PM UTC+1, Paul Baggenstoss wrote:
>>
>>
>> Hello again.
>> So I added 64-bit support to theano/gpuarray/c_code/magma_cholesky.c
>> and to theano/gpuarray/linalg.py in the function GpuMagmaCholesky(). I
>> attached the files.
>> It now works for 32 and 64 bit and has a gradient. The numerical problem is
>> gone.
>> But (and this is a big BUT) it seems to be at least a factor of 2
>> slower than the CPU. Any thoughts on this?
>> Paul
>>
>>
>> On Thursday, February 6, 2020 at 10:28:08 AM UTC+1, Paul Baggenstoss
>> wrote:
>>>
>>> Simon,
>>> I did more digging and have some more information.
>>> I tested theano.gpuarray.linalg.GpuMagmaCholesky() on float32 and it
>>> looks good. The result is exactly the same as for the CPU.
>>> So the problem seems to be in cuSOLVER. The difficulty is that
>>> theano.gpuarray.linalg.GpuMagmaCholesky()(Cll) does not define a gradient
>>> and works only for float32. I installed the latest magma-2.5.2 and it has
>>> support for double-precision Cholesky (dpotrf), but Theano seems to use its
>>> own copy of the MAGMA source.
>>> Not sure how that works. Can I force Theano to use magma-2.5.2? If
>>> not, it seems feasible to borrow the gradient from
>>> theano.gpuarray.linalg.GpuCholesky()
>>> and add support for float64 as well. Thoughts?
>>> Paul
>>>
>>>
>>> On Wednesday, February 5, 2020 at 5:32:43 PM UTC+1, Paul Baggenstoss
>>> wrote:
>>>>
>>>> Hi Simon, I forgot to mention that I use the gradient of Cholesky, and
>>>> this has even more error than the Cholesky decomposition itself, but I
>>>> assume that this is because of a bug in Cholesky.
>>>> Paul
>>>>
>>>>
>>>> On Wednesday, February 5, 2020 at 5:30:10 PM UTC+1, Paul Baggenstoss
>>>> wrote:
>>>>>
>>>>> Hi Simon, I have uploaded a MATLAB-format file with the matrix Cll,
>>>>> which is the original matrix, R_cpu, which was produced on the CPU by
>>>>> slinalg.Cholesky(), and R_cuda, which was produced by the same function
>>>>> but on the GPU (I think it uses theano.gpuarray.linalg.GpuCholesky()).
>>>>> Both used the same precision (float32), so they should give the same
>>>>> results.
>>>>> But you can see that at the end of the diagonal, the values go wild.
>>>>> It appears to be numerical error.
>>>>> Thanks in advance!
>>>>> Paul
>>>>>
>>>>>
>>>>> On Wednesday, February 5, 2020 at 4:56:14 PM UTC+1, Wong Hang wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> The GPU Cholesky decomposition relies on cuSOLVER or MAGMA. I believe
>>>>>> NVIDIA knows its hardware well, so cuSOLVER should give the most
>>>>>> efficient result.
>>>>>>
>>>>>> Although Cholesky decomposition is numerically very stable, when I
>>>>>> wrote the test case I found that I ran into trouble even with
>>>>>> relatively small matrices when using single precision.
>>>>>>
>>>>>> Are you using single precision on a big matrix?
>>>>>> If not, try computing the condition number of the matrix to see if
>>>>>> it is too big.
>>>>>>
>>>>>> If it is not too big, then it may be a bug. I also need to use the
>>>>>> Cholesky operator. Please send me the matrix and I am willing to fix it.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> On Thursday, February 6, 2020 at 0:34, Paul Baggenstoss <p.m.ba...@ieee.org> wrote:
>>>>>>
>>>>>>> Hi Simon, I was wondering if you got anywhere with the faster
>>>>>>> Cholesky for Theano. I also use it a lot and have found it to be
>>>>>>> unstable on the GPU.
>>>>>>> Paul
>>>>>>>
>>>>>>> On Saturday, March 7, 2015 at 11:45:36 AM UTC+1, Simon Ebner wrote:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I want to do computations where I rely heavily on the Cholesky
>>>>>>>> decomposition. Writing a small benchmark for tensor.slinalg.Cholesky, I
>>>>>>>> noticed that the implementation is not as fast as I had hoped. As far
>>>>>>>> as I can tell, it is not optimized for GPUs yet but relies on the scipy
>>>>>>>> implementation?
>>>>>>>> Doing a bit of a Google search, I found several CUDA implementations
>>>>>>>> of fast Cholesky decompositions on the GPU. Before I try to include
>>>>>>>> that code in my Theano environment, could you let me know whether you
>>>>>>>> decided on purpose not to implement fast Cholesky decomposition on the
>>>>>>>> GPU? Furthermore, since I'm fairly new to Theano, I'm not completely
>>>>>>>> confident how best to incorporate CUDA code into my existing Theano
>>>>>>>> code. Is it sensible to create a custom Op with optimized C code?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Simon
>>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> ---
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "theano-users" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to theano...@googlegroups.com.
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/theano-users/aca41c35-ec36-4055-bac7-e53aced30ea7%40googlegroups.com
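The advice earlier in the thread, to check the condition number before blaming the GPU backend for float32 errors, can be tried on the CPU first. The following is a minimal NumPy sketch under stated assumptions: the function name cholesky_precision_report and the test matrix are hypothetical, and it compares float32 against float64 on the CPU only, not against cuSOLVER or MAGMA. As a rough rule of thumb, once the condition number approaches 1/eps for float32 (about 8.4e6), few reliable digits remain in single precision.

```python
import numpy as np


def cholesky_precision_report(a):
    """Return (condition number, max relative float32 error) for `a`.

    Hypothetical helper: the float64 factorization is taken as the
    reference, and the float32 factorization is compared against it.
    """
    a64 = np.asarray(a, dtype=np.float64)
    cond = np.linalg.cond(a64)
    reference = np.linalg.cholesky(a64)
    try:
        single = np.linalg.cholesky(a64.astype(np.float32))
    except np.linalg.LinAlgError:
        # In float32 the matrix is no longer numerically positive definite.
        return cond, float("inf")
    diff = np.abs(single.astype(np.float64) - reference).max()
    return cond, diff / np.abs(reference).max()


# Well-conditioned symmetric positive-definite example (assumed test data).
rng = np.random.RandomState(1)
M = rng.rand(5, 5)
A = M.dot(M.T) + 5 * np.eye(5)

cond, rel_err = cholesky_precision_report(A)
print("condition number:", cond)
print("float32 relative error:", rel_err)
print("float32 eps:", np.finfo(np.float32).eps)
```

If the CPU float32 error is already large for a given matrix, a discrepancy on the GPU at the same precision is more likely numerical than a bug; if it is small on the CPU and large on the GPU, that points toward the backend.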