On Mon, 13 Jun 2022 at 00:51, Pure Virtual <1min...@gmail.com> wrote:
> Hi there! Hi! > I've got a really simple task of finding the rank of a matrix of > *integers*. NumPy's matrix_rank() appeared to use floats internally (and > gave a clearly wrong answers when the matrix contained both small and huge > coefficients like 1 and 10**16), so I decided to try SymPy instead. > Unfortunately, I found it pretty much impossible to find the rank of a > dense matrix of size 30×30 (and above), since it took forever. I timed it > and found out that runtime grows *exponentially* (which is really > mysterious for me). > > Am I doing something wrong? Is there a better way? > You're not doing anything wrong. This is just something that can (and will) be improved in SymPy. However it is important to understand that exact arbitrary precision numerics do not necessarily have the same computational complexity as numerics over fixed precision types. In this case you see that NumPy fails because it uses fixed precision floating point and your question about rank is really a question that needs exact numerics to be answered robustly. Computing the rank of a matrix is done essentially through something like Gaussian elimination (GE) but there are many different types of Gaussian elimination. In the case of a matrix of integers we need to decide how to handle division because the basic form of GE uses exact division but with integers that leads to rational numbers. There are several approaches: 1. Use rational numbers (i.e. work over a field). 2. Use division free GE 3. Use fraction free GE (still use divisions but only carefully chosen exact integer divisions). Using division free GE leads to exponential costs so that the time taken for an NxN matrix has worst case complexity bounds like O(2^N). This is because although the *number* of arithmetic operations is bounded like O(N^3)) the *cost* of those operations depends on the size of those integers and they get big really fast during division free GE. Another approach is to use fraction-free methods and these can bring the bound down from O(2^N) to something like O(N^5) or potentially better if combined with other techniques like fast matrix multiplication. Still though, O(N^5) is a lot worse than the O(N^3) that you might have expected if you were thinking about how this works with fixed precision types. If you want to work with something more complicated than integers or rationals then the complexity bounds can be much worse! Here's a trivial example of what I do. > > PS C:\> python -m timeit -s "import sympy" "sympy.*randMatrix(10).rank()*" > 20 loops, best of 5: *13 msec* per loop > PS C:\> python -m timeit -s "import sympy" "sympy.*randMatrix(15).rank()*" > 5 loops, best of 5: *47.7 msec* per loop > PS C:\> python -m timeit -s "import sympy" "sympy.*randMatrix(20).rank()*" > 1 loop, best of 5: *2.26 sec* per loop > SymPy has a newer faster implementation of matrices that works for matrices with simple expression elements. In this case your elements are all integers so we can use it with the ZZ domain: https://docs.sympy.org/latest/modules/polys/domainmatrix.html https://docs.sympy.org/latest/modules/polys/domainsintro.html This faster implementation is already present internally in any Matrix but is only used by default for certain limited operations and only when the matrix has only integer or rational elements: In [*1*]: Matrix([[1, 2], [3, 4]])._rep Out[*1*]: DomainMatrix({0: {0: 1, 1: 2}, 1: {0: 3, 1: 4}}, (2, 2), ZZ) Note that the _rep attribute should be considered private but it gives a more efficient representation for a matrix like this. You can use it to compute the rank like this: In [*6*]: M = Matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) In [*7*]: M.rank() Out[*7*]: 2 In [*8*]: M._rep Out[*8*]: DomainMatrix({0: {0: 1, 1: 2, 2: 3}, 1: {0: 4, 1: 5, 2: 6}, 2: {0: 7, 1: 8, 2: 9}}, (3, 3), ZZ) In [*9*]: rank = len(M._rep.to_field().rref()[-1]) In [*10*]: rank Out[*10*]: 2 Here are some timings: In [*16*]: *for* N *in* [1, 2, 5, 10, 20, 50, 100]: ...: M = randMatrix(N) ...: print('N =', N) ...: %time len(M._rep.to_field().rref()[-1]) ...: N = 1 CPU times: user 143 µs, sys: 1e+03 ns, total: 144 µs Wall time: 151 µs N = 2 CPU times: user 204 µs, sys: 39 µs, total: 243 µs Wall time: 207 µs N = 5 CPU times: user 547 µs, sys: 38 µs, total: 585 µs Wall time: 618 µs N = 10 CPU times: user 2.77 ms, sys: 68 µs, total: 2.83 ms Wall time: 3.02 ms N = 20 CPU times: user 21.8 ms, sys: 136 µs, total: 21.9 ms Wall time: 28.7 ms N = 50 CPU times: user 545 ms, sys: 2.48 ms, total: 547 ms Wall time: 550 ms N = 100 CPU times: user 7.37 s, sys: 58.9 ms, total: 7.43 s Wall time: 7.51 s That scales a lot better than the current Matrix rank method (I gave up at N=20 here): In [*15*]: *for* N *in* [1, 2, 5, 10, 20, 50, 100]: ...: M = randMatrix(N) ...: print('N =', N) ...: %time M.rank() ...: N = 1 CPU times: user 270 µs, sys: 0 ns, total: 270 µs Wall time: 277 µs N = 2 CPU times: user 1.12 ms, sys: 724 µs, total: 1.85 ms Wall time: 1.18 ms N = 5 CPU times: user 8.01 ms, sys: 96 µs, total: 8.1 ms Wall time: 8.2 ms N = 10 CPU times: user 23.3 ms, sys: 110 µs, total: 23.4 ms Wall time: 23.5 ms N = 20 CPU times: user 5.56 s, sys: 46.5 ms, total: 5.61 s Wall time: 5.66 s N = 50 ^C --------------------------------------------------------------------------- KeyboardInterrupt This can all be made more efficient and the plan is to do so in future SymPy versions. For now you can use DomainMatrix directly if these examples are representative of what you want to do. It's also faster to use flint through the python_flint library. I plan to make it possible to use flint internally to speed SymPy up but that hasn't happened yet: In [*1*]: *import* *flint* In [*2*]: M = randMatrix(100) In [*3*]: fM = flint.fmpz_mat([[int(i) *for* i *in* row] *for* row *in* M.tolist()]) In [*4*]: %time fM.rank() CPU times: user 3.96 ms, sys: 3.48 ms, total: 7.44 ms Wall time: 18 ms Out[*4*]: 100 In [*5*]: M = randMatrix(1000) In [*6*]: fM = flint.fmpz_mat([[int(i) *for* i *in* row] *for* row *in* M.tolist()]) In [*7*]: %time fM.rank() CPU times: user 632 ms, sys: 19.9 ms, total: 652 ms Wall time: 655 ms Out[*7*]: 1000 -- Oscar -- You received this message because you are subscribed to the Google Groups "sympy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sympy+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/sympy/CAHVvXxQ1eMWz-6AUHy0YLWpPnDRk5smZEkhLQvx7RafhYTZn3w%40mail.gmail.com.