On Mon, 13 Jun 2022 at 00:51, Pure Virtual <1min...@gmail.com> wrote:

> Hi there!


Hi!


> I've got a really simple task of finding the rank of a matrix of
> *integers*. NumPy's matrix_rank() appeared to use floats internally (and
> gave a clearly wrong answers when the matrix contained both small and huge
> coefficients like 1 and 10**16), so I decided to try SymPy instead.
> Unfortunately, I found it pretty much impossible to find the rank of a
> dense matrix of size 30×30 (and above), since it took forever. I timed it
> and found out that runtime grows *exponentially* (which is really
> mysterious for me).
>
> Am I doing something wrong? Is there a better way?
>

You're not doing anything wrong. This is just something that can (and will)
be improved in SymPy.

However it is important to understand that exact arbitrary precision
numerics do not necessarily have the same computational complexity as
numerics over fixed precision types. In this case you see that NumPy fails
because it uses fixed precision floating point and your question about rank
is really a question that needs exact numerics to be answered robustly.

Computing the rank of a matrix is done essentially through something like
Gaussian elimination (GE) but there are many different types of Gaussian
elimination. In the case of a matrix of integers we need to decide how to
handle division because the basic form of GE uses exact division but with
integers that leads to rational numbers. There are several approaches:

1. Use rational numbers (i.e. work over a field).
2. Use division free GE
3. Use fraction free GE (still use divisions but only carefully chosen
exact integer divisions).

Using division free GE leads to exponential costs so that the time taken
for an NxN matrix has worst case complexity bounds like O(2^N). This is
because although the *number* of arithmetic operations is bounded like
O(N^3)) the *cost* of those operations depends on the size of those
integers and they get big really fast during division free GE.

Another approach is to use fraction-free methods and these can bring the
bound down from O(2^N) to something like O(N^5) or potentially better if
combined with other techniques like fast matrix multiplication. Still
though, O(N^5) is a lot worse than the O(N^3) that you might have expected
if you were thinking about how this works with fixed precision types.

If you want to work with something more complicated than integers or
rationals then the complexity bounds can be much worse!

Here's a trivial example of what I do.
>
> PS C:\> python -m timeit -s "import sympy" "sympy.*randMatrix(10).rank()*"
> 20 loops, best of 5: *13 msec* per loop
> PS C:\> python -m timeit -s "import sympy" "sympy.*randMatrix(15).rank()*"
> 5 loops, best of 5: *47.7 msec* per loop
> PS C:\> python -m timeit -s "import sympy" "sympy.*randMatrix(20).rank()*"
> 1 loop, best of 5: *2.26 sec* per loop
>

SymPy has a newer faster implementation of matrices that works for matrices
with simple expression elements. In this case your elements are all
integers so we can use it with the ZZ domain:
https://docs.sympy.org/latest/modules/polys/domainmatrix.html
https://docs.sympy.org/latest/modules/polys/domainsintro.html

This faster implementation is already present internally in any Matrix but
is only used by default for certain limited operations and only when the
matrix has only integer or rational elements:

In [*1*]: Matrix([[1, 2], [3, 4]])._rep
Out[*1*]: DomainMatrix({0: {0: 1, 1: 2}, 1: {0: 3, 1: 4}}, (2, 2), ZZ)

Note that the _rep attribute should be considered private but it gives a
more efficient representation for a matrix like this. You can use it to
compute the rank like this:

In [*6*]: M = Matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]])


In [*7*]: M.rank()

Out[*7*]: 2


In [*8*]: M._rep

Out[*8*]: DomainMatrix({0: {0: 1, 1: 2, 2: 3}, 1: {0: 4, 1: 5, 2: 6}, 2:
{0: 7, 1: 8, 2: 9}}, (3, 3), ZZ)


In [*9*]: rank = len(M._rep.to_field().rref()[-1])


In [*10*]: rank

Out[*10*]: 2


Here are some timings:


In [*16*]: *for* N *in* [1, 2, 5, 10, 20, 50, 100]:

    ...:     M = randMatrix(N)

    ...:     print('N =', N)

    ...:     %time len(M._rep.to_field().rref()[-1])

    ...:

N = 1

CPU times: user 143 µs, sys: 1e+03 ns, total: 144 µs

Wall time: 151 µs

N = 2

CPU times: user 204 µs, sys: 39 µs, total: 243 µs

Wall time: 207 µs

N = 5

CPU times: user 547 µs, sys: 38 µs, total: 585 µs

Wall time: 618 µs

N = 10

CPU times: user 2.77 ms, sys: 68 µs, total: 2.83 ms

Wall time: 3.02 ms

N = 20

CPU times: user 21.8 ms, sys: 136 µs, total: 21.9 ms

Wall time: 28.7 ms

N = 50

CPU times: user 545 ms, sys: 2.48 ms, total: 547 ms

Wall time: 550 ms

N = 100

CPU times: user 7.37 s, sys: 58.9 ms, total: 7.43 s

Wall time: 7.51 s


That scales a lot better than the current Matrix rank method (I gave up at
N=20 here):


In [*15*]: *for* N *in* [1, 2, 5, 10, 20, 50, 100]:

    ...:     M = randMatrix(N)

    ...:     print('N =', N)

    ...:     %time M.rank()

    ...:

N = 1

CPU times: user 270 µs, sys: 0 ns, total: 270 µs

Wall time: 277 µs

N = 2

CPU times: user 1.12 ms, sys: 724 µs, total: 1.85 ms

Wall time: 1.18 ms

N = 5

CPU times: user 8.01 ms, sys: 96 µs, total: 8.1 ms

Wall time: 8.2 ms

N = 10

CPU times: user 23.3 ms, sys: 110 µs, total: 23.4 ms

Wall time: 23.5 ms

N = 20

CPU times: user 5.56 s, sys: 46.5 ms, total: 5.61 s

Wall time: 5.66 s

N = 50

^C
---------------------------------------------------------------------------

KeyboardInterrupt



This can all be made more efficient and the plan is to do so in future
SymPy versions. For now you can use DomainMatrix directly if these examples
are representative of what you want to do. It's also faster to use flint
through the python_flint library. I plan to make it possible to use flint
internally to speed SymPy up but that hasn't happened yet:


In [*1*]: *import* *flint*


In [*2*]: M = randMatrix(100)


In [*3*]: fM = flint.fmpz_mat([[int(i) *for* i *in* row] *for* row *in*
M.tolist()])


In [*4*]: %time fM.rank()

CPU times: user 3.96 ms, sys: 3.48 ms, total: 7.44 ms

Wall time: 18 ms

Out[*4*]: 100


In [*5*]: M = randMatrix(1000)


In [*6*]: fM = flint.fmpz_mat([[int(i) *for* i *in* row] *for* row *in*
M.tolist()])


In [*7*]: %time fM.rank()

CPU times: user 632 ms, sys: 19.9 ms, total: 652 ms

Wall time: 655 ms

Out[*7*]: 1000


--

Oscar

-- 
You received this message because you are subscribed to the Google Groups 
"sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sympy+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sympy/CAHVvXxQ1eMWz-6AUHy0YLWpPnDRk5smZEkhLQvx7RafhYTZn3w%40mail.gmail.com.

Reply via email to