I suppose you might be using f = lambdify(x, sin(x), 'scipy'). This produces a function that calls scipy.sin instead of numpy.sin. However, scipy.sin is just a deprecated wrapper for numpy.sin, which would explain why it is a little slower.
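The module binding can be sketched roughly as follows. This is a simplified stand-in for what lambdify's code generation does, not its actual implementation: the generated source is exec'd against a namespace filled from the modules argument, and math.sin is used here as a placeholder for numpy.sin or scipy.sin.

```python
import math

# Sketch (assumed mechanism, not lambdify's real code): the generated
# source is exec'd against a namespace that maps each function name to
# the chosen module's implementation.
source = "def _lambdifygenerated(x):\n    return sin(x)\n"

namespace = {"sin": math.sin}  # lambdify fills this from the modules argument
exec(source, namespace)
f = namespace["_lambdifygenerated"]

# The binding can be inspected afterwards, which is why looking at
# f.__globals__['sin'] tells you which module's sin is actually called.
print(f.__globals__["sin"])  # <built-in function sin>
```

Because the name lookup happens through the function's globals at call time, swapping the entry in that dict would change which implementation every later call uses.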
I'm a little unclear why it doesn't raise a deprecation warning when you call it in the lambdified function, but you can confirm that it is indeed using scipy.sin by looking at f.__globals__['sin']. In any case, you should generally not use 'scipy' by itself, but rather ['numpy', 'scipy']. Actually, lambdify already includes scipy by default, so there's no need to specify either. See https://github.com/sympy/sympy/issues/20294.

Aaron Meurer

On Thu, Sep 16, 2021 at 3:58 PM Aaron Meurer <asmeu...@gmail.com> wrote:
> What is the difference between "numpy" and "scipy" in your graph? I
> wouldn't expect scipy to make any difference for sin or cos, which are
> NumPy functions.
>
> Aaron Meurer
>
> On Wed, Sep 15, 2021 at 7:19 AM Davide Sandona' <sandona.dav...@gmail.com> wrote:
>> Thanks Oscar for the wonderful clarification!
>>
>> I reran my code with SYMPY_USE_CACHE=no, and now the results make much
>> more sense.
>>
>> [image: lambdify-no-cache.png]
>>
>> Davide.
>>
>> On Wed, Sep 15, 2021 at 15:14 Oscar Benjamin <oscar.j.benja...@gmail.com> wrote:
>>>
>>> On Wed, 15 Sept 2021 at 13:41, Oscar Benjamin <oscar.j.benja...@gmail.com> wrote:
>>>
>>>> On Tue, 14 Sept 2021 at 23:12, sandona...@gmail.com <sandona.dav...@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>> let's say I'd like to numerically evaluate a single sympy function
>>>>> over an array using sympy as the module. Curiously, passing in
>>>>> regular Python float numbers makes the evaluation much faster than
>>>>> passing in SymPy's Float instances. I tried several sympy functions;
>>>>> they tend to follow this trend.
>>>>
>>>> The 3 millisecond timing difference that you are asking about here is
>>>> dwarfed by the actual 1 second that it really takes to compute this
>>>> result the first time. Most likely the time differences you see are
>>>> just to do with exactly how efficient the cache lookups are for
>>>> different types.
>>> If you want to see what is taking the time then use a profiler. From
>>> isympy/ipython you can use %prun, or if you want to run separate
>>> processes you can use python -m cProfile, but that will also time how
>>> long it takes to import sympy, so it's only useful for things that take
>>> at least several seconds.
>>>
>>> In the first run the time is taken by sin.eval, and most of that by the
>>> extract_multiplicatively function.
>>>
>>> In [2]: %prun -s cumulative f(domain)
>>>
>>>          2069604 function calls (2047615 primitive calls) in 1.585 seconds
>>>
>>>    Ordered by: cumulative time
>>>
>>>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>>         1    0.000    0.000    1.585    1.585 {built-in method builtins.exec}
>>>         1    0.001    0.001    1.585    1.585 <string>:1(<module>)
>>>      1000    0.002    0.000    1.584    0.002 <lambdifygenerated-1>:1(_lambdifygenerated)
>>> 16989/1000   0.022    0.000    1.582    0.002 cache.py:69(wrapper)
>>>      1000    0.014    0.000    1.579    0.002 function.py:450(__new__)
>>>      1000    0.013    0.000    1.392    0.001 function.py:270(__new__)
>>>      1000    0.012    0.000    1.270    0.001 trigonometric.py:266(eval)
>>>      2997    0.029    0.000    0.988    0.000 expr.py:2183(extract_multiplicatively)
>>>      7993    0.020    0.000    0.835    0.000 assumptions.py:460(getit)
>>>      8993    0.336    0.000    0.722    0.000 facts.py:499(deduce_all_facts)
>>> 6993/2997    0.007    0.000    0.529    0.000 decorators.py:88(__sympifyit_wrapper)
>>>      2997    0.010    0.000    0.525    0.000 numbers.py:1315(__truediv__)
>>>       999    0.005    0.000    0.511    0.001 expr.py:2453(could_extract_minus_sign)
>>>       999    0.001    0.000    0.505    0.001 expr.py:1662(as_coefficient)
>>>       999    0.001    0.000    0.480    0.000 numbers.py:759(__truediv__)
>>>       999    0.001    0.000    0.478    0.000 decorators.py:254(_func)
>>>       999    0.001    0.000    0.476    0.000 decorators.py:129(binary_op_wrapper)
>>>       999    0.003    0.000    0.474    0.000 expr.py:260(__truediv__)
>>>      4996    0.015    0.000    0.459    0.000 assumptions.py:472(_ask)
>>>       999    0.015    0.000    0.457    0.000 operations.py:46(__new__)
>>>       999    0.037    0.000    0.408    0.000 mul.py:178(flatten)
>>>      3997    0.005    0.000    0.357    0.000 assumptions.py:444(copy)
>>>      3997    0.014    0.000    0.352    0.000 assumptions.py:432(__init__)
>>>    198823    0.074    0.000    0.254    0.000 {built-in method builtins.all}
>>>    570492    0.158    0.000    0.209    0.000 facts.py:533(<genexpr>)
>>>      1999    0.032    0.000    0.126    0.000 sets.py:1774(__new__)
>>>    183867    0.083    0.000    0.083    0.000 facts.py:482(_tell)
>>>       999    0.007    0.000    0.078    0.000 evalf.py:1425(evalf)
>>>     15987    0.023    0.000    0.077    0.000 sympify.py:92(sympify)
>>>      1000    0.003    0.000    0.066    0.000 function.py:214(nargs)
>>>      7996    0.026    0.000    0.065    0.000 compatibility.py:501(ordered)
>>>      6994    0.015    0.000    0.063    0.000 numbers.py:1197(_new)
>>>     29977    0.020    0.000    0.057    0.000 <frozen importlib._bootstrap>:1009(_handle_fromlist)
>>>    391655    0.054    0.000    0.054    0.000 {method 'get' of 'dict' objects}
>>>  1998/999    0.007    0.000    0.053    0.000 evalf.py:1332(evalf)
>>>      1998    0.053    0.000    0.053    0.000 mul.py:449(_gather)
>>>       999    0.005    0.000    0.046    0.000 evalf.py:781(evalf_trig)
>>>    ...
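For reference, the same kind of sorted report can be produced outside IPython with the standard-library profiler. This is a generic sketch, with a stand-in workload in place of f(domain):

```python
import cProfile
import io
import math
import pstats

def work():
    # Stand-in for f(domain): any call whose internals we want to profile.
    return sum(math.sin(i / 1000.0) for i in range(10_000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Equivalent of %prun -s cumulative: sort by cumulative time and show
# the top entries of the report.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

Unlike python -m cProfile on a whole script, profiling only the call of interest this way avoids timing the import of sympy itself.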
>>> Now a second run in the same process just shows 1000 cache lookups:
>>>
>>> In [3]: %prun -s cumulative f(domain)
>>>
>>>          2003 function calls in 0.002 seconds
>>>
>>>    Ordered by: cumulative time
>>>
>>>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>>         1    0.000    0.000    0.002    0.002 {built-in method builtins.exec}
>>>         1    0.001    0.001    0.002    0.002 <string>:1(<module>)
>>>      1000    0.001    0.000    0.001    0.000 <lambdifygenerated-1>:1(_lambdifygenerated)
>>>      1000    0.001    0.000    0.001    0.000 cache.py:69(wrapper)
>>>         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
>>>
>>> This is the second run when the inputs are Expr:
>>>
>>> In [5]: %prun -s cumulative f(domain_sympy)
>>>
>>>          5003 function calls in 0.014 seconds
>>>
>>>    Ordered by: cumulative time
>>>
>>>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>>         1    0.000    0.000    0.014    0.014 {built-in method builtins.exec}
>>>         1    0.007    0.007    0.014    0.014 <string>:1(<module>)
>>>      1000    0.002    0.000    0.008    0.000 <lambdifygenerated-1>:1(_lambdifygenerated)
>>>      1000    0.002    0.000    0.006    0.000 cache.py:69(wrapper)
>>>      1000    0.001    0.000    0.004    0.000 numbers.py:1480(__hash__)
>>>      1000    0.001    0.000    0.002    0.000 numbers.py:806(__hash__)
>>>      1000    0.001    0.000    0.001    0.000 expr.py:126(__hash__)
>>>         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
>>>
>>> The cache lookups here are slower because the pure Python Expr.__hash__
>>> methods are slower than numpy's float64.__hash__, which is implemented
>>> in C.
>>>
>>> Either way this shows that the differences are all just to do with
>>> caching and are not representative of the actual time that it would take
>>> to compute these results once. The bulk of the time in the first run is
>>> ultimately consumed by deduce_all_facts, i.e. the old assumptions.
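The per-lookup hashing cost can be reproduced in isolation with a plain memoised function. This is a generic illustration using functools.lru_cache and a deliberately Python-level __hash__, not SymPy's actual cache:

```python
from functools import lru_cache

class PyFloat:
    """Stand-in for a SymPy Float: hashing goes through Python-level code."""
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        # Every cache lookup pays for this Python call, unlike the C-level
        # float.__hash__ used when the key is a plain float.
        return hash(("PyFloat", self.value))
    def __eq__(self, other):
        return isinstance(other, PyFloat) and self.value == other.value

@lru_cache(maxsize=None)
def cached_sin(x):
    return x  # body is irrelevant: after the first call only lookup cost remains

for _ in range(3):
    cached_sin(1.0)           # key hashed by the C-level float.__hash__
    cached_sin(PyFloat(1.0))  # key hashed by the slower Python-level __hash__

# Both keys hit the cache after their first call; any timing difference
# between them is purely the per-lookup hashing cost.
print(cached_sin.cache_info())
```

Timing the two call forms in a loop (e.g. with timeit) shows the same pattern as the profiles above: identical cache behaviour, different lookup overhead.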
>>> Note that the old assumptions are themselves cached on each Basic
>>> instance, separately from the main cache, which can also affect timing
>>> results if you don't use a separate process for timings:
>>>
>>> In [1]: e = sin(1)
>>>
>>> In [2]: e._assumptions
>>> Out[2]: {}
>>>
>>> In [3]: e.is_real
>>> Out[3]: True
>>>
>>> In [4]: e._assumptions
>>> Out[4]:
>>> {'real': True,
>>>  'positive': True,
>>>  'finite': True,
>>>  'infinite': False,
>>>  'extended_positive': True,
>>>  'extended_real': True,
>>>  'commutative': True,
>>>  'imaginary': False,
>>>  'hermitian': True,
>>>  'complex': True,
>>>  'extended_nonnegative': True,
>>>  'nonpositive': False,
>>>  'negative': False,
>>>  'extended_nonpositive': False,
>>>  'extended_nonzero': True,
>>>  'extended_negative': False,
>>>  'zero': False,
>>>  'nonnegative': True,
>>>  'nonzero': True}
>>>
>>> The deduce_all_facts function is the one that applies all of the
>>> implications relating these different assumption predicates, so that as
>>> soon as one predicate (e.g. positive=True) is known all of the others
>>> can be stored in the _assumptions dict. Although this makes things
>>> faster on a second assumptions query, it does in fact consume the bulk
>>> of the time for computing many things in SymPy.
>>>
>>> Assumptions queries that take place during automatic evaluation are
>>> often a cause of slowness in SymPy. The bottom of the profile report
>>> shows evalf being called 2000 times, i.e. twice per evaluation of the
>>> sin function. The evalf calls are part of the assumptions query and are
>>> reasonably fast for a simple sin(float), but can be very expensive for
>>> large expressions.
>>>
>>> --
>>> Oscar
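The per-instance caching shown in that session can be mimicked with a small sketch: deduced facts are stored in a dict on the object itself, so only the first query pays for the deduction. The fact names and implication table here are illustrative stand-ins, not SymPy's actual facts.py machinery:

```python
# Illustrative implication table: knowing positive=True settles other facts.
IMPLICATIONS = {
    "positive": {"real": True, "nonnegative": True,
                 "negative": False, "zero": False},
}

class Node:
    def __init__(self):
        self._assumptions = {}  # starts empty, like e._assumptions above

    def ask(self, fact):
        if fact in self._assumptions:
            return self._assumptions[fact]  # cheap second query: dict lookup
        value = True  # stand-in for the expensive first-time deduction
        self._assumptions[fact] = value
        # deduce_all_facts-style step: store every implied fact as well,
        # so later queries for any of them are also plain lookups.
        self._assumptions.update(IMPLICATIONS.get(fact, {}))
        return value

e = Node()
print(e._assumptions)  # {}
e.ask("positive")
print(e._assumptions)  # now holds 'positive' plus all implied facts
```

This is why timing the same expression twice in one process understates the real cost: the second query never reruns the deduction, exactly as with e.is_real above.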
--
You received this message because you are subscribed to the Google Groups "sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sympy+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sympy/CAKgW%3D6L1ND5oJJFb6cKAGYNyuribHipAUWG0ejw0YLvGzvtBkA%40mail.gmail.com.