[sympy] Re: Identifying repeated subexpressions in systems of equations

Robert Kern Tue, 17 Jun 2008 10:46:08 -0700

On Tue, Jun 17, 2008 at 02:11, Ondrej Certik <[EMAIL PROTECTED]> wrote:
> On Mon, Jun 16, 2008 at 11:41 PM, Luke <[EMAIL PROTECTED]> wrote:
>>
>> Robert,
>>  The Mathematica link you provided is exactly what I'm trying to do.
>> I haven't tried your python code yet but after reading it I think it
>
> I did:
>
> http://code.google.com/p/sympy/issues/detail?id=891#c3
>
>> should work great.  I really appreciate your comments and your help!
>
> Thanks Robert and Fredrik for the code!
>
> Robert's main loop is this:
>
>    for subtree in postorder_traversal(expr):
>        if subtree.args == ():
>            # Exclude atoms, since there is no point in renaming them.
>            continue
>        if subtree.args != () and subtree in seen_subexp and subtree
> not in to_eliminate:
>            to_eliminate.append(subtree)
>        seen_subexp.add(subtree)
>
> It's awesome. Then one just takes the to_eliminate list and substitute
> it for x_0, x_1, ...
>
> On Mon, Jun 16, 2008 at 8:28 AM, Robert Kern <[EMAIL PROTECTED]> wrote:
>> On Sun, Jun 15, 2008 at 17:46, Ondrej Certik <[EMAIL PROTECTED]> wrote:
>>> Anyway, I created an issue for this:
>>>
>>> http://code.google.com/p/sympy/issues/detail?id=891
>>
>> I've attached some nominally working code to the issue. Playing with
>> it, I see some suboptimal results. First, it's quadratic, but I'm not
>> clever enough to think of a way around that. Second, sympy's default
>> representation of expressions isn't well-suited for this algorithm (or
>> the other way around). For example:
>>
>> In [1]: from cse import cse
>>
>> In [2]: cse((x-y)*(z-y) + sqrt((x-y)*(z-y)))
>> Out[2]:
>> ⎛                                                  ⎽⎽⎽⎽   ⎽⎽⎽⎽⎞
>> ⎝[(x₀, -y), (x₁, x + x₀), (x₂, x₀ + z)], x₁*x₂ + ╲╱ x₁ *╲╱ x₂ ⎠
>>
>>
>> We see that -y gets picked up as term that should be pulled out
>> although it only shows up in the context of (x-y). For the typical use
>> case, subtraction is an atomic operation, and the representation of
>> (x-y) as Add(Mul(NegativeOne(-1), Symbol('y')), Symbol('x')) gets in
>> the way.
>>
>> Although the input expression has both x1 and x2 under the same
>> sqrt(), it gets broken out before the cse() function gets to look at
>> it. It would be nice to stuff everything in that term under the same
>> sqrt() before cse() operates.
>
> Well, x-y is represented as Add(x, Mul(-1, y)) as you said. So  -y is
> Mul(-1,y), thus it is a common subexpression and thus it get
> substituted,
> I think everything is a-ok, isn't it?


It is certainly correct for that representation. However, my claim is
that this representation is suboptimal for the purposes to which one
wants to apply common subexpression elimination. All of the use cases
hold as important one of these two criteria:

  1. flop count
  2. readability

CSE on the default sympy representation pessimizes both of these. For
floating point evaluation, - is an atomic operation. You would not
generate code that breaks that up into a Mul and an Add.

  x0 = -1*y
  x1 = x + x0

versus

  x1 = x - y

Readability is much the same.

> So if you want to improve the algorithm, apply this trivial patch:
>
> diff --git a/cse.py b/cse.py
> --- a/cse.py
> +++ b/cse.py
> @@ -48,6 +48,9 @@ def cse(expr, prefix='x'):
>         if subtree.args == ():
>             # Exclude atoms, since there is no point in renaming them.
>             continue
> +        if subtree.is_Mul and subtree.args[0].is_number:
> +            # Exclude stuff like (-y)
> +            continue
>         if subtree.args != () and subtree in seen_subexp and subtree
> not in to_eliminate:
>             to_eliminate.append(subtree)
>         seen_subexp.add(subtree)
>
>
> Now it works:
>
> In [2]: cse((x-y)*(z-y) + sqrt((x-y)*(z-y)))
> Out[2]:
> ⎛                                      ⎽⎽⎽⎽   ⎽⎽⎽⎽⎞
> ⎝[(x₀, x - y), (x₁, z - y)], x₀*x₁ + ╲╱ x₀ *╲╱ x₁ ⎠
>
>
>> This kind of thing pops up from time to time in my experience.
>
> It does, but it can be easily fixed, see above. I.e. fixing your
> algorithm. :) The other option, as you said, is fixing sympy. We would
> have to introduce a Sub class.
> But I think it would just make things more complex. No?

For all of sympy, yes. However, I think we could keep these
transformations local to the cse module. You would have to implement
just enough of Sub to replace Add(x, Mul(-constant, y)) and go back
again.

Similarly, breaking up a power of a product into a product of powers
in the default sympy representation hides optimization opportunities.
CSE could work better with a different representation.

>> So I think there needs to be a bit of a preprocessing step to optimize
>> the expression specifically for cse() before it does its magic.

I think this is the correct approach rather than putting special cases
inside cse() itself. Preprocessing to a more tractable representation,
CSE, then postprocessing allows us to add optimizations incrementally
while keeping the core algorithm clean and testable. sympy in general
doesn't need to change outside of the cse module.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"sympy" group.
To post to this group, send email to sympy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sympy?hl=en
-~----------~----~----~----~------~----~------~--~---

[sympy] Re: Identifying repeated subexpressions in systems of equations

Reply via email to