Thanks for responding. I think most of what I want to say is in
response to Matthew, but let me just mention that I agree with Ronan.
Integer is very different from int. Aside from the obvious difference
with division, every attribute and method of the classes it inherits
from are avaialble on it, giving it a standard interface. This is
fundamental to this discussion. Basic subclasses are Liskov
substitutable.  I can write expr.args or expr.atoms or expr.xreplace
without worrying if expr is an Add or a Symbol or an Integer.
Division is a special case of this. This simply is not true for
strings and ints.

By the way, Ronan, which option do you prefer? Or is there another
even better option that only you can see?

On Tue, May 21, 2013 at 1:06 AM, Matthew Rocklin <mrock...@gmail.com> wrote:
> I've added my thoughts to the wiki.  I'm double posting them here.  These
> follow Aaron's opinion section which is definitely worth a read.
>
> ### Matthew
>
> I mostly prefer 1
>
> - Small point but while it's fresh in your mind, `MatrixSymbol('x', 1,
> 2).args` can't be () easily.  This is because the expression could be
> `MatrixSymbol('x', n, k+m)`.  The second two arguments are the shape and in
> general are `Expr`s.  We'll need to traverse down these so they'll either
> need to be args or we'll need to do a lot of special casing within
> `MatrixSymbol`.  A pull request exists to replace `'x'` with `Symbol('x')`.
> This seems wrong to me because `'x'` here is a name, not a mathematical
> scalar.  The right way to do this under 3 would be to create a
> `String(Basic)`.  This also feels wrong to me because we're being forced to
> reinvent a class with no new functionality.

OK, you are right about this. MatrixSymbol clearly cannot be a leaf.

Ronan is right when he says that Symbol is not a replacement for str.
The two are unrelated. You can't do Symbol('xyz')[:1] or Symbol('
').join(['a', 'b', 'c']), and Symbol('x') + Symbol('y') != 'x' + 'y'.
 But I think it doesn't matter. We don't need a full-fledged string,
just like we don't really need it anywhere that we use Symbol (for
those rare cases where we do do some string processing we just use
Symbol.name).  The only aspect of strings that we use in Symbols
(aside from the fact that they are names) is equality, and for that,
they agree: Symbol(a) == Symbol(b) iff a == b.

So my recommendation here goes back to using Symbol as the first
argument of MatrixSymbol. Also, the class should clearly not subclass
from Symbol in this case (I don't remember if it already does).

Perhaps a mathematician's point of view is worth considering here
(this is my point of view, when I am in my graduate student shoes).
When we have a mathematical object like MatrixSymbol that is indexed
by another expression, in this case, the rows and columns, it is often
convenient to consider that object as a function.  In this case, a
matrix symbol is a function from NxN -> M, where M is the set of all
matrices over whatever (complex numbers).

This functorial relationship for us means that those are children in
the expression tree. Now, I'm no logician, but to me, it makes sense
to consider a matrix symbol also as a function on the alphabet
(strings).  The name we give to a matrix symbol is inconsequential to
what it is, at least mathematically.

Anyway, that analogy may or may not be completely off base, so take it
or leave it.

>
> - In general, I think that having all identifying information within args
> will simplify the code in the long term.

This is where I think you are completely wrong.  As I noted in my
first bullet point of my opinions, it seems like it will do this,
because the invariant becomes simpler. But as I hope I demonstrated in
the three fundamental expression recursion algorithms, it will make
things much more complicated. The reason is that we have to check if
things are instances of Basic everywhere, and handle them differently.
 Non-Basics don't have Basic interface (in particular, .args, but
really anything that we might want to do).  We can't even be sure that
we can do math operations on non-Basics, even if they are Python
builtin numbers, because of the int/int problem (not to mention the
floating point precision issues).

If we go with option 1, what I see happening all throughout SymPy is
code being nested under "if isinstance(expr, Basic)" blocks. Very
little useful code will actually make sense on non-Basics; most if it
will fail with AttributeError or some other exception (if you don't
believe me, remove "arg = sympify(arg)" from the top of any algorithm
and try passing 0 to it).

> It also opens various performance
> improvements.  See the following timings from my laptop
>
> ```
> In [2]: timeit Add(x, y, z)
> 100000 loops, best of 3: 2.15 us per loop
>
> In [3]: timeit Add(x, y, z, evaluate=False)  # no idea why this is slower
> 100000 loops, best of 3: 4.14 us per loop
>
> In [4]: timeit Basic.__new__(Add, x, y, z)
> 1000000 loops, best of 3: 616 ns per loop
>
> In [5]: timeit (Add, x, y, z)
> 10000000 loops, best of 3: 96.7 ns per loop
> ```

You mentioned this before, and I didn't get it then either. How does
this have anything to do with what the leaves of the expression tree
are?

>
> Having all information in args turns the SymPy Basic into something like an
> s-expression (a glorified tuple).  Certainly we don't want to go this far
> (we like lots of things that the Basic object gives to us), but having that
> simplicity enables more advanced manipulations of SymPy expressions with
> greater trust in their accuracy.  If someone wrote code that required really
> intense expression manipulation they could switch into tuple mode (or list
> mode for mutability), do lots of manipulations quickly, then switch back and
> re-evaluate.  If all information is in args then they could perform these
> transformations with high fidelity.  I'm not proposing that this is the
> reason we should switch but rather that this is an example of something that
> is easy with an invariant like "All identifying information is in `.args`"

I claim that this is just as easy (in fact easier, because of the
isinstance(expr, Basic) issue) with the option 3 invariant. You just
treat objects with empty args as leaves.  These are the symbolic atoms
from your lisp metaphor.

At the end of the day, *something* has to be a symbolic atom.  You
can't recurse forever, until you hit the ones and zeros of the machine
representation and wonder how you can split those down into smaller
entities. I don't see why it has to be a non-Basic type.

>
> - Personally I manipulate SymPy expressions quite a bit.  I often do this
> from outside SymPy (Theano, LogPy, Strategies).  Having the "all identifying
> information is in `.args`" invariant makes SymPy significantly more
> interoperable if you don't have access to the codebase (or don't want to
> fuss with the codebase).

I don't understand what you mean by "access to the codebase". How are
you doing anything with any SymPy classes if you can't access them?

> Classes like Symbol, Rational, AppliedPredicate
> all end up requiring special attention in each of these endeavors.  I'm
> mostly willing to put up with it because I'm familiar with them but
> potential collaborators are often turned off.  I believe that
> interoperability is more important than we realize for long term growth.

Well I'd love to hear testimonials from these collaborators.  I think
it's important for us as core contributors who are very familiar with
SymPy and its idioms to see how people who are new to it view things.
That will show us what things we are doing that need to be documented
better.

I suspect these people are falling into the trap of seeing the
simplicity of the option 1 invariant over the option 3 invariant,
without noticing that it really leads to more complicated code.

Aaron Meurer

>
> I believe that interoperability is the most important direction for SymPy
> right now (personal opinion).  I believe that the "all identifying
> information is in `.args`" invariant is essential for that goal.
>
>
> On Mon, May 20, 2013 at 3:25 PM, Aaron Meurer <asmeu...@gmail.com> wrote:
>>
>> I think Matthew wanted to delay this discussion until the end of the
>> summer, but I'd like to start it now while it is fresh on my mind.  I
>> am currently writing the expression manipulation part of the new
>> tutorial (http://docs.sympy.org/tutorial/tutorial/manipulation.html),
>> and this issue is central to my discussion.
>>
>> There has been some discussion from time to time about the invariants
>> that SymPy objects are supposed to follow.  The invariants are
>> something like "all elements of an object's .args should be instances
>> of Basic" and "all Basic objects should be rebuildable from their
>> args, like obj.func(*obj.args) == obj".
>>
>> The main discussion has been about the first one. Should we allow
>> non-Basic args?  The common example if Symbol.  Currently, Symbol('x')
>> has empty args.  The proposal would make Symbol('x').args be ('x',).
>> The same for Integer. Integer(2).args is just (), but the proposal
>> would make it (int(2),).
>>
>> But notice that the two invariants as I stated them above are
>> inconsistant, because Symbol('x') is not rebuildable from its args
>> unless its args are ('x',).
>>
>> I have started a wiki page to gather my thoughts on this at
>> https://github.com/sympy/sympy/wiki/Args-Invariant.  Basically, I
>> think there are basically two ways that we should go, which are called
>> option 1 and option 3 on the wiki page (option 2 is something I think
>> we should throw out immediately).  The question boils down to what a
>> leaf in an expression tree is. The rest follows from that.  The
>> options are
>>
>> 1. Any non-Basic
>> 3. Any object with empty args
>>
>> If we choose option 1, then the invariant becomes `obj.func(*obj.args)
>> == obj`. If we choose option 2, then the invariant becomes `obj.args
>> == () or obj.func(*obj.args) == obj`.
>>
>> Now, I don't think it's any secret that I prefer option 3, and that
>> some people in the community (namely Matthew and Stefan) have been
>> arguing for option 1, but I think we should work out all the
>> differences, and pick the best one.
>>
>> So without making this email twice as long as it already is, I refer
>> you to the wiki page, where I've started an extensive comparison of
>> the two options.  Feel free to edit it to add more pros/cons,
>> examples, or to fix the formatting.
>>
>> I've also created a section of the wiki page on the bottom for
>> opinions. Please keep the main part to facts, and put your opinions at
>> the bottom.  We can also discuss opinions here on the mailing list.
>>
>> Aaron Meurer
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "sympy" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to sympy+unsubscr...@googlegroups.com.
>> To post to this group, send email to sympy@googlegroups.com.
>> Visit this group at http://groups.google.com/group/sympy?hl=en-US.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>>
>
> --
> You received this message because you are subscribed to the Google Groups
> "sympy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sympy+unsubscr...@googlegroups.com.
> To post to this group, send email to sympy@googlegroups.com.
> Visit this group at http://groups.google.com/group/sympy?hl=en-US.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sympy+unsubscr...@googlegroups.com.
To post to this group, send email to sympy@googlegroups.com.
Visit this group at http://groups.google.com/group/sympy?hl=en-US.
For more options, visit https://groups.google.com/groups/opt_out.


Reply via email to