Re: [sympy] 5 years of SymPy

Aaron Meurer Mon, 01 Aug 2011 20:10:40 -0700

Very cool.  I think Ondrej said that he actually started it earlier
than that, but I can't find the thread on the mailing list.

On Mon, Aug 1, 2011 at 3:55 PM, Mateusz Paprocki <matt...@gmail.com> wrote:
> Hi,
> I was recently browsing SymPy's SVN repository (sympy-oldcore branch which
> contains the original implementation of SymPy) and I noticed that the first
> commit was created on 2006-08-01 02:03:30 +0200 (Tue, 01 Aug 2006), which
> means that today it's exactly five years since SymPy was started. Myself I
> joined the project at r903 (24 April 2007). SymPy started as a simple
> calculator with soon added uncanny capability for computing limits (due to
> implementation of the Gruntz algorithm). Since then SymPy has grown very
> quickly and become a reasonably sized project with 25 modules covering very
> many fields of mathematics and physics, with over 200k lines of code written
> by 145 developers from all around the world. One of its modules,
> sympy.numerics, became an important project on its own (mpmath).

I think SymPy owes its success to three things.  First is the bazaar
model that you mentioned.  SymPy is very decentralized in its
structure.  While it's true that we do have a central official
repository, this is not what I'm talking about.  Things like
git/GitHub make it super easy to contribute in a highly decentralized
way.  But even beyond that, I don't think there's a single person who
knows every part of the code base (maybe someone can prove me wrong).
I know I don't, as I've not used most of the physics module (nor would
I know how to, as the physics is beyond me).  Mateusz told me at SciPy
that he doesn't know anything about it either.  On the other hand, I
think Mateusz and I are the main ones who know how the polys work.
Yet both modules are very healthy and growing.

In fact, I think the average mathematical knowledge of our
contributors is less than you would expect from a computer algebra
system.  Most contributors are people like physicists and engineers
who don't know about things like Groebner bases or Gosper's algorithm,
even though these are considered classics of computer algebra.  In
fact, I think most contributors would not even describe themselves as
mathematicians.  This goes against the conventional wisdom about CASs,
which, as you said, holds that they have to be written by PhD
researchers in computer algebra, but it's worked.  It turns out that,
for example, you can write a decent ODE module with only a first
course in ODEs.

Second is the very open way that we approach new contributors.  Ondrej
wrote a blog post about this a while back
(http://ondrejcertik.blogspot.com/2009/05/my-experience-with-running-opensource.html).
Being a bazaar-model open source project doesn't imply that people
will contribute to it.  Just because it's easy for people to
contribute, doesn't mean that they will.  You have to encourage people
to do it.  You have to treat each user as a potential contributor.
With SymPy, most users know Python, which already pushes down the
biggest barrier to contribution.  But you have to be patient, and very
helpful, especially with git.

But often it's as simple as saying to someone who complains about a
bug, "if you want to contribute a patch, that would be great.  We will
help you out."  Many people don't even consider contributing until you
say that, at which point they do.  I don't know how many of our 145
contributors started this way, but I know personally that quite a few
of them did, which has led to many important bug fixes and
enhancements.

And third, I think we can't ignore perhaps the biggest thing we have
going for us, which is that SymPy is written in Python.  This language
is so easy to read and write that contribution is easy, even when it
means finding and fixing a bug in a large codebase.  And its
popularity in recent years, especially in the arena of scientific
computing, has propelled us.  Again, conventional wisdom says that
Python is too slow for something like a full-blown computer algebra
system, and that it needs to be written in a language like C or Lisp,
but here again we have proven not only that Python is sufficiently
fast, but that it's actually a better language than these other ones
because of its ease of use and speed of development. And this again
makes it easier for people to contribute, because they look at the
code and say, "hey, this is just Python.  I *know* this."

> From its beginning, SymPy was developed using the bazaar approach, with no
> central planning at all. Culmination of this approach was adoption of git
> for source code management and later GitHub for managing patches (via pull
> requests). Symbolic mathematics and computer algebra systems were usually
> developed in small coherent environments, for example within a group at a
> university. Development process of SymPy showed that this doesn't have to be
> the case, because SymPy isn't connected to any coherent group of people or
> any university or even country. Most SymPy developers have never met each
> other in person. It's actually amazing that a project of this complexity can
> grow that fast despite this. A huge help was Google's Summer of Code program,
> which allowed us to hire many excellent students who have brought
> significant contributions to the project.

> Of course not everything is perfect. SymPy could be a little faster, better
> documented and allow for much easier embedding in other projects. I hope
> that in parallel to continuous growth of new features, like new solvers, new
> symbolic integration algorithms and other, we will also focus on those three
> issues. We made a little progress with documentation during SciPy
> conference, but this requires far more work and understanding from
> developers that documentation is as important as code. SymPy already
> has a lot of cool features, but our users won't have a clue about this,
> until SymPy gets better documentation.

Regarding speed, I think we need to do more stress tests.  I
discovered that as_numer_denom() is too slow (issue 2607) by
constructing a big expression (I think it was Add(*(exp(i*x) for i in
xrange(1000)))) and passing it to risch_integrate().  This should be
trivial to integrate, but it was taking forever.  It wasn't too
difficult to discover that it was integrating just fine, but at some
point, an expression like Add(*(x**i for i in
xrange(1000))).as_numer_denom() was being called, and it was too
slow.  I then discovered that the algorithm there is very inefficient,
and am now working on improving it.
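
A stress test of this kind could be sketched roughly as follows.  This
is only an illustration, not the code used for issue 2607: the helper
names (time_call, power_sum, scaling_table) are made up for the
example, it assumes Python 3 (so range instead of xrange), and the
SymPy import is kept inside power_sum so the timing helper works on
its own.

```python
import time


def time_call(func, *args, repeat=3):
    """Return the best wall-clock time in seconds of func(*args) over `repeat` runs."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def power_sum(n):
    """Build x**0 + x**1 + ... + x**(n - 1).  Hypothetical helper; needs SymPy."""
    from sympy import Add, Symbol

    x = Symbol("x")
    return Add(*(x**i for i in range(n)))


def scaling_table(sizes=(50, 100, 200, 400)):
    """Time as_numer_denom() on sums of growing size; returns (n, seconds) pairs."""
    rows = []
    for n in sizes:
        expr = power_sum(n)
        rows.append((n, time_call(expr.as_numer_denom, repeat=1)))
    return rows


# Example use (requires SymPy):
#     for n, seconds in scaling_table():
#         print(n, seconds)
```

If the time grows much faster than the size of the input, that is a
hint that the algorithm, rather than Python, is the bottleneck.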

I think if we habitually test our algorithms with large expressions
like this and then profile the code to see where it is slow, we can
find these bottlenecks and improve them. This is also how I discovered
is_rational_function was too slow back with 456e211647d4.  I'm
convinced that when something in SymPy is slow, it's not because
Python is slow, but because some algorithm or implementation is
inefficient, and this can and should be fixed.
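
For the profiling step, something like the following minimal sketch
with the standard library's cProfile would do.  The helper name
profile_call is hypothetical, and the SymPy usage in the trailing
comment assumes Python 3 and an installed SymPy.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, report), where report is a text table of the `top`
    entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


# Example use (requires SymPy):
#     from sympy import Add, Symbol
#     x = Symbol("x")
#     expr = Add(*(x**i for i in range(1000)))
#     _, report = profile_call(expr.as_numer_denom)
#     print(report)
```

The functions at the top of the cumulative-time listing are the first
places to look for an inefficient algorithm.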

Regarding documentation, I also agree that it should be top priority.
Should we try to have another doc day?  I'd especially like to improve
the situation of docs not imported into Sphinx (pretty bad) and
functions lacking docstrings/doctests (not as bad, but still pretty
bad in some modules).

Regarding embedding, it's not clear what you see as the weaknesses
here.  How would you improve this area?  Personally, I see SymPy as
one of the most embeddable CASs, being pure Python with zero
dependencies.

And by the way, to me, the two most important functions that we should
try to improve in SymPy are simplify() and solve().  These are the
first two functions that people are going to try to use when they try
SymPy, and are probably the most important even to experienced users.  As
Chris said a few weeks ago, the power of these functions will show
people that this is not just a "toy" project, as they are the first
things they will try to test its power (actually, I think the first
thing that makes people notice that this is not a toy is the unicode
pretty printer, but this is already pretty awesome and complete).
These functions have improved a lot in just the past year, but they
still have a long way to go.  Generic simplification and solving are
very hard problems in computer algebra, and they will ultimately have
to rely on heuristics in the general case, but we should try to make
these as powerful as possible.

> How I see SymPy in 5 years? It's obvious that I would like to see SymPy
> feature complete, fast, well documented, etc. However, what I would like to
> see even more is SymPy being used by many more people in research and
> teaching. It is our task to not only develop the project but also to
> deploy it: show it to people, explain its strengths and describe the areas
> of application. I hope that next five years will be even more productive
> than the passing 5 years were.
> Mateusz
>

In five years (and beyond), I see SymPy becoming the premier open
source computer algebra system in more and more areas. We are already
doing this.  With Tom's Meijer G integration algorithm, we will have
the most powerful open source symbolic definite integrator.  From what
I've heard, our quantum code excels beyond everything else in
existence.

Also, I think we can and should do more work with GUI environments for
people who are not used to using a command line.  I think the IPython
qtconsole with mathjax printing is a good start here.  This way we
will really be able to convert people from Maple and Mathematica.

Finally, a little contrary to what I said above, we do need more of
these nontrivial algorithms like the Risch, Meijer G, factorization,
or Groebner algorithms to be implemented in SymPy.  This is what
really gives it power and speed beyond being a "toy" project, and is
the only way that we will be the premier CAS in any area.

Aaron Meurer

