Very cool. I think Ondrej said that he actually started it earlier than that, but I can't find the thread on the mailing list.
On Mon, Aug 1, 2011 at 3:55 PM, Mateusz Paprocki <matt...@gmail.com> wrote:
> Hi,
> I was recently browsing SymPy's SVN repository (sympy-oldcore branch which
> contains the original implementation of SymPy) and I noticed that the first
> commit was created on 2006-08-01 02:03:30 +0200 (Tue, 01 Aug 2006), which
> means that today it's exactly five years since SymPy was started. I myself
> joined the project at r903 (24 April 2007). SymPy started as a simple
> calculator with a soon-added uncanny capability for computing limits (due to
> an implementation of the Gruntz algorithm). Since then SymPy has grown very
> quickly and become a reasonably sized project with 25 modules covering very
> many fields of mathematics and physics, with over 200k lines of code written
> by 145 developers from all around the world. One of its modules,
> sympy.numerics, became an important project on its own (mpmath).

I think SymPy owes its success to three things.

First is the bazaar model that you mentioned. SymPy is very decentralized in its structure. While it's true that we do have a central official repository, this is not what I'm talking about. Things like git/GitHub make it super easy to contribute in a highly decentralized way. But even beyond that, I don't think there's a single person who knows every part of the code base (maybe someone can prove me wrong). I know I don't, as I've not used most of the physics module (nor would I know how to, as the physics is beyond me). Mateusz told me at SciPy that he doesn't know anything about it either. On the other hand, I think Mateusz and I are the main ones who know how the polys work. Yet both modules are very healthy and growing.

In fact, I think the average mathematical knowledge of our contributors is lower than you would expect for a computer algebra system.
Most contributors are people like physicists and engineers who don't know about things like Groebner bases or Gosper's algorithm, even though these are considered classics of computer algebra. In fact, I think most contributors would not even describe themselves as mathematicians. This goes against the conventional wisdom about CASs, which, as you said, holds that they have to be written by PhD researchers in computer algebra, but it has worked. It turns out that, for example, you can write a decent ODE module with only a first course in ODEs.

Second is the very open way that we approach new contributors. Ondrej wrote a blog post about this a while back (http://ondrejcertik.blogspot.com/2009/05/my-experience-with-running-opensource.html). Being a bazaar-model open source project doesn't imply that people will contribute to it. Just because it's easy for people to contribute doesn't mean that they will. You have to encourage people to do it. You have to treat each user as a potential contributor. With SymPy, most users know Python, which already lowers the biggest barrier to contribution. But you have to be patient, and very helpful, especially with git. Often it's as simple as saying to someone who complains about a bug, "if you want to contribute a patch, that would be great. We will help you out." Many people don't even consider contributing until you say that, at which point they do. I don't know how many of our 145 contributors started this way, but I know personally that quite a few of them did, which adds up to many important bug fixes and enhancements.

And third, I think we can't ignore perhaps the biggest thing we have going for us, which is that SymPy is written in Python. This language is so easy to read and write that contribution is easy, even when it means finding and fixing a bug in a large codebase. And its popularity in recent years, especially in the arena of scientific computing, has propelled us.
Again, conventional wisdom says that Python is too slow for something like a full-blown computer algebra system, and that it needs to be written in a language like C or Lisp, but here again we have proven not only that Python is sufficiently fast, but that it's actually a better language than these other ones because of its ease of use and speed of development. And this again makes it easier for people to contribute, because they look at the code and say, "hey, this is just Python. I *know* this."

> From its beginning, SymPy was developed using the bazaar approach, with no
> central planning at all. The culmination of this approach was the adoption
> of git for source code management and later GitHub for managing patches (via
> pull requests). Symbolic mathematics and computer algebra systems were
> usually developed in small coherent environments, for example within a group
> at a university. The development process of SymPy showed that this doesn't
> have to be the case, because SymPy isn't connected to any coherent group of
> people or any university or even country. Most SymPy developers have never
> met each other in person. It's actually amazing that a project of this
> complexity can grow that fast despite this. A huge help was Google's Summer
> of Code program, which allowed us to hire many excellent students who have
> brought significant contributions to the project.
> Of course not everything is perfect. SymPy could be a little faster, better
> documented and allow for much easier embedding in other projects. I hope
> that in parallel to continuous growth of new features, like new solvers, new
> symbolic integration algorithms and others, we will also focus on those
> three issues. We made a little progress with documentation during the SciPy
> conference, but this requires far more work and understanding from
> developers that documentation is as important as code. SymPy already
> has a lot of cool features, but our users won't have a clue about this,
> until SymPy gets better documentation.

Regarding speed, I think we need to do more stress tests. I discovered that as_numer_denom() is too slow (issue 2607) by constructing a big expression (I think it was Add(*(exp(i*x) for i in xrange(1000)))) and passing it to risch_integrate(). This should be trivial to integrate, but it was taking forever. It wasn't too difficult to discover that it was integrating just fine, but at some point, an expression like Add(*(x**i for i in xrange(1000))).as_numer_denom() was being called, and it was too slow. I then discovered that the algorithm there is very inefficient, and am now working on improving it. I think if we habitually test our algorithms with large expressions like this and then profile the code to see where it is slow, we can find these bottlenecks and improve them. This is also how I discovered is_rational_function was too slow back with 456e211647d4. I'm convinced that when something in SymPy is slow, it's not because Python is slow, but because some algorithm is inefficient or inefficiently implemented, and this can and should be fixed.

Regarding documentation, I also agree that it should be top priority. Should we try to have another doc day? I'd especially like to improve the situation of docs not imported into Sphinx (pretty bad) and functions lacking docstrings/doctests (not as bad, but still pretty bad in some modules).

Regarding embedding, it's not clear what you see as the weaknesses here. How would you improve this area? Personally, I see SymPy as one of the most embeddable CASs, being pure Python with zero dependencies.

And by the way, to me, the two most important functions that we should try to improve in SymPy are simplify() and solve(). These are the first two functions that people are going to try to use when they try SymPy, and are probably the most important even to experienced users.
As Chris said a few weeks ago, the power of these functions will show people that this is not just a "toy" project, as they are the first things they will try to test its power (actually, I think the first thing that makes people notice that this is not a toy is the unicode pretty printer, but this is already pretty awesome and complete). These functions have improved a lot in just the past year, but they still have a long way to go. Generic simplification and solving are very hard problems in computer algebra, and they will ultimately have to rely on heuristics in the general case, but we should try to make these as powerful as possible.

> How do I see SymPy in 5 years? It's obvious that I would like to see SymPy
> feature complete, fast, well documented, etc. However, what I would like to
> see even more is SymPy being used by many more people in research and
> teaching. It is our task not only to develop the project but also to
> deploy it: show it to people, explain its strengths and describe the areas
> of application. I hope that the next five years will be even more productive
> than the past 5 years were.
> Mateusz

In five years (and beyond), I see SymPy becoming the premier open source computer algebra system in more and more areas. We are already doing this. With Tom's Meijer G integration algorithm, we will have the most powerful open source symbolic definite integrator. From what I've heard, our quantum code excels beyond everything else in existence.

Also, I think we can and should do more work with GUI environments for people who are not used to using a command line. I think the IPython qtconsole with MathJax printing is a good start here. This way we will really be able to convert people from Maple and Mathematica.

Finally, a little contrary to what I said above, we do need more of these nontrivial algorithms like the Risch, Meijer G, factorization, or Groebner algorithms to be implemented in SymPy.
This is what really gives it power and speed beyond being a "toy" project, and is the only way that we will be the premier CAS in any area.

Aaron Meurer
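
P.S. The stress-testing habit described above (grow the input, watch how the run time scales, then profile the largest case to see where the time goes) could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's standard cProfile and pstats modules, and slow_sum_of_powers, time_call and profile_call are hypothetical names made up for this sketch, not SymPy functions. The slow routine is a deliberately quadratic stand-in so the example runs without SymPy installed.

```python
import cProfile
import io
import pstats
import time


def slow_sum_of_powers(n):
    """Hypothetical stand-in for a slow library routine (not SymPy code).

    Rebuilding the list on every iteration copies it each time, so the
    total work is quadratic in n.
    """
    terms = []
    for i in range(n):
        terms = terms + [i * i]  # copies the whole list on every iteration
    return sum(terms)


def time_call(func, *args):
    """Return (result, elapsed seconds) for a single call of func."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def profile_call(func, *args, top=5):
    """Run func under cProfile and return the report as a string,
    sorted by cumulative time and limited to the first `top` entries."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    # Stress test: grow the input and watch how the run time scales.
    for n in (1000, 2000, 4000):
        _, elapsed = time_call(slow_sum_of_powers, n)
        print(f"n={n}: {elapsed:.4f}s")
    # Then profile the largest case to see which calls dominate.
    print(profile_call(slow_sum_of_powers, 4000))
```

With SymPy installed, the same helpers could presumably be pointed at a real expression instead of the stand-in, for example by passing lambda: Add(*(x**i for i in range(1000))).as_numer_denom() to profile_call (range rather than xrange on Python 3). If the time grows much faster than the input size, the profile report is the place to look for the bottleneck.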