Comment #11 on issue 3129 by mrock...@gmail.com: Drastic change to sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

In general PSpaces can have many symbols bound to them. These symbols act as indices into the distribution. The PSpace is making a value statement about these symbols/concepts.

Consider not using internal symbols at all. How do we represent probability densities? The current plan is to use a Lambda. Lambdas use internal symbols, e.g.

dist = Lambda( (x,y) , exp((x**2+y**2)/(2*pi)) / ... )

The x and y here are internal symbols. I suppose we could use dummies instead, but then our lambdas look bad with names like x1, x2, .... This is a minor detail, so why the big deal? Suppose we want a random variable that corresponds to the 'y' in the probability density. How do we specify that we want the variable at index 1 and not the one at index 0? We could use an index, something like

Y = RandomSymbol('Y', dist, index=1)

Using an index here seems disconnected from what we actually mean. In this sense an internal symbol acts like a more conceptual index: it names the slot rather than its position.
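To illustrate the difference between the two ways of picking out a variable, here is a small sketch using SymPy's Lambda. The density is my own assumption (a standard bivariate normal), chosen only so the example runs; it is not the elided expression above, and the helper names var_by_index, index_of and var_by_symbol are hypothetical.

```python
from sympy import Lambda, exp, pi, symbols

x, y = symbols('x y')

# Assumed density for illustration only (standard bivariate normal);
# the expression in the message above is elided and is not reproduced here.
dist = Lambda((x, y), exp(-(x**2 + y**2) / 2) / (2 * pi))

# Positional selection: the caller must know that 'y' sits at index 1.
var_by_index = dist.variables[1]

# Selection by internal symbol: the symbol itself identifies the slot,
# so the position can be looked up instead of being remembered.
index_of = {sym: i for i, sym in enumerate(dist.variables)}
var_by_symbol = dist.variables[index_of[y]]
```

Both selections return the same symbol; the second one just does not require the caller to know the argument order.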

Regarding your example I have two comments.

(1) I suggest NormalDistribution as a name rather than Normal. I would leave Normal for syntactic sugar later on. This is a relatively minor disagreement though.

(2) The result you're getting is easy in the case of the normal distribution, but I think it's very challenging in even trivially more complex situations. How does this work in the case of a beta distribution? The current design specifically avoids any sort of special rules for well-known distributions. Everything is represented as a SymPy Expr. We fail to get some nice results but, in this sense at least, the system is much simpler.

X = BetaDistribution(2, 3).new('X')
Z = (X-2)/3
pspace(Z)
???

I suspect that a solution that attempts to make decisions like this will necessarily become very complex.
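To make the difficulty concrete, here is a minimal numeric sketch in plain Python (not the SymPy API; all function names are hypothetical). It assumes Z = (X-2)/3 with X ~ Beta(2, 3) and applies the standard change of variables f_Z(z) = 3*f_X(3*z + 2). The result is a valid density on [-2/3, -1/3], but it is a shifted and rescaled beta rather than a member of the standard beta family on [0, 1], so a system that tracks named distributions would need a rule for this case.

```python
def beta_2_3_pdf(x):
    """Density of Beta(2, 3): 12*x*(1 - x)**2 on [0, 1], zero elsewhere."""
    return 12.0 * x * (1.0 - x) ** 2 if 0.0 <= x <= 1.0 else 0.0


def z_pdf(z):
    """Density of Z = (X - 2)/3 by change of variables: 3*f_X(3*z + 2)."""
    return 3.0 * beta_2_3_pdf(3.0 * z + 2.0)


def midpoint_integral(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Z is supported on [-2/3, -1/3] because X is supported on [0, 1].
total = midpoint_integral(z_pdf, -2 / 3, -1 / 3)
mean_z = midpoint_integral(lambda z: z * z_pdf(z), -2 / 3, -1 / 3)
```

Here total should be close to 1, and mean_z should be close to (E[X] - 2)/3 with E[X] = 2/5.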

I think that we're trying to push too much into the concept of a distribution. I suspect that there are two separable tasks here: managing random symbol interaction and computing on distributions. I now think that the concept of a probability space is probably necessary. I think that much of the complexity of the PSpace object should be factored out into a Distribution object and that PSpace should become very simple. Hopefully much of the complexity can be reduced in this factoring process.

Some thoughts:
There should be a single PSpace class (no subclasses).
It should contain a Distribution and a set of symbols.
There should be a Distribution interface that handles things like compute_density, integrate, P, etc. Distribution should be subclassed into Continuous and Finite and should be something like what is proposed above.

This separates two concepts that should have been separated before. I think this solution is clean.
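A rough sketch of this split in plain Python, just to show the shape of it. This is not existing sympy.stats code: the class bodies, the FiniteDistribution name, the callable-based P signature and the use of string names for symbols are all my assumptions, and the Continuous subclass is left out for brevity.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from fractions import Fraction
from typing import Callable, Dict, Hashable, Tuple


class Distribution(ABC):
    """Hypothetical interface: owns the actual probability computations."""

    @abstractmethod
    def P(self, condition: Callable[[Hashable], bool]) -> Fraction:
        """Probability that an outcome satisfies the condition."""


@dataclass
class FiniteDistribution(Distribution):
    """Finite case: an explicit outcome -> probability table."""
    density: Dict[Hashable, Fraction]

    def P(self, condition):
        # Sum the probabilities of the outcomes that satisfy the condition.
        return sum((p for outcome, p in self.density.items()
                    if condition(outcome)), Fraction(0))


@dataclass
class PSpace:
    """Single, simple class: a Distribution plus the symbols bound to it."""
    distribution: Distribution
    symbols: Tuple[str, ...]

    def P(self, condition):
        # No probability logic here; everything is delegated.
        return self.distribution.P(condition)


die = FiniteDistribution({i: Fraction(1, 6) for i in range(1, 7)})
space = PSpace(die, ('X',))
p_high = space.P(lambda value: value > 4)  # outcomes 5 and 6
```

The PSpace only does bookkeeping; swapping in a continuous Distribution would not require touching it.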
