Comment #11 on issue 3129 by mrock...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129
In general, a PSpace can have many symbols bound to it. They act as
indices into the distribution. The PSpace is making a value statement about
these symbols/concepts.
Consider not using internal symbols at all. How do we represent probability
densities? The current plan is to use a Lambda. Lambdas use internal
symbols, e.g.
dist = Lambda( (x,y) , exp((x**2+y**2)/(2*pi)) / ... )
The x and y here are internal symbols. I suppose we could use dummies
instead, but then our Lambdas look bad, with names like x1, x2, .... This
is really a minor detail, so why the big deal? Well, suppose we want a
random variable that corresponds to the 'y' in the probability density. How
do we specify that we want the variable at index 1 and not the one at
index 0? We could use an explicit index, something like
Y = RandomSymbol('Y', dist, index=1)
An integer index here seems removed from what we actually mean. In this
sense an internal symbol acts like a more conceptual index.
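
As a rough sketch of the point above: with named symbols, the slot for 'y'
can be looked up from the Lambda itself. The density below is an assumption
(a standard bivariate normal standing in for the elided expression), and
y_index is just an illustrative name.

```python
from sympy import Dummy, Lambda, Symbol, exp, pi

x, y = Symbol('x'), Symbol('y')

# Assumed density: standard bivariate normal, used only for illustration.
dist = Lambda((x, y), exp(-(x**2 + y**2) / 2) / (2 * pi))

# The named symbol y identifies its own slot; nobody has to remember "1".
y_index = list(dist.variables).index(y)

# The same density built on anonymous dummies. It computes the same values,
# but its variables carry no meaning a user could refer to by name.
d0, d1 = Dummy(), Dummy()
anon = Lambda((d0, d1), exp(-(d0**2 + d1**2) / 2) / (2 * pi))

same_value_at_origin = dist(0, 0) == anon(0, 0)
```

Both versions evaluate identically; only the named one lets us say "the y
variable" instead of "the variable at index 1".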
Regarding your example I have two comments.
(1) I suggest NormalDistribution as a name rather than Normal. I would
leave Normal for syntactic sugar later on. This is a relatively minor
disagreement though.
(2) The result you're getting is easy in the case of the normal
distribution, but I think it's very challenging in even slightly more
complex situations. How does this work in the case of a beta distribution?
The current design specifically avoids any sort of special rules for
well-known distributions. Everything is represented as a SymPy Expr. We
fail to get some nice results but, in this sense at least, the system is
much simpler.
X = BetaDistribution(2, 3).new('X')
Z = (X-2)/3
pspace(Z)
???
I suspect that a solution that attempts to make decisions like this will
necessarily become very complex.
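
To make the difficulty concrete, here is a sketch of what the
"everything is an Expr" approach can still do for this example. It does the
change of variables by hand (it is not the sympy.stats machinery), using
the standard Beta(2, 3) density on [0, 1]; the names f_X and f_Z are mine.

```python
from sympy import Rational, Symbol, gamma, integrate

x, z = Symbol('x'), Symbol('z')
a, b = 2, 3

# Beta(a, b) density on [0, 1], written as a plain expression.
f_X = x**(a - 1) * (1 - x)**(b - 1) * gamma(a + b) / (gamma(a) * gamma(b))

# Z = (X - 2)/3  =>  X = 3*Z + 2 and dx/dz = 3, so f_Z(z) = 3 * f_X(3*z + 2).
f_Z = 3 * f_X.subs(x, 3 * z + 2)

# Z lives on [-2/3, -1/3]; the density should still integrate to 1 there.
total = integrate(f_Z, (z, Rational(-2, 3), Rational(-1, 3)))
```

The result is a perfectly good density expression, but nothing in it says
"this is a shifted and scaled beta distribution", which is the kind of
answer pspace(Z) would have to produce.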
I think that we're trying to push too much into the concept of a
distribution. I suspect that there are two separable tasks here: managing
random symbol interaction and computing on distributions. I now think that
the concept of a probability space is probably necessary. I think that much
of the complexity of the PSpace object should be factored out into a
Distribution object and that PSpace should become very simple. Hopefully
much of the complexity can be simplified in this factoring process.
Some thoughts:
 - There should be a single PSpace class (no subclasses).
 - It should contain a Distribution and a set of symbols.
 - There should be a Distribution interface that handles things like
   compute_density, integrate, P, etc.
 - Distribution should be subclassed to Continuous and Finite and should be
   something like what is proposed above.
This separates two concepts that should have been separated before. I think
this solution is clean.
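
A minimal sketch of how that split might look, purely as an illustration.
Every class and method signature here is hypothetical (only the names
PSpace, Distribution and P come from the discussion above), and the
interface is cut down to P to keep it short.

```python
from sympy import Dummy, Lambda, Rational, Symbol, exp, integrate, oo

class Distribution:
    """Hypothetical interface: does the computing, knows nothing of symbols."""
    def P(self, lower, upper):
        raise NotImplementedError

class ContinuousDistribution(Distribution):
    def __init__(self, density):
        self.density = density  # a one-argument Lambda

    def P(self, lower, upper):
        t = Dummy('t')  # private integration variable
        return integrate(self.density(t), (t, lower, upper))

class FiniteDistribution(Distribution):
    def __init__(self, pmf):
        self.pmf = dict(pmf)  # maps value -> probability

    def P(self, lower, upper):
        return sum(p for v, p in self.pmf.items() if lower <= v <= upper)

class PSpace:
    """Single class, no subclasses: a distribution plus the bound symbols."""
    def __init__(self, distribution, symbols):
        self.distribution = distribution
        self.symbols = tuple(symbols)

# Usage: a fair die and an exponential density, each wrapped the same way.
X, T = Symbol('X'), Symbol('T')
die = PSpace(FiniteDistribution({i: Rational(1, 6) for i in range(1, 7)}), [X])
x = Symbol('x')
wait = PSpace(ContinuousDistribution(Lambda(x, exp(-x))), [T])

p_low_roll = die.distribution.P(1, 2)
total_mass = wait.distribution.P(0, oo)
```

PSpace itself contains no probability logic here; everything that differs
between the finite and continuous cases sits in the Distribution
subclasses.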