[sympy] Brief Intro and Poisson sampling

christopheredwardwillmott Wed, 31 Jan 2018 05:06:07 -0800

Hi all,

I'm Chris, a British grad student at MIT. I have about two years' experience
in Python, mainly in small scripts written for research purposes, and am
looking to get some experience writing production-level code. I have an
advanced, PhD-level understanding of physics/maths, so I'm comfortable
creating algorithms, but I'm a newcomer to sympy and open source
contributions. The guidance I'm looking for will therefore mainly be in
moving my code into the "real world" (doctests, docstrings, workflow,
improving code readability etc.). Happy to work on almost any problem, and
I'm a big fan of diversity, so any collaboration requests would be welcome.

I've dived right into the Poisson sampling problem:
https://github.com/sympy/sympy/pull/13943

I have a working algorithm, similar to what was proposed by Leonid Kovalev.
The algorithm finds an upper bound for the root of the cdf equation, then
uses bisection to refine the root. It works well for high values of lambda:
the upper bound calculation is very low cost and the bisection needs
O(log(lambda)) cdf evaluations.
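
To make the idea concrete, here is a minimal sketch of inverse-transform
sampling by bracketing and bisection. It is not the code from the pull
request: the function names, the doubling search used for the upper bound,
and the plain floating-point cdf are all assumptions made for illustration.

```python
import math
import random


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), by direct summation (moderate lam only)."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i  # P(X = i) obtained from P(X = i - 1)
        total += term
    return min(total, 1.0)


def sample_poisson(lam, u=None):
    """Return the smallest integer k with cdf(k) >= u (hypothetical helper)."""
    if u is None:
        u = random.random()
    # Clamp u so rounding in the float cdf cannot keep the search going forever.
    u = min(u, 1.0 - 1e-9)
    # Step 1: cheap upper bound, doubling until the cdf reaches u (an assumption;
    # the bound used in the pull request may be computed differently).
    hi = max(1, int(lam))
    while poisson_cdf(hi, lam) < u:
        hi *= 2
    # Step 2: integer bisection, keeping cdf(lo) < u <= cdf(hi).
    lo = -1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if poisson_cdf(mid, lam) >= u:
            hi = mid
        else:
            lo = mid
    return hi
```

Calling sample_poisson(50.0) draws one sample; passing u explicitly makes the
result reproducible, which is convenient in doctests.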

Its only limitation comes from the maximum recursion depth in cdf
calculations for high lambda, which I think might be an issue worth
considering in itself. I am getting extremely variable behaviour when
computing the cdf for high values of lambda.

Can anyone more knowledgeable in this area suggest why this might be? The
behaviour I'm observing is that calculating the cdf for values of lambda
greater than ~200 gives a maximum recursion depth error. However, if the cdf
is first calculated for lower values of lambda (say 100, 120, 140, 160 etc.)
during debugging, then it has no problem evaluating the cdf up to 200 and
even higher (which is what is done in the previous algorithm). I'm assuming
these values are temporarily stored in memory somewhere, so fewer recursive
calls are required for the higher values, but I'm slightly at a loss as to
how and why it is doing this. Any pointers appreciated.

Possible solutions/workarounds could be using a normal approximation (which
should be good above, say, lambda = 100, particularly considering the
discrete nature of the sampling we're doing) or perhaps writing a new
function specifically for the Poisson cdf, as it doesn't have a
(particularly useful) closed form.
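
Both workarounds can be sketched numerically with the standard library alone.
The function names below are hypothetical, and this is only an illustration
of the two ideas, not a proposal for sympy's API: the first function sums the
Poisson terms in a loop (no recursion, with terms built in log space so that
exp(-lambda) cannot underflow), and the second is the usual normal
approximation with a continuity correction. Both assume lambda > 0.

```python
import math


def poisson_cdf_iterative(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed in a loop with no recursion."""
    if k < 0:
        return 0.0
    # log P(X = i) = -lam + i*log(lam) - log(i!)
    log_terms = [-lam + i * math.log(lam) - math.lgamma(i + 1)
                 for i in range(k + 1)]
    peak = max(log_terms)  # factor out the largest term for stability
    total = math.exp(peak) * sum(math.exp(t - peak) for t in log_terms)
    return min(1.0, total)


def poisson_cdf_normal(k, lam):
    """Normal approximation: Phi((k + 0.5 - lam) / sqrt(lam))."""
    z = (k + 0.5 - lam) / math.sqrt(lam)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

Comparing the two for a large lambda, for example lambda = 1000 and k near
1000, gives a quick feel for how much accuracy the approximation gives up.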

As a side note, another very useful algorithm for sampling random
distributions is the rejection sampling method
(https://en.wikipedia.org/wiki/Rejection_sampling). It does require bounds
to be placed on the values, so it is not suitable for every situation, but
it is fast and easy to implement. Perhaps worth considering for method
overloading in situations where there is a difficult cdf to invert and the
user can provide bounds? Open to suggestions, just putting the idea out
there.
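
For reference, a minimal sketch of rejection sampling with a uniform proposal
on a bounded interval. The function name and signature are hypothetical, and
the caller is assumed to supply the interval and an upper bound on the
density.

```python
import random


def rejection_sample(pdf, lower, upper, pdf_max, rng=random):
    """Draw one sample from pdf on [lower, upper], given pdf(x) <= pdf_max."""
    while True:
        x = rng.uniform(lower, upper)   # candidate position
        y = rng.uniform(0.0, pdf_max)   # candidate height under the bound
        if y <= pdf(x):                 # accept points that fall under the curve
            return x


def beta22_pdf(x):
    """Example density 6x(1 - x) on [0, 1]; its maximum is 1.5 at x = 0.5."""
    return 6.0 * x * (1.0 - x)
```

A call such as rejection_sample(beta22_pdf, 0.0, 1.0, 1.5) returns one draw.
The acceptance rate is the area under the density divided by the area of the
bounding box, here 1/1.5, so a loose bound costs extra iterations rather than
correctness.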

Looking forward to working with you all.
Cheers,
Chris
