Hi all,

I'm Chris, British grad student at MIT. Circa 2 years experience in Python 
but that is mainly in small scripts written for research purposes - am 
looking to get some experience writing production level code. Have an 
advanced PhD level understanding of physics/maths so comfortable creating 
algorithms but a newcomer to sympy and open source contributions - 
therefore, the guidance I'm looking for will mainly be in moving my code 
into the "real world" (doctests/docstrings/workflow/improving code 
readability etc). Happy to work on almost any problems and big fan of 
diversity so any collaboration requests would be welcome. 

I've dived right into the poisson sampling problem - 
https://github.com/sympy/sympy/pull/13943 
<https://www.google.com/url?q=https%3A%2F%2Fgithub.com%2Fsympy%2Fsympy%2Fpull%2F13943&sa=D&sntz=1&usg=AFQjCNF1yNR7f_8VexVZfgzvhsI137Iszw>.
 
I have a working algorithm, similar to what was proposed by Leonid Kovalev. The 
algorithm finds an upper bound on the cdf then uses a bisection to refine 
the root. It works well for high values of lambda - the upper bound 
calculation is very low cost and bisection is O(log2(lambda)). 

It's only limitation comes from the maximum recursion depth for high lambda 
cdf calculations which I think might be an issue considering in itself. I 
am getting extremely variable behaviour in computing the cdf for high 
values of lambda.
I'm not sure if anyone else who is more knowledgable in this area can 
suggest why this might be? The behaviour I'm observing is that calculating 
the cdf for values of lambda greater than ~200 gives a maximum recursion 
depth error. However, 
if the cdf is first calculated for all the lower values of lambda (say 
100,120,140,160 etc) during debugging then it has no problem evaluating the 
cdf up to 200 and even higher (which is what is done in the 
previous algorithm).
I'm assuming these values are temporarily stored in memory somewhere and so 
it requires less recursive calls for these higher values but I'm slightly 
at a loss as to how and why it is doing this. Any pointers appreciated. 
Possible solutions/workarounds could be using normal approximations (should 
be a good approximation above say lambda = 100, particularly considering 
the discrete nature of the sampling we're doing) or perhaps writing a new 
function
specifically for the poisson cdf as it doesn't have a (particularly useful) 
closed form. 

As a side note - another very useful algorithm for sampling of random 
distributions is the rejection sampling method 
- https://en.wikipedia.org/wiki/Rejection_sampling. It does require bounds 
being placed on the values
so is not suitable for every situation but it is fast and easy to 
implement. Perhaps, worth considering for method overloading in situations 
where there is a difficult cdf to invert and the user can provide bounds? 
Open 
to suggestions but just putting the idea out there. 

Looking forward to working with you all.
Cheers,
Chris 


-- 
You received this message because you are subscribed to the Google Groups 
"sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sympy+unsubscr...@googlegroups.com.
To post to this group, send email to sympy@googlegroups.com.
Visit this group at https://groups.google.com/group/sympy.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sympy/f7ce2cd8-87f2-49e7-a0fa-46a664657ed8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to