Re: [sympy] Test failures in master: Hash randomization

Aaron Meurer Sun, 24 Jun 2012 18:45:43 -0700

On Jun 24, 2012, at 1:11 PM, "Ondřej Čertík" <ondrej.cer...@gmail.com> wrote:


> On Sat, Jun 23, 2012 at 10:32 PM, Aaron Meurer <asmeu...@gmail.com> wrote:
>> Hi everyone.
>>
>> I just merged pull request https://github.com/sympy/sympy/pull/1379,
>> which enables hash randomization by default in the test runner.  What
>> this means is that if you are running Python 2.6.8, 2.7.3, 3.2.3, or
>> 3.3, then you are going to start seeing a lot of test failures.  Hash
>> randomization randomizes the hash values of strings, which results in
>> random hash values of all SymPy objects.  Because we still rely on
>> hash values for ordering in a lot of places, this means that there are
>> a lot of failures when it is enabled.
>>
>> In order to facilitate sane testing with this, hash randomization is
>> seeded.  However, the only way to seed it is to set the environment
>> variable PYTHONHASHSEED before starting Python.  Therefore, when the
>> tests are run in one of the above Python versions, they are now run in
>> a separate subprocess.  If this is a problem for anyone, or if you
>> desire to disable hash randomization, you can run the tests with
>> ./bin/test --no-subprocess or ./bin/doctest --no-subprocess.  Note
>> that from Python 3.3 onward, hash randomization is enabled by default.
>>
>> So here are some important things to remember:
>>
>> - The test and doctest headers now look like this:
>>
>> ============================= test process starts ==============================
>> executable:         /usr/bin/python3  (3.2.3-candidate-2)
>> architecture:       64-bit
>> cache:              yes
>> ground types:       python
>> random seed:        45250937
>> hash randomization: on (PYTHONHASHSEED=61944319)
>>
>> Notice the new item at the bottom, "hash randomization".  The seed is
>> given there too.  If hash randomization is not supported (e.g., in
>> Python 2.5), it will say "hash randomization: off".
>
>
> P.S. The Travis CI says "hash randomization: off", so that means that the
> Python executable there doesn't support "-R" yet, right? They have
> version 2.7.2+.

Yes.  It looks like all of their versions are one release behind the
latest, and the latest is what is required for hash randomization. I
opened https://github.com/travis-ci/travis-ci/issues/614 for it.
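
To check what a given interpreter does with a seed, one option is to
start a child process with PYTHONHASHSEED set and look at
sys.flags.hash_randomization together with a string hash.  This is only
a rough sketch: the helper name child_hash_info is made up for
illustration, and it is not how bin/test is implemented.

```python
import os
import subprocess
import sys

# Code run in the child: report the randomization flag and one string hash.
# getattr() is used because older interpreters have no such flag.
CHILD_CODE = (
    "import sys; "
    "print(getattr(sys.flags, 'hash_randomization', 0)); "
    "print(hash('sympy'))"
)


def child_hash_info(seed):
    """Hypothetical helper: start a fresh interpreter with
    PYTHONHASHSEED=seed and return (flag, hash('sympy')) as seen there."""
    # The seed has to be in the environment before the child starts.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output([sys.executable, "-c", CHILD_CODE], env=env)
    flag, value = out.decode().split()
    return int(flag), int(value)
```

Calling child_hash_info(61944319) twice on the same machine should give
the same pair both times, which is what makes a failure reproducible
from the seed printed in the header.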

>
>>
>> - To run the tests with a specific seed, set the environment variable, like
>>
>> $ PYTHONHASHSEED=61944319 python bin/test
>>
>> - Note that the tests and doctests are run in separate subprocesses
>> with separate seeds with setup.py test.
>>
>> - To reproduce failures, you need both the same seed *and* the same
>> architecture (32-bit or 64-bit), because both are used to compute hash
>> values.
>>
>> - Finally, there is a pretty bad bug with some seeds where test_expand
>> hangs.  We should put priority on this, but until then, you may want
>> to run the tests with --timeout=60 to timeout any test that runs for
>> longer than a minute.
>>
>> Let's keep a log of all failures and seeds that can reproduce them at
>> http://code.google.com/p/sympy/issues/detail?id=3272 (or if it would
>> be easier, we could start a wiki page for them).  Any help fixing any
>> of these problems would be greatly appreciated.  We cannot release
>> until they are all fixed (and conversely, once they are all fixed, we
>> should be able to get a release candidate out almost right away).
>> Hopefully we won't have to resort to any XFAILing.
>>
>> And lastly, if anyone has any thoughts on how we could canonically
>> order the arguments of Add and Mul independently of hash values, in a
>> way that is still just as fast as using hash values, I would love to
>> hear it.  If we could do that, it would make fixing these errors a lot
>> easier (on the other hand, maybe we would be better off design-wise if
>> we made everything .args ordering agnostic).
>
> I think that the fastest and simplest is to simply use native Python data
> types like dictionaries, and those will depend on the hash. The final results
> should not depend on the hash, for example in printing we are simply sorting it.

I'm not exactly clear on what you are suggesting here.

At any rate, I'm now beginning to think that the better solution is to
just not worry about arg ordering, and instead make sure that
everything works no matter what the ordering.  We might even at some
point remove ordering altogether (see for example
https://github.com/sympy/sympy/pull/722), which should make things a
little faster in the core if we can pull it off.

>
> One problem is the usage of .args to access the individual terms in Add and
> Mul. If those depend on the hash, then I guess one should not really access
> it directly, except for some quick prototyping in ipython.

I think the problem is also when you iterate over .args, and the
result depends on the order that you get things.  For example, with
cse(), we are getting different results.  All of them are common
subexpression eliminations of the expression, but different
expressions are taken out in different orders and things like that.
So we either need to restrict the definition of what cse() returns so
that it is unique, or else expand our tests for it so that any valid
result passes.
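
The second option (accept any valid result) could look roughly like the
following.  This is a sketch that uses nested tuples as a stand-in for
SymPy expressions, so it runs without SymPy; substitute, undo_cse and
is_valid_cse are hypothetical names, not existing SymPy functions.  The
check substitutes the replacements back and compares against the
original instead of comparing against one fixed expected output.

```python
def substitute(expr, name, value):
    """Replace every occurrence of the symbol `name` in a nested-tuple
    expression with `value`."""
    if expr == name:
        return value
    if isinstance(expr, tuple):
        return tuple(substitute(e, name, value) for e in expr)
    return expr


def undo_cse(replacements, reduced):
    """Substitute the replacements back, last one first, since a later
    replacement may refer to an earlier symbol."""
    for name, sub in reversed(replacements):
        reduced = substitute(reduced, name, sub)
    return reduced


def is_valid_cse(original, replacements, reduced):
    """Accept any (replacements, reduced) pair that expands to `original`."""
    return undo_cse(replacements, reduced) == original


# Two common subexpressions that may be pulled out in either order.
a = ("add", "x", "y")
b = ("mul", "x", "y")
original = ("f", a, b, a, b)

result_1 = ([("x0", a), ("x1", b)], ("f", "x0", "x1", "x0", "x1"))
result_2 = ([("x0", b), ("x1", a)], ("f", "x1", "x0", "x1", "x0"))
```

Both result_1 and result_2 pass is_valid_cse here.  With real SymPy
objects the comparison would be on expressions rather than tuples, so
differences in argument order inside Add and Mul would not matter
either.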

There's also the issue of functions that return lists when the result
is inherently unordered, like solve().  So solve(x**2 - 2, x) could
return either [sqrt(2), -sqrt(2)] or [-sqrt(2), sqrt(2)]. The best
solution here is probably to change solve to return a set, or some
similarly unordered structure, but it would break a lot of things too
(like anything that indexes the result of solve()).
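
Short of changing the return type, tests can at least compare such
results without regard to order.  A minimal sketch, using floats as a
stand-in for the SymPy roots so that it runs without SymPy
(same_solutions is a made-up helper name):

```python
from collections import Counter
import math


def same_solutions(got, expected):
    """Compare two solution lists as multisets, ignoring order but
    still counting repeated entries."""
    return Counter(got) == Counter(expected)


r = math.sqrt(2)
order_a = [r, -r]   # stands in for [sqrt(2), -sqrt(2)]
order_b = [-r, r]   # stands in for [-sqrt(2), sqrt(2)]
```

Here same_solutions(order_a, order_b) is true even though
order_a == order_b is not.  The same comparison should work for SymPy
objects, since they are hashable.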

And of course, the easy way out is always to arbitrarily but
canonically sort the args, like with default_sort_key.  But if this is
really unnecessary, as I think it is with all cases except for
printing, the better way would be to do as I said above.
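
For comparison, sorting by a key that does not involve hash() at all
could be sketched as below.  This is not how default_sort_key is
implemented; stable_key and canonical_args are hypothetical names, and
the key (class name, then repr) is just an assumption chosen to keep
the example short.

```python
def stable_key(obj):
    """Hash-independent sort key: class name first, then the repr."""
    return (type(obj).__name__, repr(obj))


def canonical_args(args):
    """Return the items in an order that does not depend on hash values
    or on the order in which they were given."""
    return tuple(sorted(args, key=stable_key))


# A set iterates in hash order, but the sorted result is always the same.
unordered = {"y", "x", 2, 1.5}
ordered = canonical_args(unordered)
```

Building and comparing such keys costs more than comparing hashes,
which is the speed trade-off mentioned earlier in the thread.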

Aaron Meurer

>
> I just sent an email to the list with the subject "Testing:
> differences between platforms", not realizing
> that you already took care of this.
>
> Ondrej
>
> --
> You received this message because you are subscribed to the Google Groups 
> "sympy" group.
> To post to this group, send email to sympy@googlegroups.com.
> To unsubscribe from this group, send email to 
> sympy+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/sympy?hl=en.
>
