On Apr 10, 2013, at 3:55 AM, Tom Bachmann <e_mc...@web.de> wrote:

This is essentially a continuation of my email "General strategy for
removing old assumptions".

One fairly evident problem with naively replacing the old assumptions system
with the new one is caching. Recall that sympy has a "global cache": any
function decorated with @cacheit will benefit from it. The cache works by
assuming that the return value of the function depends exclusively on its
arguments (i.e. that the function is side-effect-free).
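
For concreteness, here is a minimal sketch of such a global cache (the real
@cacheit in sympy.core.cache is more elaborate, but it rests on the same
purity assumption):

import functools

_global_cache = {}

def cacheit(func):
    # Minimal sketch of a global cache decorator in the spirit of
    # @cacheit: the result is assumed to depend only on the arguments.
    @functools.wraps(func)
    def wrapper(*args):
        key = (func, args)
        if key not in _global_cache:
            _global_cache[key] = func(*args)
        return _global_cache[key]
    return wrapper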

This makes sense when all assumptions are stored with their symbols. (The
pervasive use of caching has led to problems in the past, but I think it
is fine in principle.) On the other hand, as soon as assumptions are
separated from symbols, this assumption fails (and usually in subtle ways).

Let's be clear that this is an inherent feature of the new assumptions
system: assumptions *cannot* in general be tied to symbols; consider for
example "x > y". We *want* to be able to support such assumptions, and this
invalidates the naive caching strategy.
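
To make the failure mode concrete, suppose for the sake of illustration
that ask() were cached with the naive decorator above (it is not, but any
cached function that consults the assumption context behaves the same way):

from sympy import Symbol, Q, ask
from sympy.assumptions import assuming

x = Symbol('x')

with assuming(Q.positive(x)):
    ask(Q.positive(x + 1))   # True; a naive cache would key this on
                             # the argument Q.positive(x + 1) alone

# Outside the context nothing is known about x, so this should give
# None, but a naive cache would return the stale True:
ask(Q.positive(x + 1))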


What modules rely most heavily on caching? I know that series does very
much, and meijerint does to some extent. What about other modules?


Let's not forget the most important module of all, the core. I know expand
relies on the cache for most of its speed. I think it's used elsewhere in
the core too.



There have been long discussions about caching before, in various contexts.
Let me try to summarize what I think was the consensus:

The global cache, as it stands, is bad. To name only two issues, it can
grow without bound during long computations, and it has often led to very
subtle bugs (when some function was not truly side-effect-free).

Any other general objections to the global cache?


These subtle bugs should not be brushed aside. There are still several test
failures with the cache turned off, and issues with it turned on.
Sometimes these aren't even testable, because the cache is in the wrong
state when the test is run, and it is difficult to determine what the right
state is. See for example the bug described at
https://groups.google.com/d/topic/sympy/3qohaXrd5hs/discussion. Also, at
https://github.com/sympy/sympy/pull/1746 there was some caching issue that
was never fixed, just worked around, because we could never debug it. Aside
from being very sensitive to state and keeping large objects in memory, the
cache is also hard to debug because in general you may have to run some
very long computation (like the whole test suite) just to reproduce a bug.


It is also felt that the cache should be unnecessary, as demonstrated by
the success of e.g. sympycore. On the other hand, we accept that some
algorithms (e.g. gruntz) can profit a lot from caching. Even if it were
possible to redesign them so as not to unnecessarily repeat the same
computation over and over, it is felt that using a cache achieves the same
thing at a much smaller cost.

Did I miss any important points regarding caching?


The question now becomes what to do about the cache, in the context of the
new assumptions system, both in the long and in the short term. In the
short term, I see three ways of making new assumptions work with the cache:

(1) disable the cache
(2) clear the cache on every (global) assumptions change
(3) hash the assumptions during caching

If these are all the possibilities, then this is really about deciding
which is the least horrible. It would be tempting in (3) to hash only those
assumptions which contain symbols that also appear in the arguments of the
function, but this is wrong: an assumption such as Q.positive(y) does not
mention x, yet together with "x > y" it determines the sign of x. Thus the
difference between (2) and (3) is fairly small. (1) is going to make gruntz
unusable. (2) or (3) could yield a usable gruntz, but this is not obvious.
[There is a better chance now than during the last attempts, because since
then I made some changes to gruntz which improve caching by creating fewer
fresh dummies, and thus fewer extra assumptions.]
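
For concreteness, (3) could look roughly like this (a sketch only;
global_assumptions is the set backing the new system's ambient context, and
the whole context goes into the key, per the caveat above):

import functools
from sympy.assumptions import global_assumptions

_cache = {}

def cacheit(func):
    # Sketch of strategy (3): fold the *entire* assumption context into
    # the cache key.  Restricting the key to assumptions whose symbols
    # appear in args would be wrong, as argued above.
    @functools.wraps(func)
    def wrapper(*args):
        key = (func, args, frozenset(global_assumptions))
        if key not in _cache:
            _cache[key] = func(*args)
        return _cache[key]
    return wrapper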


I think that (2) is a bad strategy, even mid-term. The reason is that it
forces us to stop "making new assumptions" during any computation that
relies on caching. However, I think patterns like

x = Dummy()
with assuming(Q.positive(x)):
    ...  # do some stuff

(or x = Dummy(positive=True) in the old system) are just too nice to avoid.
And even if we avoid them in, say, all of gruntz.py (which seems
manageable), our cache will still be flushed any time we "accidentally"
call some other function that uses this pattern.

Strategy (3) mitigates this to some extent, but not completely.


However, what I think should be observed is that most modules will need
changing for the new assumptions system anyway. So making *minor* changes
to the caching strategy is actually fine. This opens up two more options, I
think:

(4) introduce a clear_cache keyword to assuming
(5) introduce local caches

By (4), I mean that we replace the above pattern by

x = Dummy()
with assuming(Q.positive(x), clear_cache=False):
    ...  # do some stuff

clear_cache will be True by default, and will do what you think it does.
However, in the above situation it is safe not to clear the cache, since
the dummy is fresh. I *think* this should solve most problems. Indeed, this
pattern always works when creating "fresh" symbols, and so it works in all
situations where the current assumptions are set up (since we currently
never allow changing them).
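
Here is a sketch of what (4) could look like, assuming global_assumptions
backs the context and clear_cache() from sympy.core.cache flushes the
global cache (the real assuming() would need more care, e.g. with facts
that were already present before entry):

from contextlib import contextmanager
from sympy.assumptions import global_assumptions
from sympy.core.cache import clear_cache as flush_global_cache

@contextmanager
def assuming(*facts, clear_cache=True):
    # Sketch of strategy (4).  The assumption context changes on entry
    # and again on exit, so the safe default flushes the global cache
    # at both points; clear_cache=False is the opt-out for the
    # fresh-dummy pattern above.
    if clear_cache:
        flush_global_cache()
    global_assumptions.update(facts)
    try:
        yield
    finally:
        for fact in facts:
            # naive removal; assumes fact was not already present
            global_assumptions.discard(fact)
        if clear_cache:
            flush_global_cache()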


That seems way too sensitive. People will either not use it, or use it
incorrectly. Or they will use it correctly, but then someone else will
change the code in a way that invalidates it without realizing it.



By (5), I mean to add an additional "id" to the @cacheit decorator, roughly
like this (code written for clarity, not performance):

local_caches_dict = {}

def cacheit(cache_id):
    # Decorator factory ("cache_id" rather than "id", to avoid shadowing
    # the builtin): all functions tagged with the same cache_id share
    # one cache, which only exists while one of them is running.
    def decorator(func):
        def wrapped(*args):
            if cache_id in local_caches_dict:
                cache = local_caches_dict[cache_id]
                if args not in cache:
                    cache[args] = func(*args)
                return cache[args]
            # Outermost call: create the local cache, recurse, and let
            # the context manager destroy the cache on the way out.
            with new_local_cache(cache_id):
                return wrapped(*args)
        return wrapped
    return decorator

Here new_local_cache is a context manager which creates and destroys an
entry in local_caches_dict. This kind of cache decorator is tailored
towards the use of caching in gruntz. Will it help in other cases?
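
For completeness, new_local_cache could be sketched as below, followed by a
hypothetical usage (mrv is a real gruntz routine, but decorating it like
this is purely illustrative):

from contextlib import contextmanager

@contextmanager
def new_local_cache(cache_id):
    # Create the entry in local_caches_dict; destroy it on exit.
    local_caches_dict[cache_id] = {}
    try:
        yield
    finally:
        del local_caches_dict[cache_id]

@cacheit("gruntz")
def mrv(e, x):
    ...  # compute the set of most rapidly varying subexpressions

All functions tagged "gruntz" would then share one cache, created when the
outermost such call starts and destroyed when it returns, so nothing leaks
between independent limit computations.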


It seems to me that, under the general strategy outlined in my previous
mail, (4) and (5) are essentially equally beneficial. (5) feels slightly
cleaner to me.

What do you guys think?


Perhaps what we really need is memoization. The difference between
memoization and caching, for me, is that memoization is not global. So for
example, we could memoize the gruntz algorithm, but at the end of each
gruntz call it would reset the cached values. If the cache were basically
always reset between atomic operations, then these subtle issues of things
affecting other things would go away.
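
A sketch of that idea: a memoization decorator whose table is wiped
whenever the outermost decorated call returns (close in spirit to (5)
above, but tied to a single function rather than a shared id):

import functools

def memoize_per_call(func):
    # Sketch: a memo table that lives only for the duration of one
    # top-level call; recursive calls still benefit from it.
    memo = {}
    depth = [0]   # current recursion depth into func
    @functools.wraps(func)
    def wrapper(*args):
        if args in memo:
            return memo[args]
        depth[0] += 1
        try:
            memo[args] = result = func(*args)
            return result
        finally:
            depth[0] -= 1
            if depth[0] == 0:
                memo.clear()   # reset between atomic operations
    return wrapper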

Ultimately, though, the issue with the cache (or at least one major issue)
is that it puts very subtle restrictions on code, restrictions that people
don't realize are there. You cannot make any state modifications inside a
cached function. That doesn't sound like a common situation, but it
actually is, for instance when the function is a method on a class and the
state is some internal fact that we want to keep track of.
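
A contrived illustration of that restriction, with functools.lru_cache
standing in for @cacheit:

import functools

class FactTracker:
    def __init__(self):
        self.queries = 0   # internal fact we want to keep track of

    @functools.lru_cache(maxsize=None)
    def compute(self, n):
        self.queries += 1  # state modification inside a cached function
        return n * n

t = FactTracker()
t.compute(3)
t.compute(3)        # cache hit: the body never runs the second time
print(t.queries)    # prints 1, not 2 -- the bookkeeping silently broke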

Aaron Meurer


Thanks,
Tom
