Hello, I've been using Theano through PyMC3 a lot recently and came across a hopefully realistic and useful optimization that Theano does not seem to do. Most Theano graphs of models I've seen contain an Elemwise node followed directly by a Sum node. Often a lot of time is spent in the Sum nodes (40% of the total run time in some of my real-world models). As far as I understand it, this means that Theano stores the result of the Elemwise operation in an array and then computes the sum of the elements of that array in a second pass. It would save a lot of time and memory bandwidth to compute the sum on the fly, possibly in parallel with an OpenMP reduce operation.
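To make the difference concrete, here is a minimal pure-Python sketch of the two evaluation strategies. It is not Theano code and does not use any Theano API: the names `elemwise_then_sum`, `fused_sum_elemwise` and `scalar_term` are hypothetical, and the Python loops only stand in for the C loops that a compiled op would run.

```python
import math


def elemwise_then_sum(scalar_fn, xs):
    """Two passes: materialize the elementwise result, then reduce it."""
    # The intermediate list plays the role of the temporary array that
    # sits between the Elemwise node and the Sum node.
    intermediate = [scalar_fn(x) for x in xs]
    return sum(intermediate)


def fused_sum_elemwise(scalar_fn, xs):
    """One pass: apply the scalar function and accumulate immediately."""
    total = 0.0
    for x in xs:
        # No temporary array is written; each value is consumed at once.
        total += scalar_fn(x)
    return total


def scalar_term(x):
    # Hypothetical scalar operation, chosen only for illustration.
    return math.exp(-0.5 * x * x)


xs = [0.001 * i for i in range(10_000)]
two_pass = elemwise_then_sum(scalar_term, xs)
one_pass = fused_sum_elemwise(scalar_term, xs)
# The two results agree up to floating-point rounding.
```

The fused version needs no extra memory beyond the accumulator. In compiled code this would correspond to a single loop, which is also the shape that an OpenMP reduction would parallelize.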
This notebook illustrates the problem: https://nbviewer.jupyter.org/gist/aseyboldt/777fc6e73f6444da14da03818352c96f I could not find a way to make this faster with what Theano offers today. Did I miss something? I thought about writing an op like SumElemwise (or, more generally, CAReduceElemwise) and an optimization that rewrites `Elemwise -> Sum` into `SumElemwise`. I guess SumElemwise could be pretty similar to Elemwise, at least for cases where we sum over all dimensions. But while reading the code of Elemwise I realized that I really don't know enough about the internals of Theano to pull this off myself. If anyone feels like spending time on this, they would have my eternal gratitude. ;-)

-- Adrian