Hello,

I've been using Theano through PyMC3 a lot recently and came across what
I hope is a realistic and useful optimization that Theano does not seem
to do. Most Theano graphs of models I've seen contain an Elemwise node
followed by a Sum node. Often a lot of time is spent in the Sum nodes
(40% of the total run time in some of my real-world models). As far as
I understand it, this means that Theano stores the result of the
Elemwise operation in an array and then computes the sum of the
elements in that array. Computing the sum on the fly instead would save
a lot of time and memory bandwidth, possibly in parallel with an OpenMP
reduction.
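To make the difference concrete, here is a minimal pure-Python sketch
(not Theano code) of the two strategies for an expression like
sum(exp(x) * y). The expression and the function names `unfused`,
`fused` and `fused_chunked` are assumptions chosen for illustration.
`unfused` mimics the current graph, materializing the full Elemwise
result before the Sum. `fused` accumulates as it goes, so no
intermediate array is written. `fused_chunked` only hints at the
structure of an OpenMP-style reduction: each chunk keeps a private
partial sum, and the partial sums are combined at the end.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def elemwise(a, b):
    # The scalar operation an Elemwise node would apply to every element.
    return math.exp(a) * b


def unfused(x, y):
    # Elemwise -> Sum: the intermediate result is stored in full
    # (an extra write and read per element) before being reduced.
    tmp = [elemwise(a, b) for a, b in zip(x, y)]
    return sum(tmp)


def fused(x, y):
    # Fused version: each element is reduced as soon as it is computed,
    # so no intermediate array is allocated.
    total = 0.0
    for a, b in zip(x, y):
        total += elemwise(a, b)
    return total


def fused_chunked(x, y, n_chunks=4):
    # Rough analogue of an OpenMP reduction(+:total): every chunk keeps a
    # private partial sum, and the partial sums are combined at the end.
    # Python threads only illustrate the structure here; because of the
    # GIL they give no real speedup for CPU-bound work.
    n = len(x)
    bounds = [(i * n // n_chunks, (i + 1) * n // n_chunks) for i in range(n_chunks)]

    def partial(bound):
        lo, hi = bound
        return fused(x[lo:hi], y[lo:hi])

    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        return sum(pool.map(partial, bounds))


x = [0.1 * i for i in range(1000)]
y = [1.0 / (i + 1) for i in range(1000)]
print(unfused(x, y), fused(x, y), fused_chunked(x, y))
```

The three results agree up to floating-point rounding, since the
chunked version adds the terms in a different order. In pure Python
the timings say nothing; the point is that the fused variants never
touch an intermediate buffer, which is where compiled code would save
memory bandwidth.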

This notebook illustrates the problem:
https://nbviewer.jupyter.org/gist/aseyboldt/777fc6e73f6444da14da03818352c96f

I did not find a way to make this faster with Theano as it is. Did I
miss something?

I thought about writing an op like SumElemwise (or a more general
CAReduceElemwise) and an optimization that changes `Elemwise -> Sum`
into `SumElemwise`. I guess SumElemwise could be pretty similar to
Elemwise, at least for cases where we sum over all dimensions. But
while reading the code of Elemwise I realized that I really don't know
enough about the internals of Theano to pull this off myself. If anyone
feels like spending time on this, they would have my eternal
gratitude. ;-)
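The rewrite itself can be sketched on a toy expression graph. The node
classes `Input`, `Elemwise`, `Sum` and `SumElemwise` and the functions
`fuse_sum_elemwise` and `evaluate` below are hypothetical and defined
only here; this is not Theano's graph or optimizer API, and the step of
registering such a rewrite with Theano's optimizer is not shown. It
covers only the simple case of a sum over all dimensions of 1-d inputs,
without broadcasting.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Input:
    name: str


@dataclass(frozen=True)
class Elemwise:
    scalar_op: Callable[..., float]
    inputs: Tuple["Node", ...]


@dataclass(frozen=True)
class Sum:
    input: "Node"  # hypothetical: always sums over all dimensions


@dataclass(frozen=True)
class SumElemwise:
    scalar_op: Callable[..., float]
    inputs: Tuple["Node", ...]


def fuse_sum_elemwise(node):
    # Hypothetical local rewrite: Sum(Elemwise(op, *inputs)) -> SumElemwise(op, *inputs).
    # A real optimizer would also check that the Elemwise output has no
    # other clients; otherwise fusing would compute it twice.
    if isinstance(node, Sum):
        inner = fuse_sum_elemwise(node.input)
        if isinstance(inner, Elemwise):
            return SumElemwise(inner.scalar_op, inner.inputs)
        return Sum(inner)
    if isinstance(node, Elemwise):
        return Elemwise(node.scalar_op, tuple(fuse_sum_elemwise(i) for i in node.inputs))
    return node


def evaluate(node, env):
    # env maps input names to equal-length lists of floats (1-d only).
    if isinstance(node, Input):
        return env[node.name]
    if isinstance(node, Elemwise):
        args = [evaluate(i, env) for i in node.inputs]
        return [node.scalar_op(*vals) for vals in zip(*args)]
    if isinstance(node, Sum):
        return sum(evaluate(node.input, env))
    if isinstance(node, SumElemwise):
        # The fused op accumulates directly instead of building a list.
        args = [evaluate(i, env) for i in node.inputs]
        total = 0.0
        for vals in zip(*args):
            total += node.scalar_op(*vals)
        return total
    raise TypeError(f"unknown node {node!r}")


x, y = Input("x"), Input("y")
graph = Sum(Elemwise(lambda a, b: math.exp(a) * b, (x, y)))
rewritten = fuse_sum_elemwise(graph)
env = {"x": [0.0, 1.0, 2.0], "y": [1.0, 0.5, 0.25]}
print(type(rewritten).__name__, evaluate(graph, env), evaluate(rewritten, env))
```

The rewrite leaves the scalar op untouched and only changes who owns
the loop, which is why a SumElemwise op could plausibly reuse much of
the Elemwise machinery for the full-reduction case.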

-- Adrian

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"theano-users" group.