I would like to elaborate a bit on the *reach* variable, because it might also be related to the issue.
I initialise *reach* as a zero-dimensional numpy array holding 1.0:

    outputs[case] = CreateTree('', case, np.array(1.0, dtype=np.float32), utils, cases)

*reach* is then passed recursively down the tree:

    def CreateTree(node, group, reach, util, cases):

where at each step it gets multiplied by *probs*, which is derived from shared variables:

    new_reach = reach * probs[idx]

So *reach* is simply a cumulative product initialised with 1. I was not sure how to initialise it with 1, so I just used a numpy array for that. Could this approach be at fault?
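For concreteness, here is a minimal self-contained sketch of the cumulative-product pattern described above, assuming a fixed recursion depth and made-up *probs* values; CreateTree is reduced to a simplified signature compared to the real one in the thread:

    import numpy as np
    import theano
    import theano.tensor as T

    # probs is derived from shared variables, as in the original post.
    probs = theano.shared(np.array([0.5, 0.25, 0.25], dtype=np.float32),
                          name='probs')

    def CreateTree(depth, reach):
        # Simplified signature: the real CreateTree(node, group, reach,
        # util, cases) does more; only the reach accumulation is shown.
        if depth == 0:
            return reach
        # Multiplying a numpy scalar by a symbolic variable promotes the
        # result to a symbolic expression, so only the root value is numpy.
        return CreateTree(depth - 1, reach * probs[depth - 1])

    # The initialisation in question: a 0-d numpy array holding 1.0.
    out = CreateTree(3, np.array(1.0, dtype=np.float32))

    f = theano.function([], out)
    print(f())  # 0.5 * 0.25 * 0.25 = 0.03125

As far as I understand, the numpy scalar at the root is folded into the graph as a constant at the first multiplication, so starting from T.constant(1.0, dtype='float32') would be equivalent; neither form should by itself introduce a GpuAlloc.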
On Wednesday, 22 February 2017 08:16:20 UTC+1, Šarūnas S. wrote:
>
> Sorry for the late reply. Thanks for having a look into that.
>
> By type, do you mean the dimensions, or something else?
> cases is a square matrix (e.g. with shape (32, 32)) stored as a shared
> variable.
> reach is a column vector (e.g. with shape (32, 1)) which is a result of
> the graph creation, where a numpy vector is multiplied with tensors.
>
> I am slightly unsure, though: how would you do this with broadcasting?
> I need to multiply each row of cases with the reach column.
>
>
> On Tuesday, 21 February 2017 23:44:42 UTC+1, nouiz wrote:
>>
>> I discussed this with @lamblin. We could do an optimization to fix this,
>> but it would be a very narrow special case. We won't do it in the short
>> term. But you can do it manually yourself. Instead of calling tile, you
>> can reshape cases[group] and reach to 3d tensors with the right
>> dimensions set as broadcastable. This would allow you to do what you
>> want efficiently without having alloc in the graph. This is a very good
>> use of broadcasting.
>>
>> Frédéric
>>
>> On Wed, Feb 15, 2017 at 12:16 PM Frédéric Bastien <frederic...@gmail.com>
>> wrote:
>>
>>> tile generates alloc. To help you with the broadcasting I need more
>>> information.
>>>
>>> What is:
>>> cases.type?
>>> reach.type?
>>>
>>> Fred
>>>
>>> On Tue, Feb 7, 2017 at 4:51 PM Frédéric Bastien <frederic...@gmail.com>
>>> wrote:
>>>
>>>> There is a high quantity of GpuAlloc. What you have shown doesn't tell
>>>> us what needs it in Theano. Can you run the theano function with
>>>> profiling, and before the script ends call
>>>> theano.printing.debugprint(your_theano_function) and send this output?
>>>> It will tell us what needs it in the graph.
>>>>
>>>> On Fri, Feb 3, 2017 at 4:22 AM Šarūnas S. <shar...@gmail.com> wrote:
>>>>
>>>>> I wrote a script in theano and started profiling it. What I noticed
>>>>> is that the GPU spends most of its time in GpuAlloc.
>>>>>
>>>>> Could somebody explain to me why this is happening and how I could
>>>>> reduce it? In C or C++ I would preallocate the memory, but I am not
>>>>> sure how to do this in Theano.
>>>>>
>>>>> I am running on Windows 8.1 with an Nvidia GTX 1070 with Theano
>>>>> @ 0.9.0dev4.dev-3c0be3d94102ac6864b2e5ab52ae96d07c6375c6
>>>>>
>>>>> I am attaching the extensive profile result below:
>>>>>
>>>>> Function profiling
>>>>> ==================
>>>>>   Message: Sum of all(2) printed profiles at exit excluding Scan op profile.
>>>>>   Time in 200 calls to Function.__call__: 3.463001e+00s
>>>>>   Time in Function.fn.__call__: 3.451001e+00s (99.653%)
>>>>>   Time in thunks: 3.425293e+00s (98.911%)
>>>>>   Total compile time: 1.413800e+01s
>>>>>     Number of Apply nodes: 590
>>>>>     Theano Optimizer time: 1.158200e+01s
>>>>>     Theano validate time: 9.390018e-01s
>>>>>     Theano Linker time (includes C, CUDA code generation/compiling): 2.107000e+00s
>>>>>       Import time 3.500128e-02s
>>>>>       Node make_thunk time 2.042000e+00s
>>>>>         Node GpuCAReduce{add}{0,1}(GpuElemwise{Composite{(i0 * (i1 * i2))}}[(0, 2)].0) time 9.000063e-03s
>>>>>         Node GpuCAReduce{add}{0,1}(GpuElemwise{Mul}[(0, 1)].0) time 7.999897e-03s
>>>>>         Node GpuDimShuffle{0,x}(GpuCAReduce{add}{0,1}.0) time 6.999969e-03s
>>>>>         Node Shape_i{1}(<CudaNdarrayType(float32, matrix)>) time 4.999876e-03s
>>>>>         Node GpuElemwise{Mul}[(0, 1)](CudaNdarrayConstant{[[ 240.]]}, GpuDimShuffle{0,x}.0) time 4.999876e-03s
>>>>>
>>>>> Time in all call to theano.grad() 0.000000e+00s
>>>>> Time since theano import 41.580s
>>>>>
>>>>> Class
>>>>> ---
>>>>> <% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
>>>>>   90.5%   90.5%  3.100s  3.37e-04s  C   9200   92  theano.sandbox.cuda.basic_ops.GpuAlloc
>>>>>    7.4%   97.9%  0.254s  4.19e-06s  C  60600  606  theano.sandbox.cuda.basic_ops.GpuElemwise
>>>>>    1.0%   98.9%  0.034s  2.77e-06s  C  12200  122  theano.sandbox.cuda.basic_ops.GpuCAReduce
>>>>>    0.5%   99.4%  0.017s  1.84e-06s  C   9200   92  theano.sandbox.cuda.basic_ops.GpuReshape
>>>>>    0.5%   99.9%  0.016s  7.45e-07s  C  21400  214  theano.sandbox.cuda.basic_ops.GpuDimShuffle
>>>>>    0.1%   99.9%  0.003s  1.57e-06s  C   1900   19  theano.tensor.elemwise.Elemwise
>>>>>    0.1%  100.0%  0.002s  5.24e-07s  C   3800   38  theano.compile.ops.Shape_i
>>>>>    0.0%  100.0%  0.000s  0.00e+00s  C   1900   19  theano.tensor.opt.MakeVector
>>>>>    ... (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
>>>>>
>>>>> Ops
>>>>> ---
>>>>> <% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
>>>>>   90.5%   90.5%  3.100s  3.37e-04s  C   9200   92  GpuAlloc
>>>>>    1.7%   92.2%  0.058s  4.41e-06s  C  13100  131  GpuElemwise{Mul}[(0, 1)]
>>>>>    1.0%   93.2%  0.034s  3.21e-06s  C  10600  106  GpuElemwise{maximum,no_inplace}
>>>>>    1.0%   94.2%  0.034s  2.77e-06s  C  12200  122  GpuCAReduce{add}{0,1}
>>>>>    0.7%   94.8%  0.023s  3.54e-06s  C   6500   65  GpuElemwise{Composite{maximum(((i0 + i1) - i2), i3)}}[(0, 0)]
>>>>>    0.5%   95.4%  0.018s  3.27e-06s  C   5500   55  GpuElemwise{mul,no_inplace}
>>>>>    0.5%   95.9%  0.018s  4.61e-06s  C   3900   39  GpuElemwise{Composite{((i0 * i1) / i2)}}[(0, 1)]
>>>>>    0.5%   96.4%  0.017s  1.84e-06s  C   9200   92  GpuReshape{2}
>>>>>    0.4%   96.8%  0.014s  4.33e-06s  C   3200   32  GpuElemwise{Composite{(i0 * (i1 * i2))}}[(0, 2)]
>>>>>    0.2%   97.0%  0.008s  8.69e-07s  C   9200   92  GpuDimShuffle{1,0}
>>>>>    0.2%   97.3%  0.008s  5.33e-06s  C   1500   15  GpuElemwise{Composite{((i0 * i1) / i2)},no_inplace}
>>>>>    0.2%   97.5%  0.008s  6.52e-07s  C  12200  122  GpuDimShuffle{0,x}
>>>>>    0.2%   97.7%  0.007s  4.38e-06s  C   1600   16  GpuElemwise{Composite{(((i0 * i1 * maximum(i2, i3)) / (maximum(i2, i3) + maximum(i4, i3))) + ((i5 * i6 * maximum(i4, i3
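Coming back to the tile-versus-broadcasting suggestion in the quoted exchange, here is a minimal sketch of how a tile call could be replaced by broadcasting for the 2-d shapes given in the thread (cases of shape (32, 32), reach of shape (32, 1)); the fmatrix stand-in for reach and the random cases values are invented for illustration:

    import numpy as np
    import theano
    import theano.tensor as T

    rng = np.random.RandomState(0)
    cases = theano.shared(rng.rand(32, 32).astype(np.float32), name='cases')

    # Stand-in for the reach column vector produced by the graph creation.
    reach = T.fmatrix('reach')            # runtime shape (32, 1)

    # Theano only broadcasts along dimensions flagged as broadcastable,
    # so flag the length-1 dimension explicitly.
    reach_b = T.addbroadcast(reach, 1)

    # Multiply each row of cases elementwise with the reach column,
    # i.e. result[i, j] = cases[i, j] * reach[j, 0], with no tile/alloc.
    row_mul = cases * reach_b.dimshuffle(1, 0)  # reach as a (1, 32) row

    f = theano.function([reach], row_mul)
    print(f(np.ones((32, 1), dtype=np.float32)).shape)  # (32, 32)

The same idea extends to the 3-d reshape Frédéric mentions: instead of tile-ing an operand up to the full output shape, insert broadcastable axes with dimshuffle(..., 'x') on each operand and let the elementwise Mul broadcast them, which should keep the GpuAlloc nodes out of the graph.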