P.S.
If it helps, yes the indices are exactly the ones I have in the code.

Thanks
Oussama

On 16 February 2017 at 23:00, Oussama Souihli <albitru...@gmail.com> wrote:

> Hi Fred,
>
> Thank you for the quick reply and clever suggestions, you really hit the
> nail on the head !
>
> I tried the suggested vectorized code, but unfortunately still experienced
> the max recursion limit exception.
> I also tried the workaround, but still get the max recusion limit until I
> set it to 20K and I get a segmentation fault:
>
> Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
>
>
> Can you think of other suggestions that I could try ?
> If for-loop approach is doomed because of the max recursion issue, is
> there perhaps a way I could rewrite my code such that  scan becomes
> reasonably faster ?
>
>
>
> On 16 February 2017 at 22:22, Frédéric Bastien <frederic.bast...@gmail.com
> > wrote:
>
>> Is the indices exatly the one you have in your code? If so, you can
>> probably take advantage of this to vectorize the code and make the for loop
>> case work.
>>
>> The size are so small then using scan will be super slow. Each iteration
>> of scan have big overhead and you only work on scalar mostly at each
>> iteration. So it is normal that it is slow with scan.
>>
>> For the for loop, there was a problem of indices. I fixed it. Here is a
>> partial vectorization of the code that will make it faster to execute and
>> make a smaller graph:
>>
>>     coss = tt.cos(theta)
>>     sins = tt.sin(theta)
>>
>>     for k in range(0, K):
>>         A = tt.set_subtensor(A[indices[k, 0], indices[k, 0]], coss[k])
>>         A = tt.set_subtensor(A[indices[k, 1], indices[k, 1]], coss[k])
>>         A = tt.set_subtensor(A[indices[k, 0], indices[k, 1]], sins[k])
>>         A = tt.set_subtensor(A[indices[k, 1], indices[k, 0]], -sins[k])
>>         B = tt.dot(A, B)
>>
>> The call to cos and sin are only done once. This make the graph smaller.
>>
>> You can also do the indexing call only once. THis make the graph still
>> smaller:
>>
>>     coss = tt.cos(theta)
>>     sins = tt.cos(theta)
>>
>>     for k in range(0, K):
>>         idx0 = indices[k, 0]
>>         idx1 = indices[k, 1]
>>         A = tt.set_subtensor(A[idx0, idx0], coss[k])
>>         A = tt.set_subtensor(A[idx1, idx1], coss[k])
>>         A = tt.set_subtensor(A[idx0, idx1], sins[k])
>>         A = tt.set_subtensor(A[idx1, idx0], -sins[k])
>>         B = tt.dot(A, B)
>>
>> The if the indices is exactly what you created and don't change, it can
>> become that is even smaller:
>>
>>     for k in range(0, K):
>>         idx0 = K//(N-1)
>>         idx1 = indices[k, 1]
>>         A = tt.set_subtensor(A[idx0, idx0], coss[k])
>>         A = tt.set_subtensor(A[idx1, idx1], coss[k])
>>         A = tt.set_subtensor(A[idx0, idx1], sins[k])
>>         A = tt.set_subtensor(A[idx1, idx0], -sins[k])
>>         B = tt.dot(A, B)
>>
>> You can do similar with j.
>>
>> If with that, you still have the max recursion limit, try the work around
>> in this issue.
>>
>> https://github.com/Theano/Theano/issues/3607
>>
>> Fred
>>
>> On Thu, Feb 16, 2017 at 7:53 AM Oussama Souihli <albitru...@gmail.com>
>> wrote:
>>
>>> Hi Fred, Adam, Kiuhnm, all
>>>
>>> Thank you for the detailed and quick replies and multiple suggestions.
>>> I tried bacthed_dot but still experienced both issues (maximum depth
>>> recursion when using a for loop and very slow epochs when using
>>> theano.scan),
>>> so now I'm thinking the problem is also possibly  due to the way I
>>> populate the array A (using sub_tensor).
>>>
>>> Sorry for imposing on your kindness, but would you mind taking a look at
>>> the attached, *very short*, minimal examples reproducing what I'm
>>> trying to achieve ?
>>> The examples are self-contained and only require python with numpy and
>>> theano packages, nothing else.
>>>
>>>    - *Issue_with_for_loop.py*:
>>>    - Reproduces the for-loop maximum recursion issue.
>>>       - if you run it as is, it will throw the exception on B.eval()
>>>
>>>       - *Issue_with_scan.py*:
>>>    - This one shows the alternative way of implementing the logic using
>>>       scan.
>>>       - Unfortunately in this case it returns a result and gives the
>>>       impression like it works fine, but if I run it in Keras the epoch 
>>> duration
>>>       becomes prohibitively large (400,000 seconds per epoch !!)
>>>
>>>
>>> If you can think of a way that can achieve the logic implemented in
>>> either sheets with maximum efficiency theano-wise, I'd really appreciate
>>> it.
>>>
>>>
>>> With gratitude,
>>> Oussama
>>>
>>> --
>>>
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "theano-users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to theano-users+unsubscr...@googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>> --
>>
>> ---
>> You received this message because you are subscribed to a topic in the
>> Google Groups "theano-users" group.
>> To unsubscribe from this topic, visit https://groups.google.com/d/to
>> pic/theano-users/Yz8g-cejB8c/unsubscribe.
>> To unsubscribe from this group and all its topics, send an email to
>> theano-users+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"theano-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to theano-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to