Thinking of a different design:

1. Master python process builds and compiles all theano functions as usual (for GPU), and pickles them.
2. Worker processes initialize on other GPUs and unpickle all the functions.
3. User calls wrapped theano functions in the master process, which signals the workers.
4. Workers run an infinite loop, waiting for a signal of what to do (some switch statement), e.g.:
   a. call some function (can take inputs from multiprocessing shared variables) and communicate the result
   b. copy multiprocessing shared variables into local theano GPU shared variables
   c. do collective GPU comms.
   d. etc.
The workers are "dumb" and never have to touch any graphs. It's a bit of a pain to set up the multiprocessing shared variables (data sizes have to be declared ahead of time), but not too bad. What I'm running into trouble with now is the theano shared variables. They get unpickled into each function's input_storage, but each function ends up with a separate set of objects there. I can manipulate them individually, but *is there a way to get multiple unpickled functions to refer to the same memory for corresponding shared variables?* (Simply setting the input_storage entries to another function's does not work.)
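A guess at why rebinding the input_storage entries does nothing: theano keeps each value in a Container whose .storage attribute is a one-element list, and the compiled fn holds references to those inner cells directly, not to the input_storage list. The pure-python toy below (a simplified model, not real theano internals) illustrates the difference between swapping the outer list entry and sharing the inner cell; if memory serves, newer Theano versions expose Function.copy(share_memory=True) to do exactly this kind of storage sharing, which may be worth checking.

```python
# Simplified model of theano's storage cells (hypothetical, for
# illustration only -- not the actual theano Container class).

class Container:
    def __init__(self, value):
        self.storage = [value]   # one-element cell, as in theano

def make_function():
    c = Container(0.0)
    # the compiled fn closes over the inner container directly
    def fn():
        return c.storage[0] * 2
    return [c], fn               # (input_storage, callable)

store1, f1 = make_function()
store2, f2 = make_function()

# Rebinding the input_storage entry does NOT affect f2 -- its
# closure still references its original container:
store2[0] = store1[0]
store1[0].storage[0] = 5.0
print(f2())                      # -> 0.0, not 10.0

# Sharing the inner storage cell itself does propagate:
store2, f2 = make_function()
store2[0].storage = store1[0].storage
print(f2())                      # -> 10.0
```

If the analogy holds, assigning container.storage (or container.data) across the unpickled functions, rather than replacing the containers themselves, is the thing to try.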