Dear list,

I have a Python module written in C++ that helps users manipulate large amounts of genetics data. With this module, users can write a script to create, load, and manipulate data easily. For efficiency and memory-management reasons, I would like to write an MPI version of the module so that I can spread the data across several machines.
I have some experience with MPI-1, so I started with the conventional design: a fixed number of nodes are started and all execute the same script. The data is split across the nodes, but every node can read and write data as if it were local. A write is performed on the node that holds that piece of data, and the result of a read is broadcast so that it appears to be local on every node. The broadcast is needed to ensure that the script follows identical execution logic on all nodes.

Although a test module is up and running, making sure that all nodes *see* the same data and execute the script identically has proven to be very inefficient and difficult. For example, if a script performs some action based on a local random number, each node draws a different number and the nodes will probably fall out of sync.

I am now thinking of an implementation in which only the head node executes the script. It creates the slave nodes and asks them to act on their local data when needed. RMA (remote memory access) could be used so that the head node can access data on the slave nodes directly. This looks like an efficient solution, but I am not sure how to instruct the slave nodes on what they should do. That is, it is difficult to tell a slave node to execute a certain function with such and such parameters. Treating the slave nodes purely as memory storage and using RMA for all operations does not sound like a good idea either.

I have been evaluating different approaches and have not decided which way to go. I would highly appreciate any advice on how to design and implement such a module.

Many thanks in advance.
Bo
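
P.S. To make the head/slave question concrete, one possible shape is a command-dispatch loop: the head sends a (function name, arguments) message, and each slave looks the name up in a table and applies it to its local data. The sketch below is only an illustration under loud assumptions: threads and queues stand in for MPI processes and messages, the slaves are called workers in the code, the data is a plain list of integers, and every name (count_allele, set_genotype, Head, and so on) is hypothetical and not part of the real module.

```python
# Sketch only: threads and queues stand in for MPI ranks and messages.
# All function and class names here are hypothetical.
import queue
import threading


def count_allele(local_data, allele):
    """Count how often `allele` occurs in this worker's local chunk."""
    return sum(1 for g in local_data if g == allele)


def set_genotype(local_data, index, value):
    """Overwrite one entry of this worker's local chunk."""
    local_data[index] = value
    return None


# Dispatch table: the head only ever sends a name plus arguments.
COMMANDS = {"count_allele": count_allele, "set_genotype": set_genotype}


def worker_loop(local_data, inbox, outbox):
    """Wait for (name, args) messages and run them on the local data."""
    while True:
        name, args = inbox.get()
        if name == "stop":
            break
        outbox.put(COMMANDS[name](local_data, *args))


class Head:
    """Only the head runs the user script; workers just obey commands."""

    def __init__(self, chunks):
        self.workers = []
        for chunk in chunks:
            inbox, outbox = queue.Queue(), queue.Queue()
            thread = threading.Thread(
                target=worker_loop, args=(chunk, inbox, outbox), daemon=True
            )
            thread.start()
            self.workers.append((thread, inbox, outbox))

    def call_all(self, name, *args):
        """Send one command to every worker and gather the results."""
        for _, inbox, _ in self.workers:
            inbox.put((name, args))
        return [outbox.get() for _, _, outbox in self.workers]

    def call_one(self, rank, name, *args):
        """Send one command to the worker that owns the data."""
        _, inbox, outbox = self.workers[rank]
        inbox.put((name, args))
        return outbox.get()

    def shutdown(self):
        for thread, inbox, _ in self.workers:
            inbox.put(("stop", ()))
            thread.join()


if __name__ == "__main__":
    head = Head([[0, 1, 1], [1, 0, 1]])
    print(sum(head.call_all("count_allele", 1)))
    head.call_one(0, "set_genotype", 0, 1)
    print(sum(head.call_all("count_allele", 1)))
    head.shutdown()
```

In a real MPI version the queues would presumably become sends and receives between the head and each slave, and the command name might be an integer opcode rather than a string. Whether such a loop would be efficient enough for this kind of data is exactly what I am unsure about.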