Hi Roman,

It seems you forgot the attachment (your code).

ad 1) I would solve this with a custom partitioner. A custom partitioner defines which records are distributed to which tasks. Here is a C++ partitioner example [1]. For example, for keys 3, 6, 9 the partitioner should return 1, and for keys 2, 5, 8 it should return 2. However, the Hama Streaming API [2] does not currently support partitioning; only Hama Pipes (C++) supports it.

ad 2) Please submit your code and I will have a look at this exception. Alternatively, please submit the task log.

Martin

[1] https://github.com/apache/hama/blob/trunk/c%2B%2B/src/main/native/examples/impl/matrixmultiplication.cc#L131-138
[2] https://github.com/millecker/HamaStreaming/blob/1009bb1a6472d11f5dd3af9dc07fe64547dd0290/BinaryProtocol.py#L37-38

2013/9/30 Roman Shapovalov <[email protected]>

> Hello all,
>
> I am developing a toy master-slave application for the Python
> streaming interface. There are two issues.
>
> 1. What is the semantics of the readNext command?
>
> If I run 3 tasks -- one of them is master who does not read input, --
> slaves take turn to read records, but each of them reads only each
> third example, e.g. slave#1 reads records 3,6,9, while slave#2 reads
> 2,5,8. So 1/3 of records are skipped, as if the master task would read
> them.
>
> So, what is the exact semantics? Is there any best practice to make
> each example read by some task (but not the master).
>
>
> 2. After the code is executed (and the output is written), the job
> fails. All the task logs contain the following text:
>
> 13/09/30 16:32:09 ERROR protocol.UplinkReader:
> java.lang.NullPointerException
>         at org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:127)
>
> The exception is raised even if I don't use pipes at all. Since it
> shows up after cleanup, it is not critical for the program, but it may
> indicate some misuse by me or bugs in the Hama code.
>
> Please look at that issue. My code is attached.
>
> Roman
>
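
To make the partitioner idea from ad 1) concrete, here is a minimal sketch in plain Python. It is not Hama API code: the function name `partition`, its parameters, and the convention that task 0 is the master are all assumptions made for illustration, and the Streaming API cannot plug in a partitioner at the moment. The sketch only shows the assignment logic, assuming integer record keys: every record is mapped to some task other than the master.

```python
def partition(key, num_tasks, master_id=0):
    """Map an integer record key to a task id, never choosing the master.

    Hypothetical helper for illustration only; not part of Hama.
    """
    if num_tasks < 2:
        raise ValueError("need at least one task besides the master")
    if not 0 <= master_id < num_tasks:
        raise ValueError("master_id must be a valid task id")
    # Every task id except the master is eligible to receive records.
    workers = [t for t in range(num_tasks) if t != master_id]
    # Spread keys round-robin over the eligible tasks.
    return workers[key % len(workers)]


if __name__ == "__main__":
    # Three tasks, task 0 acting as master (an assumed convention).
    for key in range(1, 10):
        print(key, "->", partition(key, num_tasks=3))
```

With three tasks and task 0 as master, even keys go to task 1 and odd keys go to task 2, so no record is left for the master.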
