Hi Martin, Thank you for the support.
> Any chance to switch to Hama Pipes C++? I don’t think so. I have a Python code to make a distribute implementation from, and Python wrapper was the primary reason of choosing Hama. Roman On Mon, Oct 7, 2013 at 2:15 PM, Martin Illecker <[email protected]> wrote: > Hi Roman, > > sorry for the delay. > > So, the text protocol does not support it, or does it lack only in the >> Python wrapper? > > > Yes only the Python wrapper does not support it. > The Hama Pipes protocol (C++) does support custom partitioning. > > I could reproduce the exception, which is occurring during the protocol > shutdown only in Streaming API. > > 13/09/30 16:32:09 ERROR protocol.UplinkReader: >> java.lang.NullPointerException >> at org.apache.hama.pipes.protocol.UplinkReader.run( >> UplinkReader.java:127) > > > It was a problem in the Python wrapper and I fixed it [1] in my github > repository [2]. > > If I send any message of the length L, I receive the message with >> additional (L-1)/2 '^@' symbols after it. >> So, I use the following workaround. In the BSPPeer.getCurrentMessage(): >> return line[:len(line)-len(line)//3] >> instead of >> return line > > > I could not locate the cause [3] for your currentMessage length problem, so > I committed your workaround [4]. > > Then it seems the easiest way to work it >> around is to have the master thread resend those records to slaves... >> if they are not very big. >> > > Yes this would be an option but partitioning should be preferred. > Any chance to switch to Hama Pipes C++? > > Martin > > [1] > https://github.com/millecker/HamaStreaming/commit/95466287e883a892f85427303ff255bc52b00b9a > [2] https://github.com/millecker/HamaStreaming > [3] > https://github.com/apache/hama/blob/trunk/core/src/main/java/org/apache/hama/pipes/protocol/StreamingProtocol.java#L94 > [4] > https://github.com/millecker/HamaStreaming/commit/12ceb0d2db5d088004566592e42431ee40558fa7 > > > > 2013/10/1 Roman Shapovalov <[email protected]> > >> It seems that the file is lost in communication. Here is a copy: >> >> https://dl.dropboxusercontent.com/u/42489708/MasterSlaveBSP.py >> >> Roman >> >> >> On Tue, Oct 1, 2013 at 6:13 PM, Roman Shapovalov >> <[email protected]> wrote: >> > Hi Martin, >> > >> >> it seems you have forgotten the attachment. >> > >> > I can see one in the message I sent. Attaching again, try this. >> > >> > >> >> But currently the Hama Streaming API [2] does not support partitioning. >> > >> > So, the text protocol does not support it, or does it lack only in the >> > Python wrapper? >> > >> > So, the default partitioning is arbitrary, regardless of who is >> > reading and who is not? Then it seems the easiest way to work it >> > around is to have the master thread resend those records to slaves... >> > if they are not very big. >> > >> > Thanks, >> > Roman >> > >> > On Tue, Oct 1, 2013 at 9:30 AM, Martin Illecker <[email protected]> >> wrote: >> >> Hi Roman, >> >> >> >> it seems you have forgotten the attachment. (your code) >> >> >> >> ad 1) >> >> I would solve this by using a custom partitioner. >> >> A custom partitioner defines which records are distributed to which >> tasks. >> >> >> >> Here is some C++ partitioner example [1]. >> >> e.g., key 3,6,9 partitioner should return 1 >> >> and key 2,5,8 should return 2 >> >> >> >> But currently the Hama Streaming API [2] does not support partitioning. >> >> Only Hama Pipes C++ supports it. >> >> >> >> ad 2) >> >> Please submit your code, I will have a look at this exception. >> >> Or please submit the tasklog. >> >> >> >> Martin >> >> >> >> [1] >> >> >> https://github.com/apache/hama/blob/trunk/c%2B%2B/src/main/native/examples/impl/matrixmultiplication.cc#L131-138 >> >> [2] >> >> >> https://github.com/millecker/HamaStreaming/blob/1009bb1a6472d11f5dd3af9dc07fe64547dd0290/BinaryProtocol.py#L37-38 >> >> >> >> 2013/9/30 Roman Shapovalov <[email protected]> >> >> >> >>> Hello all, >> >>> >> >>> I am developing a toy master-slave application for the Python >> >>> streaming interface. There are two issues. >> >>> >> >>> 1. What is the semantics of the readNext command? >> >>> >> >>> If I run 3 tasks -- one of them is master who does not read input, -- >> >>> slaves take turn to read records, but each of them reads only each >> >>> third example, e.g. slave#1 reads records 3,6,9, while slave#2 reads >> >>> 2,5,8. So 1/3 of records are skipped, as if the master task would read >> >>> them. >> >>> >> >>> So, what is the exact semantics? Is there any best practice to make >> >>> each example read by some task (but not the master). >> >>> >> >>> >> >>> 2. After the code is executed (and the output is written), the job >> >>> fails. All the task logs contain the following text: >> >>> >> >>> 13/09/30 16:32:09 ERROR protocol.UplinkReader: >> >>> java.lang.NullPointerException >> >>> at >> >>> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:127) >> >>> >> >>> The exception is raised even if I don't use pipes at all. Since it >> >>> shows up after cleanup, it is not critical for the program, but it may >> >>> indicate some misuse by me or bugs in the Hama code. >> >>> >> >>> Please look at that issue. My code is attached. >> >>> >> >>> Roman >> >>> >>
