> , right? (meaning that the number of tasks is determined by the number
> of blocks of the input file).
Right. If you want to specify the number of tasks, you have to adjust the block size, or write the input as multiple files, as needed. See the KMeansBSP.prepareInput() and prepareInputText() methods.

--
Best Regards, Edward J. Yoon
Chief Executive Officer
DataSayer Co., Ltd.

On Jul 28, 2014, at 5:31 PM, Giannis Giannakopoulos <[email protected]> wrote:

> Hello everyone,
>
> I am trying to run the kmeans clustering algorithm from the hama
> examples, but I face some problems. Specifically, I want to change the
> number of BSP tasks launched, something that is not possible through
> this
> <http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hama/hama-examples/0.6.2/org/apache/hama/examples/Kmeans.java>
> , right? (meaning that the number of tasks is determined by the number
> of blocks of the input file).
>
> To this end, I tried to use the KmeansBSP
> <http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hama/hama-ml/0.6.4/org/apache/hama/ml/kmeans/KMeansBSP.java#KMeansBSP.main%28java.lang.String[]%29>
> job which exports as a parameter the number of launched tasks but I
> can't make it work :$. Specifically, I tried both text and sequence file
> input formats but the job is always failing with the message
>
> "Cannot create <name of input>; already exists as a directory"
>
> When putting a non-existing dir, I get the same message.
>
> Can someone please guide me through this? I want to run KMeans and I
> want to set the number of BSP tasks to launch (even if this means
> partitioning the input file -- I haven't found anything about this
> online regarding KMeans).
>
> Thank you in advance,
> Giannis
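To illustrate the "write multiple files" suggestion from the reply, here is a minimal sketch in plain Java. It assumes, as the reply implies, that the number of tasks follows the number of input splits, so one small file per desired task should give the wanted count. InputSplitter and its split() method are hypothetical names and not part of Hama, and the sketch works on the local filesystem only; for HDFS the same loop would have to go through the Hadoop FileSystem API instead of java.nio.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Hama): splits one text file into several part files. */
final class InputSplitter {

  private InputSplitter() {}

  /**
   * Distributes the lines of {@code input} round-robin over {@code numParts}
   * files named part-00000, part-00001, ... inside {@code outDir}.
   * Returns the paths of the part files in index order.
   */
  static List<Path> split(Path input, Path outDir, int numParts) throws IOException {
    if (numParts < 1) {
      throw new IllegalArgumentException("numParts must be >= 1");
    }
    Files.createDirectories(outDir);
    List<Path> parts = new ArrayList<>();
    List<BufferedWriter> writers = new ArrayList<>();
    try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
      // Open one writer per desired part (assumption: one part per desired task).
      for (int i = 0; i < numParts; i++) {
        Path part = outDir.resolve(String.format("part-%05d", i));
        parts.add(part);
        writers.add(Files.newBufferedWriter(part, StandardCharsets.UTF_8));
      }
      // Line k of the input goes to part (k mod numParts).
      String line;
      long lineNo = 0;
      while ((line = reader.readLine()) != null) {
        BufferedWriter writer = writers.get((int) (lineNo % numParts));
        writer.write(line);
        writer.newLine();
        lineNo++;
      }
    } finally {
      // Flush and close every part file that was opened.
      for (BufferedWriter writer : writers) {
        writer.close();
      }
    }
    return parts;
  }
}
```

Calling InputSplitter.split(Paths.get("points.txt"), Paths.get("kmeans-input"), 4) would leave four part files in kmeans-input, which can then be uploaded as the job's input directory. Round-robin assignment keeps the parts roughly equal in size; whether each file really maps to exactly one task also depends on every file staying below the block size.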
