Sorry for the inconvenience! Since the Kmeans example allows only a text file as input, I think you have to create your own Kmeans job runner. Use KMeansBSP.prepareInput instead of prepareInputText.
Please see http://svn.apache.org/repos/asf/hama/trunk/examples/src/main/java/org/apache/hama/examples/Kmeans.java

On Mon, Jul 28, 2014 at 9:33 PM, Giannis Giannakopoulos <[email protected]> wrote:
> Ok then, how can I feed the KMeans job with multiple files as an input?
> When I create a dir and put all my input files inside it, the
> job complains about the type of input (not a textfile) and exits. Any
> thoughts on this?
>
>
> Thank you very much for your time,
> Giannis
>
> On 07/28/2014 03:30 PM, Edward J. Yoon wrote:
>>> , right? (meaning that the number of tasks is determined by the number
>>> of blocks of the input file).
>> Right.
>>
>> If you want to specify the number of tasks, you have to adjust the
>> block size, or write multiple files as you want. See the
>> KMeansBSP.prepareInput() and prepareInputText() methods.
>>
>> --
>> Best Regards, Edward J. Yoon
>> Chief Executive Officer
>> DataSayer Co., Ltd.
>>
>> On Jul 28, 2014, at 5:31 PM, Giannis Giannakopoulos
>> <[email protected]> wrote:
>>
>>> Hello everyone,
>>>
>>> I am trying to run the kmeans clustering algorithm from the hama
>>> examples, but I face some problems. Specifically, I want to change the
>>> number of BSP tasks launched, something that is not possible through
>>> this
>>> <http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hama/hama-examples/0.6.2/org/apache/hama/examples/Kmeans.java>
>>> , right? (meaning that the number of tasks is determined by the number
>>> of blocks of the input file).
>>>
>>> To this end, I tried to use the KMeansBSP
>>> <http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hama/hama-ml/0.6.4/org/apache/hama/ml/kmeans/KMeansBSP.java#KMeansBSP.main%28java.lang.String[]%29>
>>> job, which exposes the number of launched tasks as a parameter, but I
>>> can't make it work :$. Specifically, I tried both text and sequence file
>>> input formats but the job is always failing with the message
>>>
>>> "Cannot create <name of input>; already exists as a directory"
>>>
>>> When putting a non-existing dir, I get the same message.
>>>
>>> Can someone please guide me through this? I want to run KMeans and I
>>> want to set the number of BSP tasks to launch (even if this means
>>> partitioning the input file -- I haven't found anything about this
>>> online regarding KMeans).
>>>
>>> Thank you in advance,
>>> Giannis
>>>
>>
>

--
Best Regards, Edward J. Yoon
CEO at DataSayer Co., Ltd.
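
For the "write multiple files" option mentioned in the thread, here is a minimal sketch in plain Java of how one text input could be partitioned into a fixed number of part files, one per intended task. It is an illustration only: the class name InputSplitter, the split method, and the part-NNNNN file naming are hypothetical and not part of Hama. It writes to the local filesystem, so the parts would still have to be copied to HDFS, and a custom job runner is still needed to read them, as discussed above.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hama: splits one text file into N parts.
final class InputSplitter {

  /**
   * Distributes the lines of input round-robin over `parts` files named
   * part-00000, part-00001, ... inside outDir, and returns their paths.
   */
  static List<Path> split(Path input, Path outDir, int parts) throws IOException {
    if (parts < 1) {
      throw new IllegalArgumentException("parts must be >= 1");
    }
    Files.createDirectories(outDir);
    List<Path> paths = new ArrayList<>();
    List<BufferedWriter> writers = new ArrayList<>();
    try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
      // Open one writer per part file.
      for (int i = 0; i < parts; i++) {
        Path p = outDir.resolve(String.format("part-%05d", i));
        paths.add(p);
        writers.add(Files.newBufferedWriter(p, StandardCharsets.UTF_8));
      }
      // Line n goes to part (n mod parts), so parts differ by at most one line.
      String line;
      long n = 0;
      while ((line = reader.readLine()) != null) {
        BufferedWriter w = writers.get((int) (n++ % parts));
        w.write(line);
        w.newLine();
      }
    } finally {
      for (BufferedWriter w : writers) {
        w.close();
      }
    }
    return paths;
  }

  // Usage: java InputSplitter <input file> <output dir> <number of parts>
  public static void main(String[] args) throws IOException {
    List<Path> written = split(Paths.get(args[0]), Paths.get(args[1]), Integer.parseInt(args[2]));
    for (Path p : written) {
      System.out.println(p);
    }
  }
}
```

Round-robin assignment keeps the parts balanced without knowing the number of lines in advance. Whether each part then maps to exactly one BSP task depends on the block size and on how the job runner prepares its input, so treat the part count as an upper-level hint rather than a guarantee.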
