Sorry for the inconvenience!

Since the Kmeans example accepts only a text file as input, I think you have
to create your own Kmeans job runner. Use KMeansBSP.prepareInput
instead of prepareInputText.

Please see 
http://svn.apache.org/repos/asf/hama/trunk/examples/src/main/java/org/apache/hama/examples/Kmeans.java
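One step a custom runner needs is deciding which data points go into which input file, because (as discussed further down the thread) the number of BSP tasks follows the layout of the input. The Java sketch below is a hypothetical helper, not part of Hama: it only assigns points to a chosen number of partitions round-robin. Writing each partition out as its own input file (for example with KMeansBSP.prepareInput, whose exact signature should be checked in the Hama source) is deliberately not shown.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper for a custom KMeans runner (not a Hama API): splits the
 * input points into a fixed number of partitions so that each partition can be
 * written to its own input file, one file per desired BSP task.
 * Writing the files themselves is intentionally left out.
 */
public final class InputPartitioner {
    private InputPartitioner() {}

    /** Round-robin assignment: point i goes to partition i % numPartitions. */
    public static <T> List<List<T>> partition(List<T> points, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        List<List<T>> parts = new ArrayList<>(numPartitions);
        for (int p = 0; p < numPartitions; p++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < points.size(); i++) {
            parts.get(i % numPartitions).add(points.get(i));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Ten illustrative 2-D points, split across 3 partitions.
        List<double[]> points = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            points.add(new double[] {i, 2.0 * i});
        }
        List<List<double[]>> parts = partition(points, 3);
        for (int p = 0; p < parts.size(); p++) {
            System.out.println("partition " + p + ": " + parts.get(p).size() + " points");
        }
    }
}
```

Round-robin keeps the partitions within one point of each other in size, so each task receives a similar share of the data.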

On Mon, Jul 28, 2014 at 9:33 PM, Giannis Giannakopoulos
<[email protected]> wrote:
> Ok then, how can I feed the KMeans job with multiple files as input?
> When I try creating a directory and putting all my input files inside it, the
> job complains about the type of input (not a text file) and exits. Any
> thoughts on this?
>
>
> Thank you very much for your time,
> Giannis
>
> On 07/28/2014 03:30 PM, Edward J. Yoon wrote:
>>> , right? (meaning that the number of tasks is determined by the number
>>> of blocks of the input file).
>> Right.
>>
>> If you want to specify the number of tasks, you have to adjust the
>> block size, or write multiple input files as you want. See the
>> KMeansBSP.prepareInput() and prepareInputText() methods.
>>
>> --
>> Best Regards, Edward J. Yoon
>> Chief Executive Officer
>> DataSayer Co., Ltd.
>>
>> On Jul 28, 2014, at 5:31 PM, Giannis Giannakopoulos 
>> <[email protected]> wrote:
>>
>>> Hello everyone,
>>>
>>> I am trying to run the k-means clustering algorithm from the Hama
>>> examples, but I am facing some problems. Specifically, I want to change the
>>> number of BSP tasks launched, which is not possible through the Kmeans example
>>> <http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hama/hama-examples/0.6.2/org/apache/hama/examples/Kmeans.java>,
>>> right? (meaning that the number of tasks is determined by the number
>>> of blocks of the input file).
>>>
>>> To this end, I tried to use the KMeansBSP
>>> <http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hama/hama-ml/0.6.4/org/apache/hama/ml/kmeans/KMeansBSP.java#KMeansBSP.main%28java.lang.String[]%29>
>>> job, which exposes the number of launched tasks as a parameter, but I
>>> can't make it work :$. Specifically, I tried both text and sequence file
>>> input formats, but the job always fails with the message
>>>
>>> "Cannot create <name of input>; already exists as a directory"
>>>
>>> When I specify a non-existent directory, I get the same message.
>>>
>>> Can someone please guide me through this? I want to run KMeans and I
>>> want to set the number of BSP tasks to launch (even if this means
>>> partitioning the input file -- I haven't found anything about this
>>> online regarding KMeans).
>>>
>>> Thank you in advance,
>>> Giannis
>>>
>>
>
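Since the thread establishes that one BSP task is launched per input block, the task count can be estimated, and a block size chosen for a target count, with simple ceiling arithmetic. A minimal Java sketch, assuming one task per block and that HDFS requires the block size to be a multiple of the checksum chunk size (512 bytes by default); the class name BlockMath is hypothetical, and a cluster may additionally enforce a minimum block size.

```java
/**
 * Hypothetical back-of-the-envelope helper (not a Hama API).
 * Assumes one BSP task per input block, and that the HDFS block size must be
 * a multiple of the checksum chunk size (512 bytes by default).
 */
public final class BlockMath {
    private static final long CHECKSUM_CHUNK = 512L; // assumed default bytes-per-checksum

    private BlockMath() {}

    /** Number of blocks (and, by assumption, tasks): ceil(fileSize / blockSize). */
    public static long tasksForFile(long fileSize, long blockSize) {
        if (fileSize <= 0 || blockSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (fileSize + blockSize - 1) / blockSize;
    }

    /**
     * Smallest block size, rounded up to a multiple of the checksum chunk,
     * that yields at most desiredTasks blocks for the given file size.
     */
    public static long blockSizeForTasks(long fileSize, int desiredTasks) {
        if (fileSize <= 0 || desiredTasks <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        long raw = (fileSize + desiredTasks - 1) / desiredTasks; // ceil(fileSize / desiredTasks)
        return ((raw + CHECKSUM_CHUNK - 1) / CHECKSUM_CHUNK) * CHECKSUM_CHUNK;
    }

    public static void main(String[] args) {
        long fileSize = 1_000_000_000L; // illustrative ~1 GB input
        long block = blockSizeForTasks(fileSize, 8);
        System.out.println("block size: " + block + ", tasks: " + tasksForFile(fileSize, block));
    }
}
```

With a 1,000,000,000-byte file and a target of 8 tasks, blockSizeForTasks returns 125,000,192 bytes, which tasksForFile maps back to 8 blocks. Rounding up to the checksum chunk can only lower the block count, so for small files the target acts as an upper bound rather than an exact count.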



-- 
Best Regards, Edward J. Yoon
CEO at DataSayer Co., Ltd.
