Can you describe the error and provide the stack trace?

On 12-09-2012 14:22, Don.Tan wrote:
The original data is here:

[hadoop@datamining ~]$ hadoop fs -ls /home/test/test
Found 1 items
-rw-r--r-- 1 hadoop supergroup 129213799 2012-09-12 15:45 /home/test/test/result

After I ran "mahout seqdirectory -i /home/test/test/ -o /home/test/result/ -c UTF-8", I got this:

[hadoop@datamining ~]$ hadoop fs -ls /home/test/result
Found 1 items
-rw-r--r-- 1 hadoop supergroup 129213898 2012-09-12 15:47 /home/test/result/chunk-0

And after "mahout seq2sparse -i /home/test/result -o /home/test/sparse":

[hadoop@datamining ~]$ hadoop fs -ls /home/test/sparse
Found 7 items
drwxr-xr-x - hadoop supergroup 0 2012-09-12 15:54 /home/test/sparse/df-count
-rw-r--r-- 1 hadoop supergroup 442252 2012-09-12 15:53 /home/test/sparse/dictionary.file-0
-rw-r--r-- 1 hadoop supergroup 394853 2012-09-12 15:54 /home/test/sparse/frequency.file-0
drwxr-xr-x - hadoop supergroup 0 2012-09-12 15:53 /home/test/sparse/tf-vectors
drwxr-xr-x - hadoop supergroup 0 2012-09-12 15:54 /home/test/sparse/tfidf-vectors
drwxr-xr-x - hadoop supergroup 0 2012-09-12 15:53 /home/test/sparse/tokenized-documents
drwxr-xr-x - hadoop supergroup 0 2012-09-12 15:53 /home/test/sparse/wordcount

What should I do next? I ran "mahout kmeans -i /home/test/sparse/ -o /home/test/kmeans -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 10 -k 20 -ow --clustering"
but I got an error.
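
For readers unfamiliar with the distance measure named in the command above: cosine distance is commonly defined as 1 minus the cosine of the angle between two vectors. The following is a minimal Python sketch of that definition on sparse vectors stored as index-to-value dictionaries. It is an illustration only, not Mahout's own code; the function name and the handling of empty vectors are assumptions.

```python
import math
from typing import Dict


def cosine_distance(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Return 1 - cos(a, b) for two sparse vectors given as {index: value}."""
    # Only indices present in both vectors contribute to the dot product.
    dot = sum(value * b[i] for i, value in a.items() if i in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption for this sketch: an all-zero vector is maximally distant.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


# Two users sharing one of two behaviors, and two users sharing none.
partial_overlap = cosine_distance({0: 1.0, 1: 1.0}, {0: 1.0})
no_overlap = cosine_distance({0: 1.0}, {1: 1.0})
```

Because only shared indices enter the dot product, the cost depends on the number of non-zero entries rather than on the full dimensionality, which matters for very sparse data.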


On 09/12/2012 03:24 PM, Paritosh Ranjan wrote:
I think you will need these two commands (in this order):

seqdirectory : Generate sequence files (of Text) from a directory
seq2sparse: Sparse Vector generation from Text sequence files

On 12-09-2012 12:28, Don Tan wrote:
I think I didn't explain clearly enough, sorry about that.

The example I showed earlier is part of my data.

Each line is a user profile; for example, the first row holds the features of
one user. I want to apply k-means to this data.

I need to create a file that stores all user profiles as sparse vectors and
feed them to the Mahout k-means algorithm. How can I do that?

  Thanks for your advice!

Don Tan
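
As an illustration of the question above, here is a minimal Python sketch of encoding each user's nominal behavior IDs as a sparse binary vector. The input layout (one whitespace-separated line per user) is an assumption, since the original data sample is not shown in the thread; only the IDs for the second user come from the original message, and the function names are hypothetical.

```python
from typing import Dict, List


def build_index(profiles: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct behavior ID a column index, in first-seen order."""
    index: Dict[str, int] = {}
    for profile in profiles:
        for behavior_id in profile:
            if behavior_id not in index:
                index[behavior_id] = len(index)
    return index


def to_sparse(profile: List[str], index: Dict[str, int]) -> Dict[int, float]:
    """Encode one profile as {column: 1.0} for every behavior the user has."""
    return {index[behavior_id]: 1.0 for behavior_id in profile}


# Hypothetical input: one line per user, IDs separated by whitespace.
lines = [
    "98660 33900",
    "98660 158620 33900",  # the IDs quoted for user 2 in the original message
    "158620",
]
profiles = [line.split() for line in lines]
index = build_index(profiles)
vectors = [to_sparse(profile, index) for profile in profiles]
```

Only the non-zero entries are stored, so a user with a handful of behaviors stays small even with 50000 possible columns. The sketch stops at in-memory dictionaries; Mahout's k-means reads vectors from Hadoop sequence files, so a further conversion step would still be needed.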

2012/9/12 Paritosh Ranjan

I could not fully understand the question. Can you explain more?
Here you can find how to use Mahout's k-means algorithm: confluence/display/MAHOUT/K-Means+Clustering

On 12-09-2012 11:43, Don.Tan wrote:


I am new to Hadoop and Mahout, but I have set up the Hadoop cluster.

I have been working on a clustering task lately. I don't think I can finish it quickly because I don't know much about handling massive data (my data contains 1400000 users and 50000 features, and it is sparse).

Could you tell me how to deal with that? A slice of the data is here:



The example above contains 4 users' data, and each number is nominal
(denoting a kind of user behavior; e.g., user 2 has
"98660", "158620", "33900").

Please tell me how to work on that or which documents I should read.


    Don Tan
