Output of cluster dumper of Kmeans with colon

2014-03-16 Thread Bikash Gupta
Hi, sometimes I get output from the cluster dumper for K-means with colons:
0~~~0~~~VL-0{n=147408 c=[1032.927, 17.964, 11.384, 11.384] r=[10245.867, 761.066, 62.758, 62.758]}
1~~~1~~~VL-1{n=6 c=[0:2859913.130, 1:561.007] r=[0:366747.921, 1:1189.343]}
2~~~2~~~VL-2{n=3 c=[5335512.995, 96.320, 4.709, 4.709]
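The two shapes in the output above (plain values versus `index:value` pairs) can be read with one parser. The following is a minimal Python sketch, not part of Mahout: the names `CLUSTER_RE`, `parse_vector` and `parse_clusters` are hypothetical, and it assumes the colon form is a sparse rendering in which each value is prefixed with its vector index. Records that are cut off, like VL-2 in the snippet, are skipped.

```python
import re

# Matches one complete record such as:
#   VL-1{n=6 c=[0:2859913.130, 1:561.007] r=[0:366747.921, 1:1189.343]}
# Assumption: identifiers look like uppercase letters, a dash, and a number.
CLUSTER_RE = re.compile(
    r"(?P<id>[A-Z]+-\d+)\{n=(?P<n>\d+) c=\[(?P<c>[^\]]*)\] r=\[(?P<r>[^\]]*)\]\}"
)


def parse_vector(text):
    """Return {index: value} for 'v0, v1, ...' or 'i:v, j:v, ...'."""
    items = [part.strip() for part in text.split(",") if part.strip()]
    vector = {}
    for position, item in enumerate(items):
        if ":" in item:
            # Assumed sparse form: the index is written explicitly.
            index, value = item.split(":", 1)
            vector[int(index)] = float(value)
        else:
            # Dense form: the index is the position in the list.
            vector[position] = float(item)
    return vector


def parse_clusters(dump):
    """Parse every complete cluster record found in the dumper output."""
    clusters = []
    for match in CLUSTER_RE.finditer(dump):
        clusters.append(
            {
                "id": match.group("id"),
                "n": int(match.group("n")),
                "center": parse_vector(match.group("c")),
                "radius": parse_vector(match.group("r")),
            }
        )
    return clusters
```

Calling `parse_clusters` on the text above would return two records, VL-0 and VL-1, with centers keyed by dimension index in both cases, so downstream code does not need to care which form the dumper chose.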

Re: Problem with K-Means clustering on Amazon EMR

2014-03-16 Thread Frank Scholten
Hi Konstantin, Good to hear from you. The link you mentioned points to EigenSeedGenerator, not RandomSeedGenerator. The problem seems to be with the call to fs.getFileStatus(input).isDir(). It's been a while and I don't remember, but perhaps you have to set additional Hadoop fs properties to use

Re: Mahout with Storm/Spark

2014-03-16 Thread Suneel Marthi
Hi Peyman, good to hear from you. Not sure if anyone's responded to you yet, but the answer to your question is that I am not aware of any benchmarking that was done for Mahout's CVB impl. Others please jump in here if you think otherwise. What has changed in LDA from 0.7 to 0.9? - 0.7 had LDA

Re: Problem with K-Means clustering on Amazon EMR

2014-03-16 Thread Jay Vyas
I specifically have fixed mapreduce jobs by doing what the error message suggests. But maybe (hopefully) there is another workaround that is configuration driven. Just a hunch, but maybe Mahout needs to be refactored to create fs objects using the get(uri,conf) calls? As Hadoop evolves to

Re: Problem with K-Means clustering on Amazon EMR

2014-03-16 Thread Andrew Musselman
Another wild guess: I've had issues trying to use the 's3' protocol from Hadoop and got things working by using the 's3n' protocol instead. On Mar 16, 2014, at 8:41 AM, Jay Vyas jayunit...@gmail.com wrote: I specifically have fixed mapreduce jobs by doing what the error message suggests.

Re: Problem with K-Means clustering on Amazon EMR

2014-03-16 Thread Sebastian Schelter
I've also encountered a similar error once. It's really just the FileSystem.get call that needs to be modified. I think it's a good idea to walk through the codebase and refactor this where necessary. --sebastian On 03/16/2014 05:16 PM, Andrew Musselman wrote: Another wild guess, I've had

Re: Problem with K-Means clustering on Amazon EMR

2014-03-16 Thread Jay Vyas
I agree, best to be explicit when creating filesystem instances by using the two-argument get(...). It's time to update to the filesystem 2.0 APIs. Can you file a Jira for this? If not I will :) On Mar 16, 2014, at 12:37 PM, Sebastian Schelter s...@apache.org wrote: I've also encountered a

Re: debug mode

2014-03-16 Thread Andrew Musselman
Yes, there are ways to debug. One is to attach a remote debugger, set breakpoints, etc., like so: https://www.google.com/search?q=attach+remote+debugger+java+example+eclipse+or+intellij The other would be to write a log4j.properties file for Mahout and/or Hadoop and set the logging level to more
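For the logging route, a log4j.properties along these lines is one possibility. It is a minimal sketch assuming log4j 1.x property syntax; the appender name `console` and the choice of DEBUG for the `org.apache.mahout` and `org.apache.hadoop` loggers are illustrative assumptions, not settings taken from this thread.

```properties
# Root logger stays at INFO and writes to a console appender (name is arbitrary).
log4j.rootLogger=INFO, console

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p %c{1}: %m%n

# Raise verbosity only for the packages being investigated.
log4j.logger.org.apache.mahout=DEBUG
log4j.logger.org.apache.hadoop=DEBUG
```

Scoping DEBUG to specific packages rather than the root logger keeps the output readable; the file has to be on the classpath of the JVM that actually runs the code in question.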