Fwd: KMeans calls takeSample() twice?

2016-08-30 Thread Georgios Samaras
-- Forwarded message --
From: Georgios Samaras
Date: Tue, Aug 30, 2016 at 9:49 AM
Subject: Re: KMeans calls takeSample() twice?
To: "Sean Owen [via Apache Spark Developers List]" <ml-node+s1001551n18788...@n3.nabble.com>

I am not sure what you want me to check.

Re: KMeans calls takeSample() twice?

2016-08-30 Thread Georgios Samaras
varam
>
> On Tue, Aug 30, 2016 at 9:50 AM, Georgios Samaras wrote:
> >
> > ------ Forwarded message --
> > From: Georgios Samaras
> > Date: Tue, Aug 30, 2016 at 9:49 AM
> > Subject: Re: KMeans calls takeSample() twice?
> > To: "Sean Owe

Re: KMeans calls takeSample() twice?

2016-08-31 Thread Georgios Samaras
efd3d0984dc6cac5d31e0/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L508
>>> should explain when this happens. Also you can confirm this by
>>> checking if the logWarning shows up in your logs.
>>>
>>> Thanks
>>> Shi
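The reply above points at the re-sampling loop in RDD.takeSample: a sample is drawn with an oversampled fraction, and if it comes back with fewer elements than requested, a warning is logged and the sample is drawn again. That may explain why sampling appears to run more than once. Below is a minimal pure-Python sketch of that retry pattern, not Spark's code; the function name `take_sample` and the 1.5x oversampling heuristic are hypothetical stand-ins for Spark's own fraction computation.

```python
import logging
import random


def take_sample(data, num, seed=0, fraction=None):
    """Draw `num` distinct items from `data` without replacement.

    Hypothetical sketch of a takeSample-style retry loop: a Bernoulli
    sample is taken with an oversampled `fraction`; if it is too small,
    a warning is logged and the whole pass is repeated.
    Returns (sample, number_of_retries).
    """
    rng = random.Random(seed)
    items = list(data)
    if num <= 0 or not items:
        return [], 0
    if num >= len(items):
        # Asking for everything: shuffle and return, no sampling pass.
        rng.shuffle(items)
        return items, 0
    if fraction is None:
        # Assumed heuristic (not Spark's): oversample by 1.5x so that
        # a retry is uncommon but still possible.
        fraction = min(1.0, 1.5 * num / len(items))

    # First pass: keep each item independently with probability `fraction`.
    samples = [x for x in items if rng.random() < fraction]
    retries = 0
    while len(samples) < num:
        retries += 1
        logging.warning(
            "Re-sampling due to insufficient sample size. Repeat #%d", retries)
        samples = [x for x in items if rng.random() < fraction]

    # Shuffle so that the first `num` kept items form a uniform subset.
    rng.shuffle(samples)
    return samples[:num], retries


if __name__ == "__main__":
    # A deliberately small fraction makes the warning likely to appear.
    sample, retries = take_sample(range(100), 10, seed=1, fraction=0.08)
    print(sample, "retries:", retries)
```

The retry count is returned alongside the sample so that, as the reply suggests for Spark's logs, one can check whether the extra sampling pass actually happened.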

Is Spark's KMeans unable to handle bigdata?

2016-09-01 Thread Georgios Samaras
Dear all, random initialization works well, but the default initialization, k-means||, has been a struggle for me. About a year ago I also heard of people struggling with it, and everybody would just skip it and use random initialization, but I cannot let it go! I have posted a minimal example

Re: Is Spark's KMeans unable to handle bigdata?

2016-09-02 Thread Georgios Samaras
cale. You aren't even
> running out of memory it seems. Your memory settings are causing YARN
> to kill the executors for using more memory than they advertise. That
> could mean it never proceeds if this happens a lot.
>
> I don't have any problems with it.
>
> On

Re: Is Spark's KMeans unable to handle bigdata?

2016-09-02 Thread Georgios Samaras
> >
> > I think this init may need some love and optimization. For example, I
> > think treeAggregate might work better. An Array[Float] may be just
> > fine and cut down memory usage, etc.
> >
> > On Fri, Sep 2, 2016 at 5:47 PM, Georgios Samaras
> >
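The suggestion above to try treeAggregate refers to Spark's RDD.treeAggregate, which merges per-partition partial results over several rounds (controlled by a `depth` parameter) instead of combining every partial result in one step. The pure-Python sketch below only simulates that tree shape with a fixed, hypothetical fan-in; `tree_aggregate` and `fan_in` are illustrative names, not Spark's implementation. It also shows the storage side of the Array[Float] remark: a single-precision float typically takes 4 bytes per element versus 8 for a double.

```python
from array import array
from functools import reduce


def tree_aggregate(partitions, zero, seq_op, comb_op, fan_in=2):
    """Simulated tree aggregation over a list of partitions (lists).

    Hypothetical sketch: each partition is folded locally with seq_op,
    then partial results are merged with comb_op in groups of `fan_in`,
    one level at a time, until a single value remains.
    Returns (result, number_of_combine_levels).
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    # Level 0: fold every partition independently (like seqOp per partition).
    partials = [reduce(seq_op, part, zero) for part in partitions]
    if not partials:
        return zero, 0
    levels = 0
    # Merge partial results in small groups instead of all at once.
    while len(partials) > 1:
        partials = [reduce(comb_op, partials[i:i + fan_in])
                    for i in range(0, len(partials), fan_in)]
        levels += 1
    return partials[0], levels


if __name__ == "__main__":
    parts = [[1, 2], [3, 4], [5], [6, 7, 8]]
    add = lambda a, b: a + b
    # Partials [3, 7, 5, 21] -> [10, 26] -> [36] in two combine levels.
    print(tree_aggregate(parts, 0, add, add))  # (36, 2)

    # Bytes per element: single precision 'f' vs double precision 'd'.
    print(array("f").itemsize, array("d").itemsize)  # typically 4 8
```

With a larger fan-in the tree gets shallower; Spark chooses the grouping from `depth` and the partition count rather than from a fixed fan-in as done here.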

Re: Is Spark's KMeans unable to handle bigdata?

2016-09-03 Thread Georgios Samaras
> > On Fri, Sep 2, 2016 at 6:45 PM, Georgios Samaras wrote:
> > > I am not using the "runs" parameter anyway, but I see your point. If you
> > > could point out any modifications in the minimal example I posted, I would
> > > be more than interested to try them!
> > >

Active tasks is a negative number spark ui

2016-09-04 Thread Georgios Samaras
Dear all, as discussed in this Stack Overflow question (with a bounty), I was experiencing a situation similar to this one: Does anybody have an idea why this would happen? Best, George

Re: Active tasks is a negative number spark ui

2016-09-04 Thread Georgios Samaras
Will do, Sean, thank you!

On Sun, Sep 4, 2016 at 11:37 AM, Sean Owen wrote:
> Search JIRA for several references.
>
>
> On Sun, Sep 4, 2016, 18:52 Georgios Samaras wrote:
>
>> Dear all,
>>
>> as discussed in this Stackoverflow question with a bounty