I am running spark-1.0.0 with Java 1.8:

"java version "1.8.0_05"
Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)"

"which spark-shell
~/bench/spark-1.0.0/bin/spark-shell"

"which scala
~/bench/scala-2.10.4/bin/scala"


On Thursday, July 10, 2014 12:46 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:

I ran the SparkKMeans example (not the mllib KMeans that Sean ran) with your dataset as well, and I got the expected answer. I believe that even though initialization is done using sampling, the example actually sets the seed to a constant 42, so the result should always be the same no matter how many times you run it. So I am not really sure what's going on here.

Can you tell us more about which version of Spark you are running? Which Java version?

======================================

[tdas @ Xion spark2] cat input
2 1
1 2
3 2
2 3
4 1
5 1
6 1
4 2
6 2
4 3
5 3
6 3
[tdas @ Xion spark2] ./bin/run-example SparkKMeans input 2 0.001
2014-07-10 02:45:06.764 java[45244:d17] Unable to load realm info from SCDynamicStore
14/07/10 02:45:07 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/10 02:45:07 WARN LoadSnappy: Snappy native library not loaded
14/07/10 02:45:08 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
14/07/10 02:45:08 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
Finished iteration (delta = 3.0)
Finished iteration (delta = 0.0)
Final centers:
DenseVector(5.0, 2.0)
DenseVector(2.0, 2.0)


On Thu, Jul 10, 2014 at 2:17 AM, Wanda Hawk <wanda_haw...@yahoo.com> wrote:

>so this is what I am running:
>"./bin/run-example SparkKMeans ~/Documents/2dim2.txt 2 0.001"
>
>And this is the input file:"
>┌───[spark2013@SparkOne]──────[~/spark-1.0.0].$
>└───#!cat ~/Documents/2dim2.txt
>2 1
>1 2
>3 2
>2 3
>4 1
>5 1
>6 1
>4 2
>6 2
>4 3
>5 3
>6 3
>"
>
>This is the final output from spark:
>"14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
>14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 0 ms
>14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
>14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
>14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 0 ms
>14/07/10 20:05:12 INFO Executor: Serialized size of result for 14 is 1433
>14/07/10 20:05:12 INFO Executor: Sending result for 14 directly to driver
>14/07/10 20:05:12 INFO Executor: Finished task ID 14
>14/07/10 20:05:12 INFO DAGScheduler: Completed ResultTask(6, 0)
>14/07/10 20:05:12 INFO TaskSetManager: Finished TID 14 in 5 ms on localhost (progress: 1/2)
>14/07/10 20:05:12 INFO Executor: Serialized size of result for 15 is 1433
>14/07/10 20:05:12 INFO Executor: Sending result for 15 directly to driver
>14/07/10 20:05:12 INFO Executor: Finished task ID 15
>14/07/10 20:05:12 INFO DAGScheduler: Completed ResultTask(6, 1)
>14/07/10 20:05:12 INFO TaskSetManager: Finished TID 15 in 7 ms on localhost (progress: 2/2)
>14/07/10 20:05:12 INFO DAGScheduler: Stage 6 (collectAsMap at SparkKMeans.scala:75) finished in 0.008 s
>14/07/10 20:05:12 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool
>14/07/10 20:05:12 INFO SparkContext: Job finished: collectAsMap at SparkKMeans.scala:75, took 0.02472681 s
>Finished iteration (delta = 0.0)
>Final centers:
>DenseVector(2.8571428571428568, 2.0)
>DenseVector(5.6000000000000005, 2.0)
>"
>
>
>On Thursday, July 10, 2014 12:02 PM, Bertrand Dechoux <decho...@gmail.com> wrote:
>
>A picture is worth a thousand... Well, a picture with this dataset, what you are expecting and what you get, would help answer your initial question.
>
>Bertrand
>
>
>On Thu, Jul 10, 2014 at 10:44 AM, Wanda Hawk <wanda_haw...@yahoo.com> wrote:
>
>Can someone please run the standard kMeans code on this input with 2 centers?
>>2 1
>>1 2
>>3 2
>>2 3
>>4 1
>>5 1
>>6 1
>>4 2
>>6 2
>>4 3
>>5 3
>>6 3
>>
>>The obvious result should be (2,2) and (5,2) ... (you can draw them if you don't believe me ...)
>>
>>Thanks,
>>Wanda
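
For anyone following along, both sets of final centers posted in this thread can be checked by hand. The sketch below is plain Python, not the SparkKMeans code, and the two pairs of starting centers are hand-picked assumptions for illustration only (the thread does not show which points the seeded sampling actually selected). It runs a basic Lloyd iteration on the 12 input points from each start and compares the total squared distance of the two outcomes.

```python
# Plain-Python Lloyd iteration on the 12 points from this thread.
# The starting centers used at the bottom are hypothetical, chosen by hand;
# they are NOT the points SparkKMeans samples with its fixed seed.

POINTS = [(2, 1), (1, 2), (3, 2), (2, 3), (4, 1), (5, 1),
          (6, 1), (4, 2), (6, 2), (4, 3), (5, 3), (6, 3)]


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def closest(p, centers):
    """Index of the center nearest to p."""
    return min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))


def lloyd(points, centers, converge_dist=0.001):
    """Alternate assignment and mean update until the centers stop moving."""
    centers = [tuple(map(float, c)) for c in centers]
    while True:
        groups = [[] for _ in centers]
        for p in points:
            groups[closest(p, centers)].append(p)
        # New center = mean of its group; an empty group keeps its old center.
        new_centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
            if g else c
            for g, c in zip(groups, centers)
        ]
        delta = sum(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if delta <= converge_dist:
            return centers


def cost(points, centers):
    """Sum over all points of the squared distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Two hand-picked starts (assumptions, see the note above).
expected = lloyd(POINTS, [(2, 1), (5, 1)])
reported = lloyd(POINTS, [(3, 2), (6, 2)])

print("start A:", expected, "cost:", cost(POINTS, expected))
print("start B:", reported, "cost:", cost(POINTS, reported))
```

Under these assumptions, start A settles on (2, 2) and (5, 2) with a cost of 16, while start B settles on roughly (2.857, 2) and (5.6, 2) with a cost of about 18.06. Both pairs are fixed points of the iteration, so the centers in the second log are consistent with k-means converging to a local optimum from a different starting pair. This does not by itself explain why the sampled starting points would differ between the two machines.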