I'm trying to apply KMeans training to some text data, which consists of
lines that each contain between 3 and 20 words. For that purpose,
all unique words are saved in a dictionary. This dictionary can become very
large because no hashing etc. is done, but it should spill to disk in case it
no longer fits into memory:
var dict = scala.collection.mutable.Map[String,Int]()
dict.persist(org.apache.spark.storage.StorageLevel.MEMORY_AND_DISK_SER)

With the help of this dictionary, I build sparse feature vectors for each
line which are then saved in an RDD that is used as input for KMeans.train.
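
To make the step above concrete, here is a minimal sketch of how such a dictionary can turn a line into a sparse representation. It is only an illustration under assumptions, not the code that was actually run: the object name SparseFeatures and its methods addWords and encode are hypothetical, words are assumed to be separated by whitespace, and the feature value is assumed to be a plain word count.

```scala
import scala.collection.mutable

// Hypothetical helper, not taken from the original job: maps each distinct
// word to a column index and encodes a line as (indices, counts) arrays.
object SparseFeatures {
  // Give every word that is not yet in the dictionary the next free index.
  def addWords(dict: mutable.Map[String, Int], line: String): Unit =
    line.split("\\s+").filter(_.nonEmpty).foreach { w =>
      if (!dict.contains(w)) dict(w) = dict.size
    }

  // Encode one line as parallel arrays: ascending column indices and the
  // number of times the corresponding word occurs. Unknown words are skipped.
  def encode(dict: collection.Map[String, Int], line: String): (Array[Int], Array[Double]) = {
    val counts = line.split("\\s+").filter(_.nonEmpty)
      .collect { case w if dict.contains(w) => dict(w) }
      .groupBy(identity)
      .map { case (idx, hits) => (idx, hits.length.toDouble) }
      .toSeq
      .sortBy(_._1)
    (counts.map(_._1).toArray, counts.map(_._2).toArray)
  }
}
```

The two arrays have the shape that MLlib's Vectors.sparse(size, indices, values) expects, with size being the number of dictionary entries, so only the non-zero positions of each line are stored.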

Spark is running in standalone mode, in this case with 5 worker nodes.
It appears that anything up to the actual training completes successfully
with 126G of training data (logs below).

The training data is provided in the form of a cached, broadcast variable to
all worker nodes:

var vectors2 = vectors.repartition(1000).persist(org.apache.spark.storage.StorageLevel.MEMORY_AND_DISK_SER)
var broadcastVector = sc.broadcast(vectors2)
println("---------------------Start model training---------------------")
var model = KMeans.train(broadcastVector.value, 20, 10)

The first error I get is a null pointer exception, but there is still work
done after that. I think the real reason this terminates is
java.lang.OutOfMemoryError: Java heap space.
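
To put the logged numbers in proportion, the arithmetic below only restates figures printed in the console output further down (serialized task size, task count of Stage 8, and the heap capacity in the GC lines). It assumes the GC lines come from the same JVM that throws the error, and it is not meant as a diagnosis; in particular, nothing here says that all tasks are held in memory at once.

```latex
\begin{align*}
\text{serialized size per task} &= 279{,}029{,}826 \text{ bytes} \approx 272{,}490 \text{ KB} \\
\text{tasks in Stage 8} &= 1000 \\
\text{total if every task is serialized at that size}
  &= 1000 \times 279{,}029{,}826 \text{ bytes} \approx 2.79 \times 10^{11} \text{ bytes} \\
\text{heap capacity in the GC lines} &\approx 9{,}382{,}400 \text{ K} \approx 9.6 \times 10^{9} \text{ bytes}
\end{align*}
```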

Is it possible that this happens because the cluster centers in the model
are represented in dense instead of sparse form, thereby getting large with
a large vector size? If yes, how can I make sure it doesn't crash because of
that? It should spill to disk if necessary.
My goal would be to have the input size only limited by disk space. Sure it
would get very slow if it spills to disk all the time, but it shouldn't
terminate.
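
As a rough way to size the concern about dense centers: if each center is held as a dense array of 8-byte doubles (an assumption, not something verified here), the memory for all centers grows linearly with the vector size d. The value k = 20 is the second argument of the KMeans.train call above; d = 10^6 is a purely hypothetical dictionary size chosen for illustration.

```latex
\begin{align*}
\text{bytes for dense centers} &= 8 \cdot k \cdot d \\
k = 20,\ d = 10^{6} \ (\text{hypothetical}) &\Rightarrow 8 \cdot 20 \cdot 10^{6} = 1.6 \times 10^{8} \text{ bytes} \approx 160 \text{ MB}
\end{align*}
```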



Here's the console output from the KMeans.train part:

---------------------Start model training---------------------
14/08/11 17:05:17 INFO spark.SparkContext: Starting job: takeSample at
KMeans.scala:263
14/08/11 17:05:17 INFO scheduler.DAGScheduler: Registering RDD 64
(repartition at <console>:48)
14/08/11 17:05:17 INFO scheduler.DAGScheduler: Got job 6 (takeSample at
KMeans.scala:263) with 1000 output partitions (allowLocal=false)
14/08/11 17:05:17 INFO scheduler.DAGScheduler: Final stage: Stage
8(takeSample at KMeans.scala:263)
14/08/11 17:05:17 INFO scheduler.DAGScheduler: Parents of final stage:
List(Stage 9)
14/08/11 17:05:17 INFO scheduler.DAGScheduler: Missing parents: List(Stage
9)
14/08/11 17:05:17 INFO scheduler.DAGScheduler: Submitting Stage 9
(MapPartitionsRDD[64] at repartition at <console>:48), which has no missing
parents
4116.323: [GC (Allocation Failure) [PSYoungGen: 1867168K->240876K(2461696K)]
4385155K->3164592K(9452544K), 1.4455064 secs] [Times: user=11.33 sys=0.03,
real=1.44 secs]
4174.512: [GC (Allocation Failure) [PSYoungGen: 1679497K->763168K(2338816K)]
4603212K->3691609K(9329664K), 0.8050508 secs] [Times: user=6.04 sys=0.01,
real=0.80 secs]
4188.250: [GC (Allocation Failure) [PSYoungGen: 2071822K->986136K(2383360K)]
5000263K->4487601K(9374208K), 1.6795174 secs] [Times: user=13.23 sys=0.01,
real=1.68 secs]
14/08/11 17:06:57 INFO scheduler.DAGScheduler: Submitting 1 missing tasks
from Stage 9 (MapPartitionsRDD[64] at repartition at <console>:48)
14/08/11 17:06:57 INFO scheduler.TaskSchedulerImpl: Adding task set 9.0 with
1 tasks
4190.947: [GC (Allocation Failure) [PSYoungGen: 2336718K->918720K(2276864K)]
5838183K->5406145K(9267712K), 1.5793066 secs] [Times: user=12.40 sys=0.02,
real=1.58 secs]
14/08/11 17:07:00 WARN scheduler.TaskSetManager: Stage 9 contains a task of
very large size (272484 KB). The maximum recommended task size is 100 KB.
14/08/11 17:07:00 INFO scheduler.TaskSetManager: Starting task 0.0 in stage
9.0 (TID 3053, idp11.foo.bar, PROCESS_LOCAL, 279023993 bytes)
4193.607: [GC (Allocation Failure) [PSYoungGen: 2070046K->599908K(2330112K)]
6557472K->5393557K(9320960K), 0.3267949 secs] [Times: user=2.53 sys=0.01,
real=0.33 secs]
4194.645: [GC (Allocation Failure) [PSYoungGen: 1516770K->589655K(2330112K)]
6310419K->5383352K(9320960K), 0.2566507 secs] [Times: user=1.96 sys=0.00,
real=0.26 secs]
4195.815: [GC (Allocation Failure) [PSYoungGen: 1730909K->275312K(2330112K)]
6524606K->5342865K(9320960K), 0.2053884 secs] [Times: user=1.57 sys=0.00,
real=0.21 secs]
14/08/11 17:08:56 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in
memory on idp11.foo.bar:46418 (size: 136.0 B, free: 10.4 GB)
14/08/11 17:08:56 INFO spark.MapOutputTrackerMasterActor: Asked to send map
output locations for shuffle 1 to sp...@idp11.foo.bar:57072
14/08/11 17:10:09 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 9.0
(TID 3053, idp11.foo.bar): java.lang.NullPointerException:
        $line86.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:36)
        $line86.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:36)
        scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        scala.collection.Iterator$class.foreach(Iterator.scala:727)
        scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:57)
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:147)
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
        org.apache.spark.scheduler.Task.run(Task.scala:51)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:189)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        java.lang.Thread.run(Thread.java:745)
4382.710: [GC (Allocation Failure) [PSYoungGen: 1435334K->306688K(2333184K)]
6502887K->5374264K(9324032K), 0.1423619 secs] [Times: user=0.94 sys=0.01,
real=0.14 secs]
14/08/11 17:10:10 INFO scheduler.TaskSetManager: Starting task 0.1 in stage
9.0 (TID 3054, idp09.foo.bar, PROCESS_LOCAL, 279023993 bytes)
4383.842: [GC (Allocation Failure) [PSYoungGen: 1473219K->313540K(2330112K)]
6540795K->5381274K(9320960K), 0.1694822 secs] [Times: user=1.30 sys=0.01,
real=0.17 secs]
4384.836: [GC (Allocation Failure) [PSYoungGen: 1360342K->431799K(2448384K)]
6428075K->5499572K(9439232K), 0.2106620 secs] [Times: user=1.59 sys=0.00,
real=0.21 secs]
4386.083: [GC (Allocation Failure) [PSYoungGen: 1732982K->275312K(2381312K)]
6800755K->5616957K(9372160K), 0.2064240 secs] [Times: user=1.58 sys=0.00,
real=0.21 secs]
14/08/11 17:13:14 WARN storage.BlockManagerMasterActor: Removing
BlockManager BlockManagerId(1, idp09.foo.bar, 46815, 0) with no recent heart
beats: 81307ms exceeds 45000ms
14/08/11 17:13:35 INFO storage.BlockManagerMasterActor: Registering block
manager idp09.foo.bar:46815 with 10.4 GB RAM
14/08/11 17:13:35 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in
memory on idp09.foo.bar:46815 (size: 39.5 KB, free: 10.4 GB)
14/08/11 17:13:35 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in
memory on idp09.foo.bar:46815 (size: 39.5 KB, free: 10.4 GB)
14/08/11 17:13:35 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in
memory on idp09.foo.bar:46815 (size: 39.5 KB, free: 10.4 GB)
14/08/11 17:13:35 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in
memory on idp09.foo.bar:46815 (size: 39.5 KB, free: 10.4 GB)
14/08/11 17:13:43 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in
memory on idp09.foo.bar:46815 (size: 136.0 B, free: 10.4 GB)
14/08/11 17:13:43 INFO spark.MapOutputTrackerMasterActor: Asked to send map
output locations for shuffle 1 to sp...@idp09.foo.bar:45452
14/08/11 17:16:03 INFO scheduler.TaskSetManager: Finished task 0.1 in stage
9.0 (TID 3054) in 354311 ms on idp09.foo.bar (1/1)
14/08/11 17:16:03 INFO scheduler.DAGScheduler: Stage 9 (repartition at
<console>:48) finished in 546.308 s
14/08/11 17:16:03 INFO scheduler.DAGScheduler: looking for newly runnable
stages
14/08/11 17:16:03 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 9.0,
whose tasks have all completed, from pool
14/08/11 17:16:03 INFO scheduler.DAGScheduler: running: Set()
14/08/11 17:16:03 INFO scheduler.DAGScheduler: waiting: Set(Stage 8)
14/08/11 17:16:03 INFO scheduler.DAGScheduler: failed: Set()
14/08/11 17:16:03 INFO scheduler.DAGScheduler: Missing parents for Stage 8:
List()
14/08/11 17:16:03 INFO scheduler.DAGScheduler: Submitting Stage 8
(MappedRDD[71] at map at KMeans.scala:123), which is now runnable
4751.664: [GC (Allocation Failure) [PSYoungGen: 1603872K->118240K(2490368K)]
6945517K->5459924K(9481216K), 0.1854085 secs] [Times: user=1.33 sys=0.00,
real=0.19 secs]
4807.985: [GC (Allocation Failure) [PSYoungGen: 1595872K->492896K(2482176K)]
6937556K->5834920K(9473024K), 0.6883449 secs] [Times: user=5.36 sys=0.01,
real=0.69 secs]
4832.448: [GC (Allocation Failure) [PSYoungGen: 1716332K->895136K(2263552K)]
7058357K->6816776K(9254400K), 1.2636489 secs] [Times: user=9.90 sys=0.01,
real=1.27 secs]
14/08/11 17:17:41 INFO scheduler.DAGScheduler: Submitting 1000 missing tasks
from Stage 8 (MappedRDD[71] at map at KMeans.scala:123)
14/08/11 17:17:41 INFO scheduler.TaskSchedulerImpl: Adding task set 8.0 with
1000 tasks
4834.762: [GC (Allocation Failure) [PSYoungGen: 2128155K->885978K(2168320K)]
8049796K->7702659K(9159168K), 8.5102780 secs] [Times: user=38.78 sys=1.61,
real=8.51 secs]
4843.283: [Full GC (Ergonomics) [PSYoungGen: 885978K->0K(2168320K)]
[ParOldGen: 6816680K->2286524K(6990848K)] 7702659K->2286524K(9159168K),
[Metaspace: 81087K->81087K(1118208K)], 8.6153707 secs] [Times: user=63.32
sys=0.33, real=8.62 secs]
4852.799: [GC (Allocation Failure) [PSYoungGen: 1085341K->850420K(2330112K)]
3371865K->3136952K(9320960K), 0.3394825 secs] [Times: user=2.55 sys=0.02,
real=0.34 secs]
14/08/11 17:18:00 WARN scheduler.TaskSetManager: Stage 8 contains a task of
very large size (272490 KB). The maximum recommended task size is 100 KB.
14/08/11 17:18:00 INFO scheduler.TaskSetManager: Starting task 0.0 in stage
8.0 (TID 3055, idp09.foo.bar, PROCESS_LOCAL, 279029826 bytes)
4854.097: [GC (Allocation Failure) [PSYoungGen: 2006494K->545140K(2330112K)]
4293027K->3409458K(9320960K), 0.3943651 secs] [Times: user=3.04 sys=0.01,
real=0.40 secs]
14/08/11 17:18:01 INFO scheduler.TaskSetManager: Starting task 1.0 in stage
8.0 (TID 3056, idp19.foo.bar, PROCESS_LOCAL, 279029826 bytes)
4855.523: [GC (Allocation Failure) [PSYoungGen: 1703271K->882986K(2330112K)]
4567590K->4019826K(9320960K), 0.4778008 secs] [Times: user=3.69 sys=0.02,
real=0.48 secs]
14/08/11 17:18:03 INFO scheduler.TaskSetManager: Starting task 2.0 in stage
8.0 (TID 3057, idp11.foo.bar, PROCESS_LOCAL, 279029826 bytes)
4856.982: [GC (Allocation Failure) [PSYoungGen: 2005951K->577866K(2330112K)]
5142792K->3987245K(9320960K), 0.3770014 secs] [Times: user=2.89 sys=0.02,
real=0.38 secs]
14/08/11 17:18:05 INFO scheduler.TaskSetManager: Starting task 3.0 in stage
8.0 (TID 3058, idp41.foo.bar, PROCESS_LOCAL, 279029826 bytes)
4858.343: [GC (Allocation Failure) [PSYoungGen: 1738896K->310890K(2330112K)]
5148275K->3992807K(9320960K), 0.2853468 secs] [Times: user=2.17 sys=0.01,
real=0.28 secs]
4859.519: [GC (Allocation Failure) [PSYoungGen: 1429616K->272650K(2330112K)]
5111533K->4227121K(9320960K), 0.2705028 secs] [Times: user=2.09 sys=0.00,
real=0.27 secs]
14/08/11 17:18:06 INFO scheduler.TaskSetManager: Starting task 4.0 in stage
8.0 (TID 3059, idp42.foo.bar, PROCESS_LOCAL, 279029826 bytes)
4860.734: [GC (Allocation Failure) [PSYoungGen: 1429338K->545108K(2389504K)]
5383809K->4772109K(9380352K), 0.3282623 secs] [Times: user=2.53 sys=0.02,
real=0.33 secs]
14/08/11 17:18:08 INFO scheduler.TaskSetManager: Starting task 5.0 in stage
8.0 (TID 3060, idp09.foo.bar, PROCESS_LOCAL, 279029826 bytes)
4862.090: [GC (Allocation Failure) [PSYoungGen: 1701120K->883050K(2114560K)]
5928121K->5382589K(9105408K), 0.4179785 secs] [Times: user=3.15 sys=0.00,
real=0.41 secs]
14/08/11 17:18:09 INFO scheduler.TaskSetManager: Starting task 6.0 in stage
8.0 (TID 3061, idp19.foo.bar, PROCESS_LOCAL, 279029826 bytes)
4863.484: [GC (Allocation Failure) [PSYoungGen: 2006771K->577866K(2366976K)]
6506311K->5349943K(9357824K), 0.3806139 secs] [Times: user=2.92 sys=0.02,
real=0.38 secs]
14/08/11 17:18:11 INFO scheduler.TaskSetManager: Starting task 7.0 in stage
8.0 (TID 3062, idp11.foo.bar, PROCESS_LOCAL, 279029826 bytes)
4864.936: [GC (Allocation Failure) [PSYoungGen: 1777373K->349002K(2330112K)]
6549451K->5393633K(9320960K), 0.3118865 secs] [Times: user=2.36 sys=0.01,
real=0.31 secs]
4866.109: [GC (Allocation Failure) [PSYoungGen: 1428049K->272682K(2401280K)]
6472680K->5589859K(9392128K), 0.2053937 secs] [Times: user=1.58 sys=0.00,
real=0.20 secs]
14/08/11 17:18:13 INFO scheduler.TaskSetManager: Starting task 8.0 in stage
8.0 (TID 3063, idp41.foo.bar, PROCESS_LOCAL, 279029826 bytes)
4867.255: [GC (Allocation Failure) [PSYoungGen: 1428363K->545204K(2388992K)]
6745540K->6134903K(9379840K), 0.3292614 secs] [Times: user=2.52 sys=0.00,
real=0.33 secs]
14/08/11 17:18:14 INFO scheduler.TaskSetManager: Starting task 9.0 in stage
8.0 (TID 3064, idp42.foo.bar, PROCESS_LOCAL, 279029826 bytes)
4868.619: [GC (Allocation Failure) [PSYoungGen: 1700778K->883018K(2279424K)]
7290478K->6745255K(9270272K), 0.4138342 secs] [Times: user=3.20 sys=0.00,
real=0.41 secs]
14/08/11 17:18:16 INFO scheduler.TaskSetManager: Starting task 10.0 in stage
8.0 (TID 3065, idp09.foo.bar, PROCESS_LOCAL, 279029826 bytes)
4870.016: [GC (Allocation Failure) [PSYoungGen: 2005858K->577834K(2362880K)]
7868096K->6712625K(9353728K), 0.3216270 secs] [Times: user=2.48 sys=0.02,
real=0.33 secs]
14/08/11 17:18:18 INFO scheduler.TaskSetManager: Starting task 11.0 in stage
8.0 (TID 3066, idp19.foo.bar, PROCESS_LOCAL, 279029826 bytes)
4871.361: [GC (Allocation Failure) [PSYoungGen: 1777098K->349034K(2429440K)]
7911890K->6756372K(9420288K), 0.2425195 secs] [Times: user=1.86 sys=0.01,
real=0.24 secs]
4872.470: [GC (Allocation Failure) [PSYoungGen: 1428179K->272586K(2411008K)]
7835517K->6952462K(9401856K), 0.2090806 secs] [Times: user=1.60 sys=0.01,
real=0.21 secs]
4872.680: [Full GC (Ergonomics) [PSYoungGen: 272586K->0K(2411008K)]
[ParOldGen: 6679875K->5790843K(6990848K)] 6952462K->5790843K(9401856K),
[Metaspace: 81088K->81088K(1118208K)], 9.4086701 secs] [Times: user=70.70
sys=0.29, real=9.40 secs]
14/08/11 17:18:29 INFO scheduler.TaskSetManager: Starting task 12.0 in stage
8.0 (TID 3067, idp11.foo.bar, PROCESS_LOCAL, 279029826 bytes)
4883.028: [GC (Allocation Failure) [PSYoungGen: 1156929K->545236K(2479104K)]
6947773K->6336079K(9469952K), 0.2738816 secs] [Times: user=2.10 sys=0.00,
real=0.28 secs]
14/08/11 17:18:30 INFO scheduler.TaskSetManager: Starting task 13.0 in stage
8.0 (TID 3068, idp41.foo.bar, PROCESS_LOCAL, 279029826 bytes)
4884.347: [GC (Allocation Failure) [PSYoungGen: 1700618K->883018K(2306048K)]
7491461K->6946435K(9296896K), 0.4920853 secs] [Times: user=3.82 sys=0.01,
real=0.50 secs]
14/08/11 17:18:32 INFO scheduler.TaskSetManager: Starting task 14.0 in stage
8.0 (TID 3069, idp42.foo.bar, PROCESS_LOCAL, 279029826 bytes)
4885.818: [GC (Allocation Failure) [PSYoungGen: 2005731K->577898K(2436096K)]
8069149K->6913845K(9426944K), 0.3060761 secs] [Times: user=2.17 sys=0.02,
real=0.30 secs]
14/08/11 17:18:33 INFO scheduler.TaskSetManager: Starting task 15.0 in stage
8.0 (TID 3070, idp09.foo.bar, PROCESS_LOCAL, 279029826 bytes)
4887.211: [GC (Allocation Failure) [PSYoungGen: 1853207K->425322K(2391552K)]
8189155K->7033799K(9382400K), 0.3021801 secs] [Times: user=2.34 sys=0.01,
real=0.30 secs]
4887.513: [Full GC (Ergonomics) [PSYoungGen: 425322K->0K(2391552K)]
[ParOldGen: 6608477K->6684656K(6990848K)] 7033799K->6684656K(9382400K),
[Metaspace: 81096K->81032K(1118208K)], 9.4890515 secs] [Times: user=70.52
sys=0.34, real=9.49 secs]
14/08/11 17:18:44 INFO scheduler.TaskSetManager: Starting task 16.0 in stage
8.0 (TID 3071, idp19.foo.bar, PROCESS_LOCAL, 279029826 bytes)
4898.115: [Full GC (Ergonomics) [PSYoungGen: 1314547K->0K(2391552K)]
[ParOldGen: 6684656K->6899949K(6990848K)] 7999203K->6899949K(9382400K),
[Metaspace: 81032K->81025K(1118208K)], 11.0145761 secs] [Times: user=67.67
sys=0.88, real=11.02 secs]
4910.045: [Full GC (Ergonomics) [PSYoungGen: 1117462K->272491K(2391552K)]
[ParOldGen: 6899949K->6878697K(6990848K)] 8017411K->7151189K(9382400K),
[Metaspace: 81025K->81003K(1118208K)], 13.0508933 secs] [Times: user=96.11
sys=0.45, real=13.05 secs]
14/08/11 17:19:10 INFO scheduler.TaskSetManager: Starting task 17.0 in stage
8.0 (TID 3072, idp11.foo.bar, PROCESS_LOCAL, 279029826 bytes)
4923.867: [Full GC (Ergonomics) [PSYoungGen: 1157002K->577697K(2391552K)]
[ParOldGen: 6878697K->6878671K(6990848K)] 8035699K->7456368K(9382400K),
[Metaspace: 81003K->81003K(1118208K)], 11.8407076 secs] [Times: user=73.16
sys=0.35, real=11.84 secs]
4936.151: [Full GC (Ergonomics) [PSYoungGen: 1123485K->545009K(2391552K)]
[ParOldGen: 6878671K->6878671K(6990848K)] 8002156K->7423681K(9382400K),
[Metaspace: 81003K->81003K(1118208K)], 10.0288176 secs] [Times: user=75.19
sys=0.35, real=10.03 secs]
14/08/11 17:19:33 INFO scheduler.TaskSetManager: Starting task 18.0 in stage
8.0 (TID 3073, idp41.foo.bar, PROCESS_LOCAL, 279029826 bytes)
4946.717: [Full GC (Ergonomics) [PSYoungGen: 1122927K->697593K(2391552K)]
[ParOldGen: 6878671K->6878671K(6990848K)] 8001599K->7576264K(9382400K),
[Metaspace: 81003K->81003K(1118208K)], 8.4595299 secs] [Times: user=63.18
sys=0.26, real=8.45 secs]
4955.584: [Full GC (Ergonomics) [PSYoungGen: 1276308K->817527K(2391552K)]
[ParOldGen: 6878671K->6878670K(6990848K)] 8154980K->7696198K(9382400K),
[Metaspace: 81003K->81003K(1118208K)], 10.1614967 secs] [Times: user=76.43
sys=0.29, real=10.16 secs]
4966.013: [Full GC (Ergonomics) [PSYoungGen: 1090782K->817502K(2391552K)]
[ParOldGen: 6878670K->6878670K(6990848K)] 7969453K->7696173K(9382400K),
[Metaspace: 81003K->81003K(1118208K)], 10.6428199 secs] [Times: user=79.71
sys=0.35, real=10.64 secs]
14/08/11 17:20:03 INFO scheduler.TaskSetManager: Starting task 19.0 in stage
8.0 (TID 3074, idp42.foo.bar, PROCESS_LOCAL, 279029826 bytes)
4977.071: [Full GC (Ergonomics) [PSYoungGen: 1242847K->893797K(2391552K)]
[ParOldGen: 6878670K->6878670K(6990848K)] 8121517K->7772468K(9382400K),
[Metaspace: 81003K->81003K(1118208K)], 9.9548540 secs] [Times: user=74.76
sys=0.31, real=9.95 secs]
4987.156: [Full GC (Ergonomics) [PSYoungGen: 1047786K->970141K(2391552K)]
[ParOldGen: 6878670K->6878670K(6990848K)] 7926457K->7848811K(9382400K),
[Metaspace: 81003K->81003K(1118208K)], 8.4711455 secs] [Times: user=63.27
sys=0.33, real=8.47 secs]
4995.861: [Full GC (Ergonomics) [PSYoungGen: 1275597K->1122715K(2391552K)]
[ParOldGen: 6878670K->6878670K(6990848K)] 8154267K->8001385K(9382400K),
[Metaspace: 81003K->81003K(1118208K)], 10.3113909 secs] [Times: user=76.20
sys=0.31, real=10.31 secs]
5006.173: [Full GC (Allocation Failure) [PSYoungGen:
1122715K->1122715K(2391552K)] [ParOldGen: 6878670K->6876589K(6990848K)]
8001385K->7999305K(9382400K), [Metaspace: 81003K->79986K(1118208K)],
12.8222611 secs] [Times: user=94.71 sys=0.43, real=12.82 secs]
5019.191: [Full GC (Ergonomics) [PSYoungGen: 1278710K->0K(2391552K)]
[ParOldGen: 6876589K->2320712K(6990848K)] 8155299K->2320712K(9382400K),
[Metaspace: 80014K->80014K(1118208K)], 8.3395179 secs] [Times: user=62.12
sys=0.28, real=8.34 secs]
14/08/11 17:20:45 ERROR actor.ActorSystemImpl: Uncaught fatal error from
thread [spark-akka.actor.default-dispatcher-18] shutting down ActorSystem
[spark]
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3230)
        at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178)
        at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
        at org.apache.spark.scheduler.Task$.serializeWithDependencies(Task.scala:132)
        at org.apache.spark.scheduler.TaskSetManager.resourceOffer(TaskSetManager.scala:419)
        at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$3$$anonfun$apply$7$$anonfun$apply$2.apply$mcVI$sp(TaskSchedulerImpl.scala:257)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
        at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$3$$anonfun$apply$7.apply(TaskSchedulerImpl.scala:253)
        at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$3$$anonfun$apply$7.apply(TaskSchedulerImpl.scala:250)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$3.apply(TaskSchedulerImpl.scala:250)
        at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$3.apply(TaskSchedulerImpl.scala:250)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:250)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor.makeOffers(CoarseGrainedSchedulerBackend.scala:153)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:120)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/08/11 17:20:54 INFO scheduler.DAGScheduler: Failed to run takeSample at
KMeans.scala:263
14/08/11 17:20:55 INFO scheduler.TaskSetManager: Starting task 21.0 in stage
8.0 (TID 3076, idp09.foo.bar, PROCESS_LOCAL, 279029826 bytes)
5028.749: [GC (Allocation Failure) [PSYoungGen: 1327230K->889140K(2304512K)]
3647943K->3209852K(9295360K), 0.3623403 secs] [Times: user=2.65 sys=0.02,
real=0.37 secs]
org.apache.spark.SparkException: Job cancelled because SparkContext was shut down
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:608)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:607)
        at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
        at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:607)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.postStop(DAGScheduler.scala:1203)
        at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:201)
        at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:163)
        at akka.actor.ActorCell.terminate(ActorCell.scala:338)
        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:431)
        at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447)
        at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262)
        at akka.dispatch.Mailbox.run(Mailbox.scala:218)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)



