Re: Cache'ing performance

2016-08-27 Thread Kazuaki Ishizaki
pull/11956. This PR shows room for performance improvement for float/double values that are not compressed. Kazuaki Ishizaki From: linguin@gmail.com To: Maciej Bryński Cc: Spark dev list Date: 2016/08/28 11:30 Subject: Re: Cache'ing performance Hi, How does

Re: Cache'ing performance

2016-08-27 Thread linguin . m . s
uaki Ishizaki > > > > From: Maciej Bryński > To: Spark dev list > Date: 2016/08/28 05:40 > Subject: Cache'ing performance > > > > Hi, > I did some benchmark of cache function today. > > RDD > sc.parallelize(0 until

Re: Cache'ing performance

2016-08-27 Thread Kazuaki Ishizaki
these pull requests. Best Regards, Kazuaki Ishizaki From: Maciej Bryński To: Spark dev list Date: 2016/08/28 05:40 Subject: Cache'ing performance Hi, I did some benchmark of cache function today. RDD sc.parallelize(0 until Int.MaxValue).cache().count() Datas

Cache'ing performance

2016-08-27 Thread Maciej Bryński
Hi, I did some benchmark of cache function today. *RDD* sc.parallelize(0 until Int.MaxValue).cache().count() *Datasets* spark.range(Int.MaxValue).cache().count() For me, Datasets were 2 times slower. Results (3 nodes, 20 cores and 48 GB RAM each) *RDD - 6 s* *Datasets - 13.5 s* Is that expected be