Re: renaming SchemaRDD -> DataFrame

2015-01-27 Thread Dmitriy Lyubimov
It has been pretty evident for some time that's what it is, hasn't it? Yes, that's a better name IMO. On Mon, Jan 26, 2015 at 2:18 PM, Reynold Xin r...@databricks.com wrote: Hi, We are considering renaming SchemaRDD -> DataFrame in 1.3, and wanted to get the community's opinion. The context

Re: Unit test best practice for Spark-derived projects

2014-08-07 Thread Dmitriy Lyubimov
Thanks. Let me check this hypothesis (I have a DHCP connection on a private net, and consequently I'm not sure if there's an inverse). On Thu, Aug 7, 2014 at 10:29 AM, Madhu ma...@madhu.com wrote: How long does it take to get a spark context? I found that if you don't have a network connection

Unit test best practice for Spark-derived projects

2014-08-05 Thread Dmitriy Lyubimov
Hello, I've been switching Mahout from Spark 0.9 to Spark 1.0.x [1] and noticed that tests now run much slower than with 0.9, with the CPU idle most of the time. I had to conclude that most of that time is spent on tearing down/resetting the Spark context, which apparently now takes

log overloaded in SparkContext/ Spark 1.0.x

2014-08-04 Thread Dmitriy Lyubimov
It would seem that code like `import o.a.spark.SparkContext._; import math._; a = log(b)` no longer compiles with Spark 1.0.x, since SparkContext._ also exposes a `log` function. This happens a lot to a guy like me. The obvious workaround is to use something like import
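
The clash described above can be reproduced without Spark. The sketch below is only an illustration under stated assumptions: `FakeContext` is a hypothetical stand-in for any object whose wildcard import also brings a `log` member into scope, and the import that hides `log` is one possible way around the ambiguity, not necessarily the workaround the (truncated) message goes on to propose.

```scala
// Hypothetical stand-in for an object that exposes a non-math `log` member.
object FakeContext {
  def log: String = "a logger, not a math function"
  def other(x: Int): Int = x + 1
}

object LogClashDemo {
  // With `import FakeContext._` and `import scala.math._` side by side,
  // a bare `log(b)` is ambiguous. Hiding the clashing member keeps the
  // rest of the wildcard import while leaving `log` to scala.math.
  import FakeContext.{log => _, _}
  import scala.math._

  // Resolves to scala.math.log because FakeContext.log is hidden.
  def natLog(b: Double): Double = log(b)

  // Other members of FakeContext are still imported.
  def bump(x: Int): Int = other(x)

  def main(args: Array[String]): Unit = {
    println(natLog(E))
    // A fully qualified call avoids the question altogether.
    println(scala.math.log(E))
  }
}
```

Hiding a single member keeps the remaining wildcard imports intact; qualifying the call as `scala.math.log(b)` is the more conservative choice when only a few call sites are affected.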

Re: Contributing to MLlib: Proposal for Clustering Algorithms

2014-07-08 Thread Dmitriy Lyubimov
Hector, could you share the references for hierarchical K-means? Thanks. On Tue, Jul 8, 2014 at 1:01 PM, Hector Yee hector@gmail.com wrote: I would say for big data applications the most useful would be hierarchical k-means with backtracking and the ability to support k nearest centroids.

Re: Contributing to MLlib: Proposal for Clustering Algorithms

2014-07-08 Thread Dmitriy Lyubimov
especially over billions of users. On Tue, Jul 8, 2014 at 1:24 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Hector, could you share the references for hierarchical K-means? Thanks. On Tue, Jul 8, 2014 at 1:01 PM, Hector Yee hector@gmail.com wrote: I would say for big data

Re: Kryo not default?

2014-05-13 Thread Dmitriy Lyubimov
On Mon, May 12, 2014 at 2:47 PM, Anand Avati av...@gluster.org wrote: Hi, Can someone share the reason why Kryo serializer is not the default? Why should it be? On top of it, the only way to serialize a closure into the backend (even now) is Java serialization (which means Java serialization