Re: Building spark 1.2 from source requires more dependencies

2015-03-30 Thread yash datta
Hi all, when selecting large data in Spark SQL (a SELECT * query), I see a buffer overflow exception from Kryo: 15/03/27 10:32:19 WARN scheduler.TaskSetManager: Lost task 6.0 in stage 3.0 (TID 30, machine159): com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 1, required: 2

[sql] How to uniquely identify Dataframe?

2015-03-30 Thread Peter Rudenko
Hi, I have some custom caching logic in my application. I need some way to identify a DataFrame, to check whether I have seen it previously. Here's the problem: scala> val data = sc.parallelize(1 to 1000) data: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:21 scala>

Re: [sql] How to uniquely identify Dataframe?

2015-03-30 Thread Reynold Xin
The only reason I can think of right now is that you might want to change the config parameter to change the behavior of the optimizer and regenerate the plan. However, maybe that's not a strong enough reason to regenerate the RDD every time. On Mon, Mar 30, 2015 at 5:38 AM, Cheng Lian

Re: mllib.recommendation Design

2015-03-30 Thread Xiangrui Meng
On Wed, Mar 25, 2015 at 7:59 AM, Debasish Das debasish.da...@gmail.com wrote: Hi Xiangrui, I am facing some minor issues in implementing Alternating Nonlinear Minimization as documented in this JIRA due to the ALS code being in ml package: https://issues.apache.org/jira/browse/SPARK-6323 I

Re: Using CUDA within Spark / boosting linear algebra

2015-03-30 Thread Xiangrui Meng
Hi Alex, Since it is non-trivial to make nvblas work with netlib-java, it would be great if you can send the instructions to netlib-java as part of the README. Hopefully we don't need to modify netlib-java code to use nvblas. Best, Xiangrui On Thu, Mar 26, 2015 at 9:54 AM, Sean Owen

Problems with cleanup throughout code base

2015-03-30 Thread Ganelin, Ilya
Hi all, when looking into a fix for a deadlock in the SparkContext shutdown code for https://issues.apache.org/jira/browse/SPARK-6492, I noticed that the “isStopped” flag is set to true before executing the actual shutdown code. This is a problem since it means that if the shutdown sequence
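
The preview above is cut off, so the exact failure it goes on to describe is not shown here. As a general illustration of why the order matters, here is a minimal Python sketch, not Spark code: the `Service` class, its `_cleanup` step, and the simulated one-time failure are all hypothetical. It shows one possible hazard, assuming cleanup can fail part-way: if the flag is set first, a second call to `stop()` returns early and the cleanup is never retried.

```python
class Service:
    """Toy stand-in for a component with a stop() method (hypothetical, not Spark code)."""

    def __init__(self, flag_first):
        self.flag_first = flag_first
        self.stopped = False
        self.resources_open = True
        self.fail_once = True  # simulate one transient failure during cleanup

    def _cleanup(self):
        if self.fail_once:
            self.fail_once = False
            raise RuntimeError("cleanup interrupted")
        self.resources_open = False

    def stop(self):
        if self.stopped:
            return  # already marked stopped, so nothing is retried
        if self.flag_first:
            self.stopped = True  # flag set before the work is done
            self._cleanup()
        else:
            self._cleanup()
            self.stopped = True  # flag set only after cleanup succeeds


def stop_twice(service):
    """Call stop(), tolerate one failure, then call stop() again."""
    try:
        service.stop()
    except RuntimeError:
        pass
    service.stop()
    return service.stopped, service.resources_open
```

With `flag_first=True` the service ends up marked stopped while its resources are still open; with `flag_first=False` the second call completes the cleanup.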

Stochastic gradient descent performance

2015-03-30 Thread Ulanov, Alexander
Hi, It seems to me that there is an overhead in the runMiniBatchSGD function of MLlib's GradientDescent. In particular, sample and treeAggregate might take an order of magnitude more time than the actual gradient computation. In particular, for the MNIST dataset of 60K instances, minibatch
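
To make the three steps named above concrete (sampling, gradient computation, aggregation), here is a small single-machine Python sketch of a mini-batch SGD loop. It is not MLlib's implementation: `sample` and `tree_aggregate` are local stand-ins written for this example, and the toy one-parameter least-squares problem is an assumption chosen only to keep the code short.

```python
import random


def sample(data, fraction, rng):
    """Bernoulli sampling: keep each record independently with probability `fraction`."""
    return [record for record in data if rng.random() < fraction]


def tree_aggregate(partials):
    """Combine (gradient_sum, count) pairs pairwise, level by level."""
    level = list(partials)
    while len(level) > 1:
        merged = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            merged.append((sum(g for g, _ in pair), sum(n for _, n in pair)))
        level = merged
    return level[0] if level else (0.0, 0)


def minibatch_sgd(data, step_size, fraction, iterations, seed=0):
    """Fit y = w * x with squared loss using mini-batch gradient descent."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(iterations):
        # Step 1: draw the mini-batch.
        batch = sample(data, fraction, rng)
        # Step 2: per-record gradient of 0.5 * (w * x - y) ** 2 with respect to w.
        partials = [((w * x - y) * x, 1) for x, y in batch]
        # Step 3: aggregate the partial sums.
        grad_sum, count = tree_aggregate(partials)
        if count > 0:  # skip the update if the sample came out empty
            w -= step_size * grad_sum / count
    return w


# Noise-free toy data generated from y = 3 * x.
data = [(i / 100.0, 3.0 * i / 100.0) for i in range(1, 101)]
w = minibatch_sgd(data, step_size=0.5, fraction=0.1, iterations=200)
```

Note that step 1 visits every record on each iteration even though only a fraction is kept, which is one way the sampling cost can come to dominate a cheap gradient.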

RE: Using CUDA within Spark / boosting linear algebra

2015-03-30 Thread Ulanov, Alexander
Hi Sam, What is the best way to do it? Should I clone netlib-java, edit readme.md and make a PR? Best regards, Alexander -Original Message- From: Xiangrui Meng [mailto:men...@gmail.com] Sent: Monday, March 30, 2015 2:43 PM To: Sean Owen Cc: Evan R. Sparks; Sam Halliday;

Re: mllib.recommendation Design

2015-03-30 Thread Debasish Das
For alm I have started experimenting with the following: 1. RMSE and MAP improvement from log-likelihood loss over least-squares loss. 2. Factorization for datasets that are not ratings (basically improvement over implicit ratings) 3. Sparse topic generation using PLSA. We are directly optimizing

How to get removed RDD from windows?

2015-03-30 Thread wyphao.2007
I want to get the RDDs removed from a window, as follows. The old RDDs will be removed from the current window: (diagram: previous window and current window along the Time axis)
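
The idea of "old" batches leaving a sliding window can be sketched with plain integers standing in for batch times. This is only an illustration in Python, not the Spark Streaming API: the function names, and the convention that a window covers the batch times in (end - window_len, end], are assumptions made for this example.

```python
def window_batches(end, window_len, batch_interval):
    """Batch times covered by the window (end - window_len, end]."""
    start = end - window_len
    return list(range(start + batch_interval, end + batch_interval, batch_interval))


def removed_and_added(end, window_len, slide, batch_interval):
    """Batches that left the window, and batches that entered it, after one slide."""
    previous = window_batches(end - slide, window_len, batch_interval)
    current = window_batches(end, window_len, batch_interval)
    removed = [t for t in previous if t not in current]  # the "old" batches
    added = [t for t in current if t not in previous]    # the "new" batches
    return removed, added


# Window of 6 time units, sliding by 4, with one batch every 2 units.
removed, added = removed_and_added(end=10, window_len=6, slide=4, batch_interval=2)
```

Here the previous window covers batches 2, 4 and 6 and the current one covers 6, 8 and 10, so `removed` holds the batches that dropped out of the window.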