Hi all,
When selecting a large amount of data in Spark SQL (a SELECT * query), I see a
buffer overflow exception from Kryo:
15/03/27 10:32:19 WARN scheduler.TaskSetManager: Lost task 6.0 in stage 3.0
(TID 30, machine159): com.esotericsoftware.kryo.KryoException: Buffer
overflow. Available: 1, required: 2
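A common remedy for this kind of Kryo exception is to raise the maximum serialization buffer. The following is only a sketch, assuming Spark is on the classpath and a Spark 1.3-era property name (from Spark 1.4 on, the property is `spark.kryoserializer.buffer.max` and takes a size suffix such as `512m`). The application name and the value 512 are illustrative, not taken from the original message.

```scala
import org.apache.spark.SparkConf

// Assumption: Spark 1.3-era property name; the value is in megabytes.
// Spark 1.4+ uses "spark.kryoserializer.buffer.max" with a suffix, e.g. "512m".
val conf = new SparkConf()
  .setAppName("kryo-buffer-example") // hypothetical application name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryoserializer.buffer.max.mb", "512") // illustrative value
```

Whether 512 MB is enough depends on the size of the largest serialized record or result, so the value may need tuning.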
Hi, I have some custom caching logic in my application. I need to
identify a DataFrame somehow, to check whether I have seen it previously. Here's
the problem:
scala> val data = sc.parallelize(1 to 1000)
data: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:21
scala>
The only reason I can think of right now is that you might want to change
the config parameter to change the behavior of the optimizer and regenerate
the plan. However, maybe that's not a strong enough reason to regenerate
the RDD every time.
On Mon, Mar 30, 2015 at 5:38 AM, Cheng Lian
On Wed, Mar 25, 2015 at 7:59 AM, Debasish Das debasish.da...@gmail.com wrote:
Hi Xiangrui,
I am facing some minor issues in implementing Alternating Nonlinear
Minimization as documented in this JIRA due to the ALS code being in ml
package: https://issues.apache.org/jira/browse/SPARK-6323
I
Hi Alex,
Since it is non-trivial to make nvblas work with netlib-java, it would
be great if you can send the instructions to netlib-java as part of
the README. Hopefully we don't need to modify netlib-java code to use
nvblas.
Best,
Xiangrui
On Thu, Mar 26, 2015 at 9:54 AM, Sean Owen
Hi all, when looking into a fix for a deadlock in the SparkContext shutdown
code for https://issues.apache.org/jira/browse/SPARK-6492, I noticed that the
“isStopped” flag is set to true before executing the actual shutdown code. This
is a problem since it means that if the shutdown sequence
Hi,
It seems to me that there is an overhead in the runMiniBatchSGD function of
MLlib's GradientDescent. In particular, sample and treeAggregate might
take time that is an order of magnitude greater than the actual gradient
computation. For example, for the MNIST dataset of 60K instances, minibatch
Hi Sam,
What is the best way to do it? Should I clone netlib-java, edit readme.md and
make a PR?
Best regards, Alexander
-Original Message-
From: Xiangrui Meng [mailto:men...@gmail.com]
Sent: Monday, March 30, 2015 2:43 PM
To: Sean Owen
Cc: Evan R. Sparks; Sam Halliday;
For ALM I have started experimenting with the following:
1. RMSE and MAP improvement from log-likelihood loss over least-squares loss.
2. Factorization for datasets that are not ratings (basically an improvement
over implicit ratings)
3. Sparse topic generation using PLSA. We are directly optimizing
I want to get the RDDs removed from a window, as follows. The old RDDs will be
removed from the current window:
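//  _____________________________
// |  previous window   _________|___________________
// |___________________|       current window        | --------------> Time
//                     |_____________________________|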
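The idea in the diagram can be sketched in plain Scala without Spark. This assumes each window is modeled as a half-open range of batch indices; the `Window` case class, the helper functions, and the indices are hypothetical and are not part of the Spark Streaming API.

```scala
// Hypothetical model: a window is a half-open range [start, end) of batch indices.
final case class Window(start: Int, end: Int) {
  def batches: Set[Int] = (start until end).toSet
}

// Batches that slid out of the window (the "old RDDs" in the question).
def removedBatches(previous: Window, current: Window): Set[Int] =
  previous.batches -- current.batches

// Batches that slid into the window since the previous one.
def addedBatches(previous: Window, current: Window): Set[Int] =
  current.batches -- previous.batches

val previous = Window(0, 5) // batches 0 to 4
val current  = Window(2, 7) // batches 2 to 6

val removed = removedBatches(previous, current)
val added   = addedBatches(previous, current)
```

With real DStreams, the same set difference would be applied to the RDDs of the batches covered by each window rather than to plain integers.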