Re: Pls assist: Spark 2.0 build failure on Ubuntu 16.06

2016-10-02 Thread Marco Mistroni
Hi Sean, thanks. I managed to build Spark 2 (it was actually 2.1, not 2.0; I'm sourcing it from here: git clone git://github.com/apache/spark.git). Now, I managed to build it, but I had to - use Java 1.7 along with MAVEN_OPTS (using Java 1.8 sends the whole process into insufficient memory for JVM

statistical theory behind estimating the number of total tasks in GroupedSumEvaluator.scala

2016-10-02 Thread philipghu
Hi, I've been struggling to understand the statistical theory behind this piece of code (from /core/src/main/scala/org/apache/spark/partial/GroupedSumEvaluator.scala) below, especially with respect to estimating the size of the population (total tasks) and its variance. Also I'm trying to

Re: DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-10-02 Thread Babak Alipour
Thanks Vadim for sharing your experience, but I have tried a multi-JVM setup (2 workers) and various sizes for spark.executor.memory (8g, 16g, 20g, 32g, 64g) and spark.executor.cores (2-4), with the same error all along. As for the files, these are all .snappy.parquet files, resulting from inserting some data

Re: use CrossValidatorModel for prediction

2016-10-02 Thread Pengcheng Luo
> On Oct 2, 2016, at 1:04 AM, Pengcheng wrote:
>
> Dear Spark Users,
>
> I was wondering.
>
> I have a trained crossvalidator model
> model: CrossValidatorModel
>
> I want to predict a score for features: RDD[Features]
>
> Right now I have to convert features to

Partitioned windows in spark streaming

2016-10-02 Thread Adrienne Kole
Hi, does Spark 2.0.0 support partitioned windows in streaming? Cheers, Adrienne

Re: Dataframe, Java: How to convert String to Vector ?

2016-10-02 Thread Yan Facai
Hi, Perter. It's interesting that `DecisionTreeRegressor.transformImpl` also uses a udf to transform the DataFrame, instead of using map: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala#L175 On Wed, Sep 21, 2016 at 10:22 PM,

Re: Spark ML Decision Trees Algorithm

2016-10-02 Thread Yan Facai
Perhaps the best way is to read the code. The decision tree is implemented as a 1-tree random forest, whose entry point is the `run` method: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala#L88 I'm not familiar with the so-called

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-02 Thread Mich Talebzadeh
Thanks, Ben. The thing is, I am using Spark 2 and no stack from CDH! Is this approach to reading/writing to HBase specific to Cloudera?

unsubscribe

2016-10-02 Thread Nikos Viorres