Re: featureSubsetStrategy parameter for GradientBoostedTreesModel

2017-06-15 Thread Pralabh Kumar
Hi everyone, Currently GBT doesn't expose featureSubsetStrategy the way Random Forest does. GradientBoostedTrees in Spark hardcodes the feature subset strategy to "all" when calling random forest in DecisionTreeRegressor.scala: val trees = RandomForest.run(data, oldStrategy, numTrees = 1,

featureSubsetStrategy parameter for GradientBoostedTreesModel

2017-06-15 Thread Pralabh Kumar
Hi everyone, Currently GBT doesn't expose featureSubsetStrategy the way Random Forest does. GradientBoostedTrees in Spark hardcodes the feature subset strategy to "all" when calling random forest in DecisionTreeRegressor.scala: val trees = RandomForest.run(data, oldStrategy, numTrees = 1,
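To make the strategy values concrete, here is a minimal sketch of how a feature subset strategy could translate into the number of candidate features considered at each tree node. "all" is the value quoted in the message; "sqrt", "log2" and "onethird" are other named values that Random Forest accepts. The helper name `features_per_node` and the round-up rules are assumptions of this sketch, not quoted from the Spark source.

```python
import math


def features_per_node(strategy: str, num_features: int) -> int:
    """Return how many features a node would sample under a strategy.

    Hypothetical helper for illustration only; rounding up is an
    assumption of this sketch.
    """
    if num_features < 1:
        raise ValueError("num_features must be positive")
    if strategy == "all":
        # No subsampling: every feature is a split candidate.
        return num_features
    if strategy == "sqrt":
        return math.ceil(math.sqrt(num_features))
    if strategy == "log2":
        # Keep at least one feature when num_features is 1.
        return max(1, math.ceil(math.log2(num_features)))
    if strategy == "onethird":
        return math.ceil(num_features / 3)
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    for name in ("all", "sqrt", "log2", "onethird"):
        print(name, features_per_node(name, 100))
```

With the strategy fixed to "all", every boosting iteration sees the full feature set; exposing the parameter would let each tree be built from a subset instead.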

Re: the dependence length of RDD, can its size be greater than 1 pleaae?

2017-06-15 Thread 萝卜丝炒饭
Hi Owen, More questions about this topic. Is two the upper limit on dependencies, please? In the code, the firstParent RDD is used to get partitions. Why is the firstParent RDD used, please? If the first parent RDD has 5 partitions and the second has 6, is it reasonable to use the first, please? Thanks

Re: Nested "struct" fonction call creates a compilation error in Spark SQL

2017-06-15 Thread Michael Armbrust
You might also try with a newer version. Several instances of code generation failures have been fixed since 2.0. On Thu, Jun 15, 2017 at 1:15 PM, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > Hi Michael, > Spark 2.0.2 - but I have a very interesting test case actually > The

Re: Nested "struct" fonction call creates a compilation error in Spark SQL

2017-06-15 Thread Michael Armbrust
Which version of Spark? If it's recent I'd open a JIRA. On Thu, Jun 15, 2017 at 6:04 AM, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > Hi everyone, > when we create recursive calls to "struct" (up to 5 levels) for extending > a complex datastructure we end up with the following

Re: structured streaming documentation does not match behavior

2017-06-15 Thread Shixiong(Ryan) Zhu
Good catch. These are file source options. Could you submit a PR to fix the doc? Thanks! On Thu, Jun 15, 2017 at 10:46 AM, Mendelson, Assaf wrote: > Hi, > > I have started to play around with structured streaming and it seems the > documentation (structured streaming

structured streaming documentation does not match behavior

2017-06-15 Thread Mendelson, Assaf
Hi, I have started to play around with structured streaming and it seems the documentation (structured streaming programming guide) does not match the actual behavior I am seeing. It says in the documentation that maxFilesPerTrigger (as well as latestFirst) are options for the File sink.

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-15 Thread Felix Cheung
Sounds good. Think we checked and should be good to go. Appreciated. From: Michael Armbrust Sent: Wednesday, June 14, 2017 4:51:48 PM To: Hyukjin Kwon Cc: Felix Cheung; Nick Pentreath; dev; Sean Owen Subject: Re: [VOTE] Apache Spark 2.2.0

Nested "struct" fonction call creates a compilation error in Spark SQL

2017-06-15 Thread Olivier Girardot
Hi everyone, when we create recursive calls to "struct" (up to 5 levels) for extending a complex data structure we end up with the following compilation error: org.codehaus.janino.JaninoRuntimeException: Code of method "(I[Lscala/collection/Iterator;)V" of class

Re: Performance regression for partitioned parquet data

2017-06-15 Thread Bertrand Bossy
Hi, I created https://issues.apache.org/jira/browse/SPARK-21056 and proposed an implementation here: https://github.com/apache/spark/pull/18269 I'll try to address cloud-fan's comment ASAP. Any input welcome. Regards, Bertrand On Thu, Jun 15, 2017 at 1:27 AM, Mike Wheeler

Re: the dependence length of RDD, can its size be greater than 1 pleaae?

2017-06-15 Thread Sean Owen
Yes. Imagine an RDD that results from a union of other RDDs. On Thu, Jun 15, 2017, 09:11 萝卜丝炒饭 <1427357...@qq.com> wrote: > Hi all, > > The RDD code keeps a member as below: > dependencies_ : seq[Dependency[_]] > > It is a seq, that means it can keep more than one dependency. > > I have an issue

Re: the dependence length of RDD, can its size be greater than 1 pleaae?

2017-06-15 Thread Reynold Xin
A join? On Thu, Jun 15, 2017 at 1:11 AM 萝卜丝炒饭 <1427357...@qq.com> wrote: > Hi all, > > The RDD code keeps a member as below: > dependencies_ : seq[Dependency[_]] > > It is a seq, that means it can keep more than one dependency. > > I have an issue about this. > Is it possible that its size is

the dependence length of RDD, can its size be greater than 1 pleaae?

2017-06-15 Thread 萝卜丝炒饭
Hi all, The RDD code keeps a member as below: dependencies_ : seq[Dependency[_]] It is a seq, which means it can hold more than one dependency. I have a question about this. Is it possible for its size to be greater than one, please? If yes, how can that be produced, please? Could you show me some

Re: [apache/spark] [TEST][SPARKR][CORE] Fix broken SparkSubmitSuite (#18283)

2017-06-15 Thread Sean Owen
Cc Shane? On Thu, Jun 15, 2017, 08:39 Felix Cheung wrote: > I guess that script can be changed to use JAVA_HOME instead of blindly > assuming it's accessible... > are these new machines in Jenkins?

Re: [E] Re: SPARK-19547

2017-06-15 Thread Sree V
Hi Pankaj, >> After the second consumer group comes up Do you mean a second consumer starts with the same consumer group as the first? createDirectStream is overloaded. One of the methods doesn't need you to specify the partitions of a topic. Cheers - Sree On Thursday, June 8, 2017 9:56