Re: How to convert dataframe to a nested StructType schema

2015-09-15 Thread Terry Hole
Hao, For spark 1.4.1, you can try this: val rowrdd = df.rdd.map(r => Row(Row(r(3)), Row(r(0), r(1), r(2)))) val newDF = sqlContext.createDataFrame(rowrdd, yourNewSchema) Thanks! - Terry On Wed, Sep 16, 2015 at 2:10 AM, Hao Wang wrote: > Hi, > > I created a dataframe with 4 string columns (cit
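For readers landing on this thread, a slightly fuller sketch of the same idea with the schema spelled out. The field names (key, value, a, b, c) are made up for illustration; df and sqlContext are assumed to already exist in the shell as in the original mail:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._

    // Hypothetical nested schema: column 3 goes into one struct, columns 0-2 into another
    val newSchema = StructType(Seq(
      StructField("key", StructType(Seq(
        StructField("id", StringType, nullable = true))), nullable = true),
      StructField("value", StructType(Seq(
        StructField("a", StringType, nullable = true),
        StructField("b", StringType, nullable = true),
        StructField("c", StringType, nullable = true))), nullable = true)))

    // Rearrange each flat Row into the nested shape, then rebuild the DataFrame
    val rowrdd = df.rdd.map(r => Row(Row(r(3)), Row(r(0), r(1), r(2))))
    val newDF = sqlContext.createDataFrame(rowrdd, newSchema)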

Re: Meets "java.lang.IllegalArgumentException" when test spark ml pipe with DecisionTreeClassifier

2015-09-08 Thread Terry Hole
metadata when you > construct the input data. > > On Sun, Sep 6, 2015 at 10:41 AM, Terry Hole wrote: > > Sean > > > > Do you know how to tell the decision tree that the "label" is binary, or how to set > > some attributes on the dataframe to carry the number of classes? >
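For reference, the usual way to attach that label metadata in spark.ml is to pass the label column through StringIndexer, which marks its output column as a nominal attribute carrying the number of classes. A minimal sketch under that assumption; the column names and the training DataFrame are illustrative, not taken from the thread:

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.DecisionTreeClassifier
    import org.apache.spark.ml.feature.StringIndexer

    // StringIndexer writes nominal-attribute metadata (including the number of
    // classes) onto its output column, which DecisionTreeClassifier requires.
    val labelIndexer = new StringIndexer()
      .setInputCol("label")
      .setOutputCol("indexedLabel")

    val dt = new DecisionTreeClassifier()
      .setLabelCol("indexedLabel")
      .setFeaturesCol("features")

    val pipeline = new Pipeline().setStages(Array(labelIndexer, dt))
    val model = pipeline.fit(training)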

Re: Meets "java.lang.IllegalArgumentException" when test spark ml pipe with DecisionTreeClassifier

2015-09-07 Thread Terry Hole
Xiangrui, Do you have any idea how to make this work? Thanks - Terry Terry Hole wrote on Sunday, September 6, 2015 at 17:41: > Sean > > Do you know how to tell the decision tree that the "label" is binary, or how to set > some attributes on the dataframe to carry the number of classes? > > Thanks! >

Re: Meets "java.lang.IllegalArgumentException" when test spark ml pipe with DecisionTreeClassifier

2015-09-06 Thread Terry Hole
attribute > though. I think that's the missing step. A double-valued column need > not be one of these attribute types. > > On Sun, Sep 6, 2015 at 10:14 AM, Terry Hole wrote: > > Hi, Owen, > > > > The dataframe "training" is from an RDD of case class: > R

Re: Meets "java.lang.IllegalArgumentException" when test spark ml pipe with DecisionTreeClassifier

2015-09-06 Thread Terry Hole
You have not specified your label > column -- it's defaulting to "label" and it does not recognize it, or > at least not as a binary or nominal attribute. > > On Sun, Sep 6, 2015 at 5:47 AM, Terry Hole wrote: > > Hi, Experts, > > > > I followed the guide of spark

Meets "java.lang.IllegalArgumentException" when test spark ml pipe with DecisionTreeClassifier

2015-09-05 Thread Terry Hole
Hi, Experts, I followed the guide of the spark ml pipeline to test DecisionTreeClassifier in the spark shell with spark 1.4.1, but I always meet an error like the following. Do you have any idea how to fix this? The error stack: *java.lang.IllegalArgumentException:

Re: Job aborted due to stage failure: java.lang.StringIndexOutOfBoundsException: String index out of range: 18

2015-08-28 Thread Terry Hole
Ricky, You may need to use map instead of flatMap in your case: *val rowRDD=sc.textFile("/user/spark/short_model").map(_.split("\\t")).map(p => Row(...))* Thanks! -Terry On Fri, Aug 28, 2015 at 5:08 PM, our...@cnsuning.com wrote: > hi all, > > when using spark sql, a problem is bothering me.
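The distinction matters because split produces an Array[String] per line: map keeps one array per record, while flatMap flattens the arrays so the next stage sees individual fields instead of whole records and the Row constructor gets the wrong input. A minimal sketch of the map version; the path and the number of fields are illustrative:

    import org.apache.spark.sql.Row

    // One Array[String] per input line, so each Row is built from a complete record.
    val rowRDD = sc.textFile("/user/spark/short_model")
      .map(_.split("\\t"))
      .map(p => Row(p(0), p(1), p(2)))  // pick as many fields as the schema defines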

Re: standalone to connect mysql

2015-07-21 Thread Terry Hole
t.sql(s"insert into Table newStu select * from otherStu") > > that works. > > > > Is there any document addressing that? > > > > > > Best regards, > > Jack > > > > > > *From:* Terry Hole [mailto:hujie.ea...@gmail.com] > *Sen

Re: standalone to connect mysql

2015-07-20 Thread Terry Hole
Maybe you can try: spark-submit --class "sparkwithscala.SqlApp" --jars /home/lib/mysql-connector-java-5.1.34.jar --master spark://hadoop1:7077 /home/myjar.jar Thanks! -Terry > Hi there, > > > > I would like to use spark to access the data in mysql. So firstly I tried > to run the program using
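Once the connector jar is on the classpath via --jars, the table can be read through the JDBC data source in Spark 1.4. A hedged sketch; the host, database, table name and credentials below are placeholders, not values from the thread:

    // Illustrative only: url, driver, table and credentials are placeholders.
    val jdbcDF = sqlContext.read.format("jdbc").options(Map(
      "url"     -> "jdbc:mysql://hadoop1:3306/testdb?user=spark&password=secret",
      "driver"  -> "com.mysql.jdbc.Driver",
      "dbtable" -> "student"
    )).load()

    jdbcDF.show()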

Re: [Spark Shell] Could the spark shell be reset to the original status?

2015-07-16 Thread Terry Hole
Thanks! - Terry Ted Yu wrote on Friday, July 17, 2015 at 12:02 PM: > See this recent thread: > > > http://search-hadoop.com/m/q3RTtFW7iMDkrj61/Spark+shell+oom+&subj=java+lang+OutOfMemoryError+PermGen+space > > > > On Jul 16, 2015, at 8:51 PM, Terry Hole wrote: > > Hi, > > Bac

[Spark Shell] Could the spark shell be reset to the original status?

2015-07-16 Thread Terry Hole
Hi, Background: The spark shell gets an out of memory error after doing a lot of spark work. Is there any method that can reset the spark shell to its startup status? I tried "*:reset*", but it does not seem to work: I can not create a spark context anymore (some compile error as below) after the "*

Re: fileStream with old files

2015-07-15 Thread Terry Hole
e(); > } > accumulator.add((int) v1.count()); > return null; > } > }); > context.start(); > // wait for completion or 20 sec > done.tryAcquire(20, TimeUnit.SECONDS); > context.stop(); > > assertThat(accumulat

Re: fileStream with old files

2015-07-13 Thread Terry Hole
A new configuration named *spark.streaming.minRememberDuration* was added in 1.2.1 to control the file stream input. The default value is *60 seconds*; you can change it to a larger value to include older files (older than 1 minute). You can get the details from this jira: https://issues.a
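A minimal sketch of applying that setting before the StreamingContext is created; the value and paths are illustrative, and the exact duration format accepted may vary between Spark versions, so treat this as an assumption to verify:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Remember files up to one hour old instead of the 60-second default (value is illustrative).
    val conf = new SparkConf()
      .setAppName("file-stream-old-files")
      .set("spark.streaming.minRememberDuration", "3600")

    val ssc = new StreamingContext(conf, Seconds(10))
    val lines = ssc.textFileStream("/data/incoming")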

Re: [Spark Hive SQL] Set the hive connection in hive context is broken in spark 1.4.1-rc1?

2015-07-10 Thread Terry Hole
Michael, Thanks - Terry Michael Armbrust wrote on Saturday, July 11, 2015 at 04:02: > Metastore configuration should be set in hive-site.xml. > > On Thu, Jul 9, 2015 at 8:59 PM, Terry Hole wrote: > >> Hi, >> >> I am trying to set the hive metadata destination to a mysql databas
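For anyone hitting the same issue, a sketch of the kind of hive-site.xml fragment Michael is referring to; the host, database name and credentials are placeholders, and the file needs to be on the driver's classpath (typically in conf/):

    <!-- Illustrative hive-site.xml fragment; connection details are placeholders. -->
    <configuration>
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop1:3306/hive_metastore?createDatabaseIfNotExist=true</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>secret</value>
      </property>
    </configuration>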

[Spark Hive SQL] Set the hive connection in hive context is broken in spark 1.4.1-rc1?

2015-07-09 Thread Terry Hole
Hi, I am trying to set the hive metadata destination to a mysql database in the hive context. It works fine in spark 1.3.1, but it seems broken in spark 1.4.1-rc1, where it always connects to the default metadata (local). Is this a regression, or must we set the connection in hive-site.xml? The code is

Re: Is there a way to shutdown the derby in hive context in spark shell?

2015-07-09 Thread Terry Hole
> On Wed, Jul 8, 2015 at 8:12 PM, Terry Hole wrote: > >> I am using spark 1.4.1rc1 with default hive settings >> >> Thanks >> - Terry >> >> Hi All, >> >> I'd like to use the hive context in spark shell, i need to recreate the >> hi

Re: Is there a way to shutdown the derby in hive context in spark shell?

2015-07-08 Thread Terry Hole
I am using spark 1.4.1rc1 with default hive settings Thanks - Terry Hi All, I'd like to use the hive context in the spark shell. I need to recreate the hive meta database in the same location, so I want to close the derby connection previously created in the spark shell. Is there any way to do this?

Is there a way to shutdown the derby in hive context in spark shell?

2015-07-08 Thread Terry Hole
Hi All, I'd like to use the hive context in the spark shell. I need to recreate the hive meta database in the same location, so I want to close the derby connection previously created in the spark shell. Is there any way to do this? I tried this, but it does not work: DriverManager.getConnection("jdbc:de
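For what it's worth, Derby itself exposes a shutdown convention through its JDBC URL: connecting with ;shutdown=true shuts the embedded engine down and signals success by throwing an SQLException. A minimal sketch of that convention; whether this actually releases the metastore for a new HiveContext in the same shell session is exactly what this thread is asking, so treat it as an experiment rather than a known fix:

    import java.sql.{DriverManager, SQLException}

    // Derby reports a successful engine shutdown by throwing an SQLException
    // (SQLState XJ015), so the exception is expected and swallowed here.
    try {
      DriverManager.getConnection("jdbc:derby:;shutdown=true")
    } catch {
      case e: SQLException => // expected on successful shutdown
    }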

Re: Meets class not found error in spark console with newly hive context

2015-07-02 Thread Terry Hole
Found that this is a bug in spark 1.4.0: SPARK-8368 <https://issues.apache.org/jira/browse/SPARK-8368> Thanks! Terry On Thu, Jul 2, 2015 at 1:20 PM, Terry Hole wrote: > All, > > I am using spark console 1.4.0 to do some tests; when I create a new > HiveContext (Line 18 in th

Meets class not found error in spark console with newly hive context

2015-07-01 Thread Terry Hole
All, I am using spark console 1.4.0 to do some tests. When I create a new HiveContext (Line 18 in the code) in my test function, it always throws an exception like the one below (it works in spark console 1.3.0), but if I remove the HiveContext (Line 18 in the code) from my function, it works fine. Any i

Re: Is it possible to set the akka specify properties (akka.extensions) in spark

2015-05-11 Thread Terry Hole
SparkConf.set("spark.akka.extensions","Whatever"), underneath I think > spark won't ship properties which don't start with spark.* to the executors. > > Thanks > Best Regards > > On Mon, May 11, 2015 at 8:33 AM, Terry Hole wrote: > >> Hi all, >>

Is it possible to set the akka specify properties (akka.extensions) in spark

2015-05-10 Thread Terry Hole
Hi all, I'd like to monitor akka using kamon, which needs akka.extensions to be set to a list like this in typesafe config format: akka { extensions = ["kamon.system.SystemMetrics", "kamon.statsd.StatsD"] } But I can not find a way to do this. I have tried these: 1. SparkConf.set("akka
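A sketch of the two variants discussed in this thread, for contrast. The extension list mirrors the typesafe config above; note the reply earlier in this thread says only spark.*-prefixed keys are shipped to executors, and nothing here confirms that Spark actually forwards either key into its internal akka configuration:

    import org.apache.spark.SparkConf

    val conf = new SparkConf().setAppName("kamon-metrics")  // app name is illustrative

    // Attempt from the original mail: a bare akka.* key, which reportedly is not
    // shipped to executors because it lacks the spark.* prefix.
    conf.set("akka.extensions", "kamon.system.SystemMetrics,kamon.statsd.StatsD")

    // Variant suggested in the reply: prefix the key with spark.*; it is not
    // confirmed in this thread that Spark passes it through to akka.
    conf.set("spark.akka.extensions", "kamon.system.SystemMetrics,kamon.statsd.StatsD")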

Is it possible to set the akka specify properties (akka.extensions) in spark

2015-05-07 Thread Terry Hole
Hi all, I'd like to monitor akka using kamon, which needs akka.extensions to be set to a list like this in typesafe config format: akka { extensions = ["kamon.system.SystemMetrics", "kamon.statsd.StatsD"] } But I can not find a way to do this. I have tried these: 1. SparkConf.set("akka

Re: spark 1.3.0 strange log message

2015-04-23 Thread Terry Hole
Use this in spark conf: spark.ui.showConsoleProgress=false Best Regards, On Fri, Apr 24, 2015 at 11:23 AM, Henry Hung wrote: > Dear All, > > > > When using spark 1.3.0 spark-submit with stdout and stderr directed to a log > file, I saw some strange lines inside that look like this: > > [Stage 0:>
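A short sketch of one way to apply that setting programmatically; the app name is illustrative, and the same key can equally be passed with --conf on the spark-submit command line:

    import org.apache.spark.{SparkConf, SparkContext}

    // Disable the in-place [Stage 0:> ...] console progress bar so redirected logs stay clean.
    val conf = new SparkConf()
      .setAppName("my-job")
      .set("spark.ui.showConsoleProgress", "false")
    val sc = new SparkContext(conf)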

Re: [Spark Streaming] The FileInputDStream newFilesOnly=false does not work in 1.2 since

2015-01-21 Thread Terry Hole
See also SPARK-3276 and SPARK-3553. Can you say more about the > problem? What are the file timestamps, what happens when you run, and what > log messages, if any, are relevant? I do not expect there was any > intended behavior change. > > On Wed, Jan 21, 2015 at 5:17 AM, Terry Hole wrote:

Fwd: [Spark Streaming] The FileInputDStream newFilesOnly=false does not work in 1.2 since

2015-01-20 Thread Terry Hole
Hi, I am trying to move from 1.1 to 1.2 and found that newFilesOnly=false (intended to include old files) does not work anymore. It worked great in 1.1; this should have been introduced by the last change to this class. Did this flag's behavior change, or is it a regression? The issue should be caused by
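For reference, the flag in question is the third argument of StreamingContext.fileStream; a minimal sketch of the usage under discussion, with illustrative path, batch interval and filter:

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(10))

    // newFilesOnly = false asks the DStream to also pick up files that already
    // existed in the directory before the streaming job started.
    val stream = ssc.fileStream[LongWritable, Text, TextInputFormat](
      "/data/incoming",
      (path: Path) => true,  // accept every file
      newFilesOnly = false)

    stream.map(_._2.toString).print()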