Re: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Linyuxin
Hi, why not group by first and then join? BTW, I don't think there is any difference between 'distinct' and 'group by'. Source code of 2.1:
def distinct(): Dataset[T] = dropDuplicates()
...
def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan { ... Aggregate(groupCols, aggCols, logicalPlan) }
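A minimal sketch of the suggestion above (group by first, then join). The DataFrame `df` and its columns `k` and `v` are hypothetical, and this assumes Spark 2.x with a local session; it is an illustration, not the poster's actual code:

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: hypothetical DataFrame `df` with a key column `k` and value `v`.
val spark = SparkSession.builder().master("local[1]").appName("groupby-then-join").getOrCreate()
import spark.implicits._

val df = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("k", "v")

// Group by the key first (no agg functions needed beyond picking the key)...
val keys = df.select("k").distinct()

// ...then join back to recover the full rows per key.
val joined = keys.join(df, Seq("k"))

// Whole-frame distinct() is itself dropDuplicates(), i.e. a group-by over
// all columns, per the 2.1 source quoted above.
val dedup = df.dropDuplicates()
```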

Re: [SparkSQL] pre-check syntax before running spark job?

2017-02-21 Thread Linyuxin
Hi Gurdit Singh, thanks. It is very helpful.
From: Gurdit Singh [mailto:gurdit.si...@bitwiseglobal.com]
Sent: February 22, 2017 13:31
To: Linyuxin; Irving Duran; Yong Zhang
Cc: Jacek Laskowski; user
Subject: RE: [SparkSQL] pre-check syntax before running spark job?
Hi, you can use spark sql Antlr

Re: [SparkSQL] pre-check syntax before running spark job?

2017-02-21 Thread Linyuxin
Actually, I want a standalone jar so that I can check the syntax without a Spark execution environment.
From: Irving Duran [mailto:irving.du...@gmail.com]
Sent: February 21, 2017 23:29
To: Yong Zhang
Cc: Jacek Laskowski; Linyuxin; user
Subject: Re: [SparkSQL] pre-check syntax before running spark job?
You can

[SparkSQL] pre-check syntax before running spark job?

2017-02-20 Thread Linyuxin
Hi All, is there any tool/API to check the SQL syntax without actually running a Spark job? Like SiddhiQL on Storm here: SiddhiManagerService.validateExecutionPlan https://github.com/wso2/siddhi/blob/master/modules/siddhi-core/src/main/java/org/wso2/siddhi/core/SiddhiManagerService.java it can
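One possible approach, sketched here as an assumption rather than a blessed API: in Spark 2.x the Catalyst parser can be invoked directly via `spark.sessionState.sqlParser.parsePlan`, which checks syntax without running any job. `sessionState` is an unstable/internal-facing API and may change between versions, and this still needs a (local) SparkSession rather than being a fully standalone jar:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.parser.ParseException

// Sketch, Spark 2.x: parse-only syntax check; no job is executed.
// sessionState/sqlParser are internal-ish APIs; hedge accordingly.
val spark = SparkSession.builder().master("local[1]").appName("syntax-check").getOrCreate()

def checkSyntax(sql: String): Either[String, Unit] =
  try { spark.sessionState.sqlParser.parsePlan(sql); Right(()) }
  catch { case e: ParseException => Left(e.getMessage) }

checkSyntax("SELECT a FROM t WHERE b > 1")  // syntactically valid
checkSyntax("SELCT a FRM t")                // returns the parse error message
```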

Can a UDF accept "Any"/"AnyVal"/"AnyRef" (java Object) as a parameter or as a return type?

2017-01-03 Thread Linyuxin
Hi all, with Spark 1.5.1, I want to implement an Oracle decode function (like decode(col1,1,'xxx','p2','yyy',0)), and the code may look like this: sqlContext.udf.register("any_test", (s: AnyVal) => { if (s == null) null else s }) The error shows: Exception in thread "mai
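One way to sidestep the problem (a sketch, not the thread's answer): Spark UDFs need a concrete, encodable parameter type, so `AnyVal` won't work, but Oracle's decode can usually be expressed with `when`/`otherwise` from `org.apache.spark.sql.functions` (available since Spark 1.4), avoiding the UDF entirely. The column name `col1` mirrors the example above:

```scala
import org.apache.spark.sql.functions.{col, when, lit}

// Sketch: decode(col1, 1, 'xxx', 'p2', 'yyy', 0) rewritten with when/otherwise.
// No UDF is registered, so no Any/AnyVal parameter type is needed.
val decoded = when(col("col1") === 1, lit("xxx"))
  .when(col("col1") === "p2", lit("yyy"))
  .otherwise(lit(0))

// Usage on a hypothetical DataFrame df:
// df.select(decoded.as("decoded"))
```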

Re: Re: submit spark task on yarn asynchronously via java?

2016-12-25 Thread Linyuxin
Thanks.
From: Naveen [mailto:hadoopst...@gmail.com]
Sent: December 25, 2016 0:33
To: Linyuxin
Cc: user
Subject: Re: 答复: submit spark task on yarn asynchronously via java?
Hi, please use the SparkLauncher API class and invoke the threads with async calls using Futures. Using SparkLauncher, you can mention
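The advice above can be sketched roughly as follows. This assumes the `spark-launcher` artifact on the classpath; the jar path, main class, and single-thread executor are hypothetical choices for illustration, and `launch()` (available since Spark 1.4, so usable on the poster's 1.5.1) returns a plain `java.lang.Process` to wait on:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}
import org.apache.spark.launcher.SparkLauncher

// Sketch: asynchronous YARN submission via SparkLauncher wrapped in a Future.
implicit val ec: ExecutionContext =
  ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor())

val submission: Future[Int] = Future {
  val proc = new SparkLauncher()
    .setAppResource("/path/to/app.jar")   // hypothetical jar path
    .setMainClass("com.example.Main")     // hypothetical main class
    .setMaster("yarn-cluster")            // Spark 1.5-era master string
    .launch()                             // returns java.lang.Process
  proc.waitFor()                          // blocks this Future, not the caller
}
// Monitoring beyond the exit code would go through YARN (e.g. its REST API).
```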

Re: submit spark task on yarn asynchronously via java?

2016-12-22 Thread Linyuxin
Hi, could anybody help?
From: Linyuxin
Sent: December 22, 2016 14:18
To: user
Subject: submit spark task on yarn asynchronously via java?
Hi All, Version: Spark 1.5.1, Hadoop 2.7.2. Is there any way to submit and monitor a Spark task on YARN via Java asynchronously?

submit spark task on yarn asynchronously via java?

2016-12-21 Thread Linyuxin
Hi All, Version: Spark 1.5.1, Hadoop 2.7.2. Is there any way to submit and monitor a Spark task on YARN via Java asynchronously?

How to avoid sql injection on SparkSQL?

2016-08-04 Thread Linyuxin
Hi All, I want to know how to avoid SQL injection on SparkSQL. Is there any common pattern for this, e.g. a useful tool or code segment, or should I just build a "wheel" on SparkSQL myself? Thanks.
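A common pattern (sketched here as an assumption, not a thread answer): never splice raw user input into the SQL string. Prefer the DataFrame API (e.g. `df.filter(col("name") === userInput)`), where values never pass through the SQL parser; where string SQL is unavoidable, whitelist identifiers and escape literals first. `safeIdentifier` and `escapeLiteral` below are hypothetical helpers, not Spark APIs:

```scala
// Sketch: validate untrusted input before it ever reaches sqlContext.sql(...).

// Allow only plain identifiers (letters, digits, underscore) for column/table names.
def safeIdentifier(s: String): Boolean =
  s.matches("[A-Za-z_][A-Za-z0-9_]*")

// Escape single quotes in a string literal (defense in depth; the DataFrame
// API remains the safer default since it never concatenates SQL text).
def escapeLiteral(s: String): String =
  s.replace("'", "''")

val userCol = "name"                        // pretend this came from a request
val userVal = "O'Brien; DROP TABLE users"   // malicious-looking input

require(safeIdentifier(userCol), s"rejected identifier: $userCol")
val sql = s"SELECT * FROM t WHERE $userCol = '${escapeLiteral(userVal)}'"
```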

Any reference of performance tuning on SparkSQL?

2016-07-28 Thread Linyuxin
Hi All, is there any reference for performance tuning on SparkSQL? I can only find material about tuning Spark core on http://spark.apache.org/

Where is the SparkSQL Specification?

2016-07-21 Thread Linyuxin
Hi All, newbie here. My Spark version is 1.5.1, and I want to know where I can find the specification of Spark SQL, to find out whether syntax such as 'a like %b_xx' is supported.
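For what it's worth, the pattern in question uses standard SQL wildcards ('%' matches any sequence, '_' exactly one character), and both the SQL form and the `Column.like` method exist in Spark 1.5. A sketch with hypothetical table `t`, column `a`, and an already-created `sqlContext`/`df`:

```scala
// Sketch, Spark 1.5-era API; `sqlContext`, table `t`, and `df` are assumed to exist.
val viaSql = sqlContext.sql("SELECT * FROM t WHERE a LIKE '%b_xx'")
val viaApi = df.filter(df("a").like("%b_xx"))  // same predicate via the DataFrame API
```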