Re: latest Spark build error

2015-12-24 Thread salexln
Updating the Maven version to 3.3.9 solved the issue. Thanks everyone!

Re: latest Spark build error

2015-12-24 Thread Kazuaki Ishizaki
This is because building Spark requires Maven 3.3.3 or later. http://spark.apache.org/docs/latest/building-spark.html Regards, Kazuaki Ishizaki From: salexln To: dev@spark.apache.org Date: 2015/12/25 15:52 Subject: latest Spark build error Hi all, I'm
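Since the failure comes down to the Maven minimum, a small shell sketch for checking a version string against the 3.3.3 requirement. This is an illustration, not part of the Spark build: the `current` value is hard-coded here and would normally come from `build/mvn -version`.

```shell
# Hedged sketch: compare a Maven version string against Spark's minimum
# (3.3.3) using sort -V. In practice, capture "current" from the first
# line of: build/mvn -version
required="3.3.3"
current="3.3.9"
lowest=$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n1)
if [ "$lowest" = "$required" ]; then
  echo "Maven $current meets the minimum ($required)"
else
  echo "Maven $current is too old; upgrade, or let build/mvn fetch one"
fi
```

Using `build/mvn` instead of a system `mvn` sidesteps the problem entirely, since it downloads a suitable Maven if the local one is too old.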

How can I get the column data based on specific column name and then stored these data in array or list ?

2015-12-24 Thread zml张明磊
Hi, I am new to Scala and Spark and am trying to find the relevant DataFrame API to solve the problem described in the title. However, I have only found DataFrame.col(colName: String): Column, which returns a Column object, not the content. If only DataFrame supported such an API
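For reference, one way to do what the question asks (collect the contents of a single column into a local array) — a hedged sketch, assuming an existing DataFrame `df` with a string column hypothetically named "name" and the SparkSession implicits in scope:

```scala
// Sketch only: assumes a DataFrame `df` with a string column "name"
// and `import spark.implicits._` (for the String encoder) in scope.
// Selecting the column and collecting pulls its values back to the
// driver as a local Array.
val names: Array[String] =
  df.select("name").as[String].collect()

// Equivalent without the Dataset encoder: collect Rows, then extract
// the single field from each.
val names2: Array[String] =
  df.select("name").collect().map(_.getString(0))
```

Note that `collect()` materializes the whole column on the driver, so this is only appropriate when the column fits in driver memory.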

Re: Shuffle Write Size

2015-12-24 Thread Xingchi Wang
I think the shuffle write size does not depend on your data but on the join operation. Maybe your join action does not need to shuffle much data, because the table data is already on its partitions, so no shuffle write is needed. Is that possible? 2015-12-25 0:53 GMT+08:00 gsvic
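As an illustration of how the join strategy, rather than raw input size, drives shuffle write: if one side is small enough to broadcast, the large side is never shuffled at all. A hedged sketch, where `largeDf` and `smallDf` are hypothetical DataFrames with a common key column "id":

```scala
import org.apache.spark.sql.functions.broadcast

// Hypothetical DataFrames. With the broadcast hint the small table is
// shipped whole to every executor, so neither side is shuffle-written;
// without it, a sort-merge join would shuffle both sides by the key.
val joined = largeDf.join(broadcast(smallDf), "id")
```

This is one reason the shuffle write metric can be far smaller than the table sizes involved in the join.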

latest Spark build error

2015-12-24 Thread salexln
Hi all, I'm getting a build error when trying to build a clean version of the latest Spark. I did the following: 1) git clone https://github.com/apache/spark.git 2) build/mvn -DskipTests clean package But I get the following error: Spark Project Parent POM .. FAILURE [2.338s]

Re: How can I get the column data based on specific column name and then stored these data in array or list ?

2015-12-24 Thread zml张明磊
Thanks, Jeff. It's not about choosing some columns of a Row; it's about choosing all the data in one column and converting it to an Array. Do you understand what I mean? In Chinese: I want to select all the data in that column based on the column name, and then put it into an array. From: Jeff Zhang [mailto:zjf...@gmail.com] Sent: 2015-12-25 15:39 To: zml张明磊 Cc:

Shuffle Write Size

2015-12-24 Thread gsvic
Is there any formula with which I could determine the shuffle write size before execution? For example, in a Sort Merge Join, in the stage in which the first table is loaded, the shuffle write is 429.2 MB. The table is 5.5 GB in HDFS with a block size of 128 MB. Consequently, it is loaded in 45
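A back-of-envelope check of the partition count mentioned above, under the usual assumption of one input partition per HDFS block:

```scala
// 5.5 GB input with 128 MB HDFS blocks: roughly one partition per block.
val fileSizeMB  = 5.5 * 1024            // 5632 MB
val blockSizeMB = 128.0
val partitions  = math.ceil(fileSizeMB / blockSizeMB).toInt
// 5632 / 128 = 44 exactly; a count of 45 would suggest the file is
// slightly over 5.5 GB, leaving a final partial block.
```

Note this estimates the number of input partitions, not the shuffle write size itself, which also depends on serialization and compression of the shuffled records.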

Re: Downloading Hadoop from s3://spark-related-packages/

2015-12-24 Thread Nicholas Chammas
not that likely to get an answer as it’s really a support call, not a bug/task. The first question is about proper documentation of all the stuff we’ve been discussing in this thread, so one would think that’s a valid task. It doesn’t seem right that closer.lua, for example, is undocumented.

Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

2015-12-24 Thread Vinay Shukla
+1 Tested on HDP 2.3, YARN cluster mode, spark-shell. On Wed, Dec 23, 2015 at 6:14 AM, Allen Zhang wrote: > +1 (non-binding) > I have just built a new binary tarball and tested am.nodelabelexpression and > executor.nodelabelexpression manually; the result is as expected.

Re: Downloading Hadoop from s3://spark-related-packages/

2015-12-24 Thread Steve Loughran
On 24 Dec 2015, at 05:59, Nicholas Chammas wrote: FYI: I opened an INFRA ticket with questions about how best to use the Apache mirror network. https://issues.apache.org/jira/browse/INFRA-10999 Nick not that likely to get an

[DAGScheduler] resubmitFailedStages, failedStages.clear() and submitStage

2015-12-24 Thread Jacek Laskowski
Hi, While reviewing DAGScheduler, specifically where the failedStages internal collection of failed stages ready for resubmission is used, I came across a question for which I'm looking for an answer. Any hints would be greatly appreciated. When resubmitFailedStages [1] is executed, and there are any failed

Re: [DAGScheduler] resubmitFailedStages, failedStages.clear() and submitStage

2015-12-24 Thread Ted Yu
getMissingParentStages(stage) would be called for the stage being re-submitted. If there are no missing parents, submitMissingTasks() would be called. If there are missing parents, each parent would go through the same flow. I don't see an issue in this part of the code. Cheers On Thu, Dec 24,
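The flow Ted describes can be sketched as follows. This is a simplification, not the actual DAGScheduler source: the method names mirror the real ones, but the bodies here are illustrative only.

```scala
// Simplified sketch of the stage-submission recursion in DAGScheduler.
// Stage, getMissingParentStages, submitMissingTasks and waitingStages
// stand in for the real members; signatures are approximate.
def submitStage(stage: Stage): Unit = {
  val missing = getMissingParentStages(stage)
  if (missing.isEmpty) {
    // No missing parents: the stage's tasks can run now.
    submitMissingTasks(stage)
  } else {
    // Otherwise each missing parent goes through the same flow first,
    // and this stage waits until they complete.
    missing.foreach(submitStage)
    waitingStages += stage
  }
}
```

So a resubmitted failed stage is never run before its parents: it either submits immediately or recursively drives its missing parents through the same check.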