[GitHub] spark issue #16324: [SPARK-18910][SQL] Resolve failure to use UDF whose jar file...

2016-12-18 Thread shenh062326
Github user shenh062326 commented on the issue: https://github.com/apache/spark/pull/16324 I'm sorry, @rxin, I don't understand what you mean.

[GitHub] spark issue #16324: [SPARK-18910][SQL] Resolve failure to use UDF whose jar file...

2016-12-17 Thread shenh062326
Github user shenh062326 commented on the issue: https://github.com/apache/spark/pull/16324 Currently, we can create a UDF with a jar in HDFS, but we fail to use it. The Spark driver won't download the jar from HDFS; it only adds the path to the classLoader. If we don't ...

[GitHub] spark issue #16324: [SPARK-18910][SQL] Resolve failure to use UDF whose jar file...

2016-12-17 Thread shenh062326
Github user shenh062326 commented on the issue: https://github.com/apache/spark/pull/16324 Should we download the UDF jar from HDFS?

[GitHub] spark pull request #16324: Resolve failure to use UDF whose jar file is in HDFS.

2016-12-16 Thread shenh062326
GitHub user shenh062326 opened a pull request: https://github.com/apache/spark/pull/16324 Resolve failure to use UDF whose jar file is in HDFS. ## What changes were proposed in this pull request? In SparkContext, call the setURLStreamHandlerFactory method on URL with an instance of ...
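The description is cut off above, but the technique it points to is registering Hadoop's FsUrlStreamHandlerFactory on java.net.URL so that a URLClassLoader can actually open hdfs:// jar URLs instead of only recording the path. A minimal sketch, assuming the truncated "instance of" refers to org.apache.hadoop.fs.FsUrlStreamHandlerFactory; the wrapper object and its names are illustrative, not the PR's actual code:

```scala
import java.net.URL
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory

object HdfsUrlSupport {
  // URL.setURLStreamHandlerFactory may be called at most once per JVM
  // (a second call throws an Error), so guard the registration.
  private var registered = false

  def register(conf: Configuration): Unit = synchronized {
    if (!registered) {
      URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory(conf))
      registered = true
    }
  }
}

// After registration, a jar on HDFS can be resolved by a plain URLClassLoader:
//   HdfsUrlSupport.register(new Configuration())
//   val loader = new java.net.URLClassLoader(Array(new URL("hdfs://namenode:8020/udfs/my-udf.jar")))
```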

[GitHub] spark pull request #14557: [SPARK-16709][CORE] Kill the running task if stag...

2016-08-14 Thread shenh062326
Github user shenh062326 commented on a diff in the pull request: https://github.com/apache/spark/pull/14557#discussion_r74714601 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -798,6 +798,19 @@ private[spark] class TaskSetManager

[GitHub] spark pull request #14574: [SPARK-16985] Change dataFormat from yyyyMMddHHmm...

2016-08-09 Thread shenh062326
GitHub user shenh062326 opened a pull request: https://github.com/apache/spark/pull/14574 [SPARK-16985] Change dataFormat from yyyyMMddHHmm to yyyyMMddHHmmss ## What changes were proposed in this pull request? In our cluster, the SQL output is sometimes overwritten. When I ...
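To make the collision concrete, here is a small illustration (not code from the PR) of why a minute-resolution timestamp lets two runs land on the same output name while adding seconds keeps them apart; the epoch values are made up for the example:

```scala
import java.text.SimpleDateFormat
import java.util.Date

// Two jobs launched 20 seconds apart, within the same minute.
val t1 = new Date(1470700800000L)           // 2016-08-09 00:00:00 UTC
val t2 = new Date(1470700800000L + 20000L)  // 20 seconds later

val coarse = new SimpleDateFormat("yyyyMMddHHmm")
val fine   = new SimpleDateFormat("yyyyMMddHHmmss")

coarse.format(t1) == coarse.format(t2)  // true  -> same name, output overwritten
fine.format(t1)   == fine.format(t2)    // false -> distinct names
```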

[GitHub] spark pull request #14557: [SPARK-16709][CORE] Kill the running task if stag...

2016-08-09 Thread shenh062326
Github user shenh062326 commented on a diff in the pull request: https://github.com/apache/spark/pull/14557#discussion_r74021599 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1564,6 +1564,14 @@ class SparkContext(config: SparkConf) extends Logging with

[GitHub] spark pull request #14557: [SPARK-16709][CORE] Kill the running task if stag...

2016-08-08 Thread shenh062326
GitHub user shenh062326 opened a pull request: https://github.com/apache/spark/pull/14557 [SPARK-16709][CORE] Kill the running task if stage failed ## What changes were proposed in this pull request? Per SPARK-16709, when a stage has failed but its running tasks are still ...

[GitHub] spark pull request: [SPARK-13450][SQL] External spilling when join...

2016-02-25 Thread shenh062326
GitHub user shenh062326 opened a pull request: https://github.com/apache/spark/pull/11386 [SPARK-13450][SQL] External spilling when join a lot of rows with the same key SortMergeJoin uses an ArrayBuffer[InternalRow] to store bufferedMatches; if the join has a lot of rows with the ...
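The snippet above explains the risk: bufferedMatches grows with the number of rows sharing one join key, so a heavily skewed key can exhaust memory. Below is a hedged sketch of the general "spill past a threshold" idea the PR title describes; the SpillableBuffer class and its serialization scheme are illustrative and not Spark's actual implementation:

```scala
import java.io.{File, FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable.ArrayBuffer

// Hypothetical buffer for the rows matching one join key: keeps up to
// `spillThreshold` rows in memory, then appends further rows to a temp file.
class SpillableBuffer[T <: Serializable](spillThreshold: Int) {
  private val inMemory = new ArrayBuffer[T]
  private var spillFile: Option[File] = None
  private var spillOut: Option[ObjectOutputStream] = None
  private var spilledCount = 0

  def append(row: T): Unit = {
    if (inMemory.size < spillThreshold) {
      inMemory += row
    } else {
      val out = spillOut.getOrElse {
        val f = File.createTempFile("buffered-matches", ".spill")
        f.deleteOnExit()
        val o = new ObjectOutputStream(new FileOutputStream(f))
        spillFile = Some(f)
        spillOut = Some(o)
        o
      }
      out.writeObject(row)
      spilledCount += 1
    }
  }

  // One-shot read: in-memory rows first, then the spilled rows from disk.
  def iterator: Iterator[T] = {
    spillOut.foreach(_.close())
    val spilled = spillFile match {
      case Some(f) =>
        val in = new ObjectInputStream(new FileInputStream(f))
        Iterator.fill(spilledCount)(in.readObject().asInstanceOf[T])
      case None => Iterator.empty
    }
    inMemory.iterator ++ spilled
  }
}
```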

[GitHub] spark pull request: [SPARK-10918] [CORE] Prevent task failed for e...

2015-10-04 Thread shenh062326
GitHub user shenh062326 opened a pull request: https://github.com/apache/spark/pull/8975 [SPARK-10918] [CORE] Prevent task failed for executor kill by driver When dynamicAllocation is enabled and an executor hits its idle timeout, it will be killed by the driver; if a task is offered to the ...
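One way to picture the race described above: the driver decides to kill an idle executor while the scheduler is still willing to offer it tasks, so a freshly launched task dies with the executor and is counted as a failure. A minimal sketch of one possible guard, tracking executors that are pending removal; the class and method names are hypothetical, not the PR's actual change:

```scala
import scala.collection.mutable

// Hypothetical guard: remember executors the driver has asked to kill and
// skip them when building task offers.
class ExecutorKillTracker {
  private val pendingToKill = mutable.HashSet[String]()

  def markPendingKill(executorId: String): Unit = pendingToKill += executorId
  def executorRemoved(executorId: String): Unit = pendingToKill -= executorId

  // Only offer resources on executors that are not about to be killed.
  def filterOffers(activeExecutors: Seq[String]): Seq[String] =
    activeExecutors.filterNot(pendingToKill.contains)
}
```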

[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-27 Thread shenh062326
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/5608#issuecomment-96870004 @srowen @mateiz Thanks for your review.

[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-26 Thread shenh062326
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/5608#issuecomment-96454990 I don't know why the build has not started automatically.

[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-26 Thread shenh062326
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/5608#issuecomment-96369283 Thanks, I will fix it.

[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-25 Thread shenh062326
Github user shenh062326 commented on a diff in the pull request: https://github.com/apache/spark/pull/5608#discussion_r29107048 --- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala --- @@ -204,25 +204,36 @@ private[spark] object SizeEstimator extends Logging

[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-24 Thread shenh062326
Github user shenh062326 commented on a diff in the pull request: https://github.com/apache/spark/pull/5608#discussion_r29097662 --- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala --- @@ -204,25 +204,36 @@ private[spark] object SizeEstimator extends Logging

[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-24 Thread shenh062326
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/5608#issuecomment-95908120 The sampling strategy does not always work, but sampling twice is more effective than only discarding the first non-null sample. And sampling 200 times will not cause ...

[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-24 Thread shenh062326
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/5608#issuecomment-95904531 @srowen The last assertResult I added in the test case covers the case where only discarding the first non-null sample is not enough, because half of the array elements ...

[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-23 Thread shenh062326
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/5608#issuecomment-95767900 It always seems to work in my cluster; at least I have not found a case where it doesn't. But if I change to the simpler one, sometimes it doesn't work.

[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-22 Thread shenh062326
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/5608#issuecomment-95388189 @srowen At first, I also wanted to exclude shared objects by discarding the first non-null sample, but that does not always work, since not all the objects link to the ...

[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-21 Thread shenh062326
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/5608#issuecomment-94995027 @mateiz In most cases, the first sample's size includes the shared objects and the second does not. But if the array is large and only has a few non-null ...

[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-21 Thread shenh062326
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/5608#issuecomment-94754171 No, the change has nothing to do with the check for null. If the array size > 200 and the elements share an object, SizeEstimator.visitArray is not correct.

[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-21 Thread shenh062326
GitHub user shenh062326 opened a pull request: https://github.com/apache/spark/pull/5608 [SPARK-6738] [CORE] Improve estimate the size of a large array Currently, SizeEstimator.visitArray is not correct in the following case: array size > 200 and the elements share an object ...
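A rough illustration of the sampling idea debated in the comments above (not Spark's actual SizeEstimator code): for arrays beyond the sampling threshold, extrapolate from two independent samples and keep the smaller estimate, so that the one-time cost of an object shared by many elements is not multiplied across the whole array. It assumes `sizeOf` behaves like SizeEstimator's visitor and returns 0 for objects it has already counted:

```scala
import scala.util.Random

def estimateArraySize[T](array: Array[T], sizeOf: T => Long,
                         sampleThreshold: Int = 200, sampleSize: Int = 100): Long = {
  if (array.length <= sampleThreshold) {
    // Small arrays: visit every element exactly.
    array.map(sizeOf).sum
  } else {
    val rand = new Random(42)
    // Extrapolate the total size from one random sample of `sampleSize` elements.
    def sampleOnce(): Double = {
      val sampled = (1 to sampleSize).map(_ => sizeOf(array(rand.nextInt(array.length)))).sum
      sampled.toDouble / sampleSize * array.length
    }
    // Take the minimum of two samples: under the stateful-sizeOf assumption,
    // a shared object inflates at most one of them.
    math.min(sampleOnce(), sampleOnce()).toLong
  }
}
```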

[GitHub] spark pull request: [SPARK-5529][CORE]Add expireDeadHosts in Heart...

2015-02-26 Thread shenh062326
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/4363#issuecomment-76145260 Sorry for the late reply, I will change it.

[GitHub] spark pull request: [SPARK-5529][CORE]Add expireDeadHosts in Heart...

2015-02-26 Thread shenh062326
Github user shenh062326 commented on a diff in the pull request: https://github.com/apache/spark/pull/4363#discussion_r25413198 --- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala --- @@ -17,33 +17,84 @@ package org.apache.spark -import

[GitHub] spark pull request: [SPARK-5529][CORE]Add expireDeadHosts in Heart...

2015-02-12 Thread shenh062326
Github user shenh062326 commented on a diff in the pull request: https://github.com/apache/spark/pull/4363#discussion_r24574765 --- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala --- @@ -17,33 +17,84 @@ package org.apache.spark -import

[GitHub] spark pull request: [SPARK-5529][CORE]Add expireDeadHosts in Heart...

2015-02-11 Thread shenh062326
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/4363#issuecomment-74023636 Hi @sryza, I think this pull request is OK now, can you merge it into master?

[GitHub] spark pull request: [SPARK-5736][Web UI]Add executor log url to Ex...

2015-02-11 Thread shenh062326
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/4529#issuecomment-73857999 Hi @srowen. We just want to read executor logs from the UI. Is there any easy way to add the executor log URL to the UI?

[GitHub] spark pull request: [SPARK-5736][Web UI]Add executor log url to Ex...

2015-02-11 Thread shenh062326
GitHub user shenh062326 opened a pull request: https://github.com/apache/spark/pull/4529 [SPARK-5736][Web UI]Add executor log url to Executors page on Yarn Currently, there is no executor log URL in the Spark UI (on YARN); we have to read executor logs by logging in to the machine that ...

[GitHub] spark pull request: [SPARK-5529][CORE]Add expireDeadHosts in Heart...

2015-02-09 Thread shenh062326
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/4363#issuecomment-73638561 The failed tests are unrelated to this patch.

[GitHub] spark pull request: [SPARK-5529][CORE]Add expireDeadHosts in Heart...

2015-02-09 Thread shenh062326
Github user shenh062326 commented on a diff in the pull request: https://github.com/apache/spark/pull/4363#discussion_r24381524 --- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala --- @@ -17,33 +17,82 @@ package org.apache.spark -import

[GitHub] spark pull request: [SPARK-5529][CORE]Add expireDeadHosts in Heart...

2015-02-09 Thread shenh062326
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/4363#issuecomment-73625458 Hi @andrewor14, @sryza and @rxin. Thanks, I agree with your views. I will change sc.killExecutor so that it does not throw an assertion error.

[GitHub] spark pull request: [SPARK-5529][CORE]Add expireDeadHosts in Heart...

2015-02-06 Thread shenh062326
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/4363#issuecomment-73220183 scheduler.executorLost(executorId, SlaveLost()) will call BlockManagerMasterActor.removeBlockManager; the call stack is: HeartbeatReceiver.expireDeadHosts ...

[GitHub] spark pull request: [SPARK-5529][CORE]Add expireDeadHosts in Heart...

2015-02-05 Thread shenh062326
Github user shenh062326 commented on a diff in the pull request: https://github.com/apache/spark/pull/4363#discussion_r24216671 --- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala --- @@ -17,33 +17,85 @@ package org.apache.spark -import

[GitHub] spark pull request: [SPARK-5529][CORE]Add expireDeadHosts in Heart...

2015-02-05 Thread shenh062326
Github user shenh062326 commented on a diff in the pull request: https://github.com/apache/spark/pull/4363#discussion_r24215867 --- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala --- @@ -17,33 +17,85 @@ package org.apache.spark -import

[GitHub] spark pull request: [SPARK-5529][CORE]Add expireDeadHosts in Heart...

2015-02-05 Thread shenh062326
Github user shenh062326 commented on a diff in the pull request: https://github.com/apache/spark/pull/4363#discussion_r24215268 --- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala --- @@ -17,33 +17,85 @@ package org.apache.spark -import

[GitHub] spark pull request: [SPARK-5529][CORE]Add expireDeadHosts in Heart...

2015-02-04 Thread shenh062326
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/4363#issuecomment-73006178 The failed test case is unrelated to this patch.

[GitHub] spark pull request: [SPARK-5529][CORE]Add expireDeadHosts in Heart...

2015-02-04 Thread shenh062326
Github user shenh062326 commented on a diff in the pull request: https://github.com/apache/spark/pull/4363#discussion_r24138722 --- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala --- @@ -32,18 +33,56 @@ private[spark] case class Heartbeat( taskMetrics

[GitHub] spark pull request: Add expireDeadHosts in HeartbeatReceiver

2015-02-04 Thread shenh062326
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/4363#issuecomment-72826509 add [SPARK-5529]

[GitHub] spark pull request: Add expireDeadHosts in HeartbeatReceiver

2015-02-04 Thread shenh062326
GitHub user shenh062326 opened a pull request: https://github.com/apache/spark/pull/4363 Add expireDeadHosts in HeartbeatReceiver If a BlockManager has not sent a heartbeat for more than 120s, BlockManagerMasterActor will remove it. But CoarseGrainedSchedulerBackend can only remove ...
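A minimal sketch of the expiry bookkeeping described above: remember when each executor last heartbeated and periodically report the ones that have been silent longer than the timeout, so the scheduler can treat them as lost. The class and method names are assumptions for illustration, not the PR's actual HeartbeatReceiver code:

```scala
import scala.collection.mutable

class HeartbeatTracker(timeoutMs: Long = 120000L) {
  // executorId -> time of last heartbeat, in milliseconds.
  private val lastSeen = mutable.HashMap[String, Long]()

  def heartbeatReceived(executorId: String, now: Long = System.currentTimeMillis()): Unit =
    lastSeen(executorId) = now

  // Returns executors that have been silent longer than the timeout and forgets
  // them, so the caller can mark them as lost in the scheduler.
  def expireDeadHosts(now: Long = System.currentTimeMillis()): Seq[String] = {
    val dead = lastSeen.collect { case (id, t) if now - t > timeoutMs => id }.toSeq
    dead.foreach(lastSeen.remove)
    dead
  }
}
```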

[GitHub] spark pull request: [SPARK-4934][CORE] Print remote address in Con...

2015-01-24 Thread shenh062326
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/4157#issuecomment-71348155 I think you are right; there is no need to change it.

[GitHub] spark pull request: SPARK-5199. Input metrics should show up for I...

2015-01-24 Thread shenh062326
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/4050#issuecomment-71347965 If we use an inputFormat whose splits are not instances of org.apache.hadoop.mapreduce.lib.input.{CombineFileSplit, FileSplit}, then we can't get any input metrics.

[GitHub] spark pull request: [SPARK-5347][CORE] Change FileSplit to InputSp...

2015-01-24 Thread shenh062326
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/4150#issuecomment-71347933 If we use an inputFormat whose splits are not instances of org.apache.hadoop.mapreduce.lib.input.{CombineFileSplit, FileSplit}, then we can't get any input metrics.

[GitHub] spark pull request: [SPARK-4934][CORE] Print remote address in Con...

2015-01-22 Thread shenh062326
Github user shenh062326 commented on a diff in the pull request: https://github.com/apache/spark/pull/4157#discussion_r23370062 --- Diff: core/src/main/scala/org/apache/spark/network/nio/ConnectionManager.scala --- @@ -375,16 +375,22 @@ private[nio] class ConnectionManager

[GitHub] spark pull request: [SPARK-4934][CORE] Print remote address in Con...

2015-01-22 Thread shenh062326
GitHub user shenh062326 opened a pull request: https://github.com/apache/spark/pull/4157 [SPARK-4934][CORE] Print remote address in ConnectionManager The connection key is hard to read: "key already cancelled? sun.nio.ch.SelectionKeyImpl@52b0e278". It's hard to troubleshoot a problem by ...
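For context, the complaint is that logging only the SelectionKey's toString (sun.nio.ch.SelectionKeyImpl@...) gives no clue which peer the connection belonged to. A hedged sketch of the kind of message the PR title suggests, resolving the remote address from the key's channel; the helper is illustrative, not the actual ConnectionManager change:

```scala
import java.nio.channels.{SelectionKey, SocketChannel}

// Include the remote address in the log line instead of only the key's toString.
def describeKey(key: SelectionKey): String = key.channel() match {
  case sc: SocketChannel => s"connection key for remote address ${sc.getRemoteAddress}"
  case other             => s"connection key for channel $other"
}
```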

[GitHub] spark pull request: [SPARK-5347][CORE] Change FileSplit to InputSp...

2015-01-21 Thread shenh062326
GitHub user shenh062326 opened a pull request: https://github.com/apache/spark/pull/4150 [SPARK-5347][CORE] Change FileSplit to InputSplit in update inputMetrics When inputFormatClass is set to CombineFileInputFormat, the input metrics show that the input is empty. This doesn't appear ...
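The underlying point of SPARK-5347: if the metrics code pattern-matches only on concrete split classes, any other InputSplit implementation (such as the splits produced by a CombineFileInputFormat subclass) is silently reported as zero bytes, whereas InputSplit.getLength is available on every split. An illustrative comparison, not the actual HadoopRDD code:

```scala
import org.apache.hadoop.mapreduce.InputSplit
import org.apache.hadoop.mapreduce.lib.input.{CombineFileSplit, FileSplit}

// Matching only concrete split classes misses other InputSplit implementations,
// so their input size never reaches the metrics.
def bytesReadNarrow(split: InputSplit): Option[Long] = split match {
  case fs: FileSplit        => Some(fs.getLength)
  case cs: CombineFileSplit => Some(cs.getLength)
  case _                    => None // metrics end up showing empty input
}

// getLength is defined on every InputSplit, so this works for custom formats too.
def bytesReadGeneral(split: InputSplit): Long = split.getLength
```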

[GitHub] spark pull request: [Spark Core] SPARK-4380 Edit spilling log from...

2014-11-13 Thread shenh062326
Github user shenh062326 commented on a diff in the pull request: https://github.com/apache/spark/pull/3243#discussion_r20337096 --- Diff: core/src/main/scala/org/apache/spark/util/collection/Spillable.scala --- @@ -105,7 +105,7 @@ private[spark] trait Spillable[C

[GitHub] spark pull request: [Spark Core] SPARK-4380 Edit spilling log from...

2014-11-13 Thread shenh062326
GitHub user shenh062326 opened a pull request: https://github.com/apache/spark/pull/3243 [Spark Core] SPARK-4380 Edit spilling log from MB to B https://issues.apache.org/jira/browse/SPARK-4380
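The title says the spilling log should report B rather than MB; one plausible reason (an assumption here, the truncated snippet does not spell it out) is that a sub-megabyte size rounds down to a misleading "0 MB" when shown in megabytes with integer arithmetic. A tiny illustration with made-up values, not the actual Spillable.scala message:

```scala
val spillSize = 512L * 1024  // 512 KB worth of in-memory map, in bytes

println(s"Spilling in-memory map of $spillSize B to disk")                    // "524288 B"
println(s"Spilling in-memory map of ${spillSize / (1024 * 1024)} MB to disk") // "0 MB"
```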