Github user YanTangZhai closed the pull request at:
https://github.com/apache/spark/pull/3810
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/3794#issuecomment-71405599
@JoshRosen I don't think just calling rdd.partitions on the final RDD could
achieve our goal. Furthermore, rdd.partitions has been called before:
470 // Che
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/3794#issuecomment-71308409
@JoshRosen I've brought this up to date with master. Thanks.
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/3794#issuecomment-70628086
@JoshRosen Thanks. I've updated it per your comments. Please review again.
However, there are merge conflicts. I will resolve them if this
approach
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/3794#issuecomment-70481411
@JoshRosen Thanks for your comments. I've updated it. I directly use
getParentStages, which will call the RDD's getPartitions before sending the
JobSubmitted event
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/4105
[SPARK-5316] [CORE] DAGScheduler may make shuffleToMapStage leak if
getParentStages fails
DAGScheduler may make shuffleToMapStage leak if getParentStages fails.
If getParentStages has
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/3794#issuecomment-69916653
@JoshRosen I've updated it. Please review again. Thanks.
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/3810#issuecomment-69716974
@srowen I've updated this PR and resolved conflict. Please review again.
Thanks.
Let me explain three points:
1. I am not sure the description makes a case
Github user YanTangZhai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3810#discussion_r22776416
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -333,9 +333,15 @@ class SparkContext(config: SparkConf) extends Logging
with
Github user YanTangZhai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3810#discussion_r22776371
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -333,9 +333,15 @@ class SparkContext(config: SparkConf) extends Logging
with
Github user YanTangZhai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3810#discussion_r22776305
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -55,13 +57,9 @@ private[spark] class Client
Github user YanTangZhai closed the pull request at:
https://github.com/apache/spark/pull/3963
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/3963#issuecomment-69523350
@pwendell Ok. Thank you very much. I close this PR.
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/3963
[SPARK-5163] [CORE] Load properties from configuration file for example
spark-defaults.conf when creating SparkConf object
I create and run a Spark program which does not use SparkSubmit
Github user YanTangZhai closed the pull request at:
https://github.com/apache/spark/pull/3845
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/3845#issuecomment-69282504
@andrewor14 @rxin Oh, I see. Thank you very much.
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/3794#issuecomment-68438167
@JoshRosen Thanks for your comments. I've updated it according to your
comments and contrived a simple example as follows:
```scala
val input
Github user YanTangZhai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3794#discussion_r22376680
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -178,7 +178,7 @@ abstract class RDD[T: ClassTag](
// Our dependencies and
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/3555#issuecomment-68425639
@marmbrus I've updated it. Please review again.
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/3845
[SPARK-5007] [CORE] Try random port when startServiceOnPort to reduce the
chance of port collision
When multiple Spark programs are submitted on the same node (called a
springboard machine), the
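The retry idea can be sketched in plain Scala (a hypothetical helper, not Spark's actual startServiceOnPort): on each retry, bind to a random port in a range instead of probing ports sequentially, so drivers launched at the same time are unlikely to race over the same sequence of ports.

```scala
import java.net.ServerSocket
import scala.util.Random

// Hypothetical sketch: retry with a random port in [minPort, maxPort]
// instead of probing minPort, minPort+1, ... sequentially.
def startOnRandomPort(minPort: Int, maxPort: Int, maxRetries: Int): ServerSocket = {
  var attempt = 0
  while (true) {
    val candidate = minPort + Random.nextInt(maxPort - minPort + 1)
    try {
      return new ServerSocket(candidate) // succeeds if the port is free
    } catch {
      case _: java.io.IOException =>     // port taken: pick another at random
        attempt += 1
        if (attempt > maxRetries)
          throw new RuntimeException(s"No free port found after $maxRetries retries")
    }
  }
  throw new IllegalStateException("unreachable")
}
```

With sequential probing, two drivers that both start at the default port walk the same port sequence and keep colliding; random selection makes a repeat collision unlikely.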
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/3810
[SPARK-4962] [CORE] Put TaskScheduler.start back in SparkContext to shorten
cluster resources occupation period
When a SparkContext object is instantiated, the TaskScheduler is started and some
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/3794
[SPARK-4961] [CORE] Put HadoopRDD.getPartitions forward to reduce
DAGScheduler.JobSubmitted processing time
HadoopRDD.getPartitions is lazily evaluated in DAGScheduler.JobSubmitted.
If
Github user YanTangZhai closed the pull request at:
https://github.com/apache/spark/pull/3786
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/3786#issuecomment-68082514
@markhamstra Thanks for your comment. I will analyse deeply why the stage
is attempted so many times.
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/3786
[SPARK-4723] [CORE] To abort the stages which have attempted some times
For some reason, some stages may be attempted many times. A threshold could be
added, and the stages which have been attempted more
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/3785
[SPARK-4946] [CORE] Using AkkaUtils.askWithReply in
MapOutputTracker.askTracker to reduce the chance of the communicating problem
Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/2409#issuecomment-68021964
@JoshRosen Thanks. I will divide this JIRA/PR into two JIRAs/PRs.
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/3555#issuecomment-67816709
@liancheng I will revert the last space change. Thanks for your comment.
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/3555#issuecomment-67473028
@marmbrus Please review again. Thanks.
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/3556#issuecomment-67472596
@marmbrus Please review again. Thanks.
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/3137#issuecomment-67452153
@marmbrus Thanks. I'm also trying another approach to optimize this
operation. I want to discuss it with you later.
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/3556#issuecomment-67437985
@marmbrus Thank you for your comments. I will do it right away.
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/3556
[SPARK-4693] [SQL] PruningPredicates may be wrong if predicates contains an
empty AttributeSet() references
The sql "select * from spark_test::for_test where abs(20141202) is not
null
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/3555
[SPARK-4692] [SQL] Support ! boolean logic operator like NOT
Support ! boolean logic operator like NOT in sql as follows
select * from for_test where !(col1 > col2)
Github user YanTangZhai closed the pull request at:
https://github.com/apache/spark/pull/3539
Github user YanTangZhai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3539#discussion_r21140476
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -238,10 +238,13 @@ class HadoopRDD[K, V](
val value: V
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/3539
[SPARK-4677] [WEB] Add hadoop input time in task webui
Add Hadoop input time in the task web UI, like GC Time, to explicitly show the
time used by a task to read input data.
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/3538
[SPARK-4676] [SQL] JavaSchemaRDD.schema may throw NullType MatchError if
sql has null
val jsc = new org.apache.spark.api.java.JavaSparkContext(sc)
val jhc = new
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/3265#issuecomment-63058643
@srowen Thanks. I close this.
Github user YanTangZhai closed the pull request at:
https://github.com/apache/spark/pull/3265
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/3265
[SPARK-4401] [SQL] RuleExecutor should log trace correct iteration num
RuleExecutor should log trace correct iteration num
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/3137
[WIP] [SPARK-4273] [SQL] Providing ExternalSet to avoid OOM when
count(distinct)
Some tasks may OOM during count(distinct) if they need to process many records.
CombineSetsAndCountFunction puts
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/2857#issuecomment-59915528
@marmbrus Thanks. Please disregard it.
Github user YanTangZhai closed the pull request at:
https://github.com/apache/spark/pull/2857
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/2857
[SPARK-4009][SQL]HiveTableScan should use makeRDDForTable instead of
makeRDDForPartitionedTable for partitioned table when partitionPruningPred is
None
HiveTableScan should use makeRDDForTable
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/2409
[SPARK-3545] Put HadoopRDD.getPartitions forward and put
TaskScheduler.start back to reduce DAGScheduler.JobSubmitted processing time
and shorten cluster resources occupation period
We have
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/1617#issuecomment-55376375
@andrewor14 Thanks. Please review again.
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/1921#issuecomment-55371512
@andrewor14 If a running stage hits a fetch failure, it is moved from
runningStages to failedStages, but it is still shown as alive in the web UI.
Then I try to kill this
Github user YanTangZhai closed the pull request at:
https://github.com/apache/spark/pull/1921
Github user YanTangZhai closed the pull request at:
https://github.com/apache/spark/pull/1618
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/1618#issuecomment-55254537
@andrewor14 Yeah, I see. I will close the PR. If needed, it could be
reopened. Thank you very much.
Github user YanTangZhai closed the pull request at:
https://github.com/apache/spark/pull/1854
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/1854#issuecomment-55254070
@jkbradley I will close this PR. Thank you very much.
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/2059#issuecomment-52884506
Hi @JoshRosen SparkContext1 creates a broadcastManager and initializes the
HttpBroadcast object. HttpBroadcast creates an httpserver, a broadcastDir, and
so on. However
Github user YanTangZhai closed the pull request at:
https://github.com/apache/spark/pull/2058
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/2058#issuecomment-52783462
#2059
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/2059
[SPARK-3148] Update global variables of HttpBroadcast so that multiple
SparkContexts can coexist
Update global variables of HttpBroadcast so that multiple SparkContexts can
coexist
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/2058
Update global variables of HttpBroadcast so that multiple SparkContexts can
coexist
Update global variables of HttpBroadcast so that multiple SparkContexts can
coexist
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/1966
[SPARK-3067] JobProgressPage could not show Fair Scheduler Pools section
sometimes
JobProgressPage sometimes could not show the Fair Scheduler Pools section.
SparkContext starts the web UI and then
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/1921
[SPARK-3003] FailedStage could not be cancelled by DAGScheduler when
cancelJob or cancelStage
A stage may change from running to failed; then DAGScheduler could not
cancel it when cancelJob
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/1854#issuecomment-51604647
Please review again, thanks.
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/1854#issuecomment-51601604
@srowen Stage 10 will be removed from stageIdToData later, since it
will be added into completedStages or failedStages again and will be removed
from
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/1854#issuecomment-51597142
@srowen I see, thanks. I will modify.
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/1854#issuecomment-51595260
@srowen The completedStages may contain stages as follows: ...10, 10, 10,
10, 10, 11, 18... and the activeStages may contain 1, 10, 5 with a unique 10,
and the
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/1854
[SPARK-2643] Stages web UI has ERROR when pool name is None
14/07/23 16:01:44 WARN servlet.ServletHandler: /stages/
java.util.NoSuchElementException: None.get
at scala.None$.get
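The None.get error above is the standard Scala pitfall of calling .get on an empty Option; a minimal sketch in plain Scala (independent of the Spark web UI code) of the failure mode and a defensive alternative:

```scala
// A stage that belongs to no fair-scheduler pool has pool name None.
val poolName: Option[String] = None

// poolName.get  // throws java.util.NoSuchElementException: None.get,
//               // the exception in the /stages/ stack trace above

// Rendering a placeholder instead keeps the page from crashing:
val rendered = poolName.getOrElse("(no pool)")
println(rendered)
```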
Github user YanTangZhai closed the pull request at:
https://github.com/apache/spark/pull/1244
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/1392#issuecomment-51190110
@pwendell Sorry, I'm late. Please disregard this PR since #1734 has been
closed.
Github user YanTangZhai closed the pull request at:
https://github.com/apache/spark/pull/1392
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/1617#issuecomment-50564465
Hi @markhamstra When DAGScheduler concurrently runs multiple jobs,
SparkContext only logs "Job finished" and logs in the same file which doesn't
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/1618#issuecomment-50425621
Hi @andrewor14 The default values of the two max limits are zero, which
does not change the original operating mode and does not fail an application
that is running
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/1618
[SPARK-2715] ExternalAppendOnlyMap adds max limit of times and max limit of
disk bytes written for spilling
ExternalAppendOnlyMap adds max limit of times and max limit of disk bytes
written
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/1617
[SPARK-2714] DAGScheduler logs jobid when runJob finishes
DAGScheduler logs jobid when runJob finishes
Github user YanTangZhai closed the pull request at:
https://github.com/apache/spark/pull/1548
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/1548#issuecomment-50292640
@markhamstra Ok. Thank you very much.
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/1548#issuecomment-5727
Hi @markhamstra , you are right. I will think of other ways to solve this
problem. Thanks.
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/1548
[SPARK-2647] DAGScheduler plugs other JobSubmitted events when processing
one JobSubmitted event
If a few jobs are submitted, DAGScheduler plugs other JobSubmitted
events when processing
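The blocking behaviour described above can be illustrated with a minimal single-threaded event loop in plain Scala (a sketch of the general pattern, not Spark's actual DAGScheduler code): events are drained one at a time, so a handler that blocks holds up every event queued behind it.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Sketch: a single worker thread drains events one at a time, so a slow
// "JobSubmitted" handler delays every event queued behind it.
object EventLoopSketch {
  private val queue = new LinkedBlockingQueue[() => Unit]()

  def main(args: Array[String]): Unit = {
    val worker = new Thread(() => while (true) queue.take().apply())
    worker.setDaemon(true)
    worker.start()

    queue.put(() => Thread.sleep(1000))                 // a slow event
    queue.put(() => println("handled only after the first finishes"))
    Thread.sleep(1500)                                  // let both events run
  }
}
```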
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/1392#issuecomment-49584362
Hi @andrewor14 , that's ok. Thanks.
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/1281#issuecomment-48840401
Hi @ash211, I think this change is needed, since the method
Utils.getLocalDir is used by some functions, such as HttpBroadcast, which is
different from
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/1244#issuecomment-48839912
I've fixed the compile problem. Please review and test again. Thanks very
much.
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/1392#issuecomment-48839668
fix #1244
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/1392#issuecomment-48839557
#1244
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/1392
[SPARK-2290] Worker should directly use its own sparkHome instead of
appDesc.sparkHome when LaunchExecutor
Worker should directly use its own sparkHome instead of appDesc.sparkHome
when
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/1281
[SPARK-2325] Utils.getLocalDir had better check the directory and choose a
good one instead of choosing the first one directly
If the first directory of spark.local.dir is bad, the application will
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/1274#issuecomment-47737851
Thanks @aarondav. I've modified some code. Please help review again.
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/1274
[SPARK-2324] SparkContext should not exit directly when spark.local.dir is
a list of multiple paths and one of them has error
The spark.local.dir is configured as a list of multiple paths as
Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/1244#issuecomment-47612188
The sparkHome field is taken out of ApplicationDescription entirely. Please
review again. Thanks.
GitHub user YanTangZhai opened a pull request:
https://github.com/apache/spark/pull/1244
[SPARK-2290] Worker should directly use its own sparkHome instead of
appDesc.sparkHome when LaunchExecutor
Worker should directly use its own sparkHome instead of appDesc.sparkHome
when