Github user lirui-intel commented on the issue:
https://github.com/apache/spark/pull/12775
Thanks guys for the review!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled
Github user lirui-intel commented on the issue:
https://github.com/apache/spark/pull/12775
I think the failure is due to one more [skipped
test](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70890/testReport/pyspark.sql.tests/HiveContextSQLTests
Github user lirui-intel commented on the issue:
https://github.com/apache/spark/pull/16376
@kayousterhout I see. Thanks for the explanations :)
Github user lirui-intel commented on the issue:
https://github.com/apache/spark/pull/12775
I'm not sure whether my patch makes the tests unstable, and I can't figure out why.
@kayousterhout @mridulm any ideas?
Github user lirui-intel commented on the issue:
https://github.com/apache/spark/pull/12775
Jenkins, retest this please
Github user lirui-intel commented on the issue:
https://github.com/apache/spark/pull/16376
The change looks good to me, although I still want to make sure I
understand it correctly. Before the change, a locality level was considered
invalid if it had delay=0. The patch changes that and makes
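The comment above is truncated, but one plausible reading of the before/after behavior it describes can be sketched in plain Java. This is an illustrative model only, not Spark's actual `TaskSetManager` code; the method names and data shapes here are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "valid locality levels" computed from per-level
// delays. It is NOT Spark's implementation; it only models the behavior
// described in the comment above.
class LocalityLevels {
    enum Level { PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY }

    // Old behavior (as described above): a level with delay=0 is treated as
    // invalid even if it has pending tasks.
    static List<Level> validLevelsOld(Map<Level, Integer> pending,
                                      Map<Level, Long> delayMs) {
        List<Level> valid = new ArrayList<>();
        for (Level l : Level.values()) {
            boolean hasPending = pending.getOrDefault(l, 0) > 0;
            if (l == Level.ANY || (hasPending && delayMs.getOrDefault(l, 0L) > 0)) {
                valid.add(l);
            }
        }
        return valid;
    }

    // One reading of the new behavior: keep a level with pending tasks even
    // when its delay is 0, so tasks at that level are offered without waiting.
    static List<Level> validLevelsNew(Map<Level, Integer> pending,
                                      Map<Level, Long> delayMs) {
        List<Level> valid = new ArrayList<>();
        for (Level l : Level.values()) {
            if (l == Level.ANY || pending.getOrDefault(l, 0) > 0) {
                valid.add(l);
            }
        }
        return valid;
    }
}
```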
Github user lirui-intel commented on the issue:
https://github.com/apache/spark/pull/12775
I don't think the failure is related, and it can't be reproduced locally.
Github user lirui-intel commented on the issue:
https://github.com/apache/spark/pull/12775
The new test passed locally and I can't find any failures in the Jenkins
test report. Not sure what failed exactly.
Github user lirui-intel commented on the issue:
https://github.com/apache/spark/pull/12775
Sure. Updated the patch so it no longer catches Throwable.
Github user lirui-intel commented on the issue:
https://github.com/apache/spark/pull/12775
Hi @kayousterhout and @mridulm, to clarify, I think the error won't
disappear if we don't catch it, because the runnable is wrapped in
Utils.logUncaughtExceptions, so the error will
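The pattern the comment above refers to — wrapping a runnable so that any escaping throwable is logged rather than silently lost — can be sketched in a few lines. This is an illustrative stand-in, not Spark's `Utils.logUncaughtExceptions` itself:

```java
// Illustrative sketch (not Spark's Utils.logUncaughtExceptions): wrap a
// Runnable so any Throwable escaping it is logged before being re-thrown,
// meaning the error still surfaces even when the thread dies.
class LogUncaught {
    static Runnable logUncaughtExceptions(Runnable body) {
        return () -> {
            try {
                body.run();
            } catch (Throwable t) {
                // Spark routes this through its logging framework; stderr
                // stands in for that here.
                System.err.println("Uncaught exception in thread: " + t);
                throw t;  // precise rethrow: only unchecked types can reach here
            }
        };
    }
}
```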
Github user lirui-intel commented on the issue:
https://github.com/apache/spark/pull/12775
Hi @kayousterhout and @markhamstra , could you take another look? Thanks.
Github user lirui-intel commented on the issue:
https://github.com/apache/spark/pull/12775
Thanks for the review. Updated the patch to address the comments.
Github user lirui-intel commented on the issue:
https://github.com/apache/spark/pull/12775
Could anybody help review this PR? Thanks.
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/12775#issuecomment-218654108
Thanks @markhamstra for the explanations. I think currently the thread just
dies and we log the uncaught error. I can add a catch for NoClassDefFoundError
and
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/12775#issuecomment-218488089
Hey @markhamstra, is there anything specific you think we should do in case
of more severe errors?
I think it doesn't hurt to handle the failed task in a fi
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/12775#issuecomment-217886648
Update to add test.
GitHub user lirui-intel opened a pull request:
https://github.com/apache/spark/pull/12775
[SPARK-14958][Core] Failed task not handled when there's error
deserializing failure reason
## What changes were proposed in this pull request?
TaskResultGetter tries to deseri
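The PR description above is cut off, but the gist of SPARK-14958 — deserializing a task's failure reason can itself fail, and the failed task should still be handled — can be illustrated with a self-contained fallback sketch. The class and method names here are hypothetical, not Spark's `TaskResultGetter` API:

```java
import java.io.ByteArrayInputStream;
import java.io.ObjectInputStream;

// Hypothetical sketch of the fix's idea: attempt to deserialize the failure
// reason; if that throws, fall back to a generic reason so the failed task
// is still reported instead of being dropped.
class FailureReasonDecoder {
    static String decodeReason(byte[] serialized) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(serialized))) {
            return (String) in.readObject();
        } catch (Exception e) {
            // Deserialization failed; report a best-effort reason rather than
            // losing the task failure entirely.
            return "unknown failure (could not deserialize reason: " + e + ")";
        }
    }
}
```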
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/2760#issuecomment-58739690
Looks great! I think it's very useful to have these async APIs in Java :-)
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/2176#issuecomment-54007752
Thanks @pwendell , patch updated.
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/2176#issuecomment-53960113
Thanks @rxin, @vanzin for the review. I've added the experimental mark in
the Javadoc. I see that MiMa can automatically exclude DeveloperApi and
Experimental cl
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/2176#issuecomment-53853736
Thanks @rxin . I updated the patch.
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/2176#issuecomment-53847106
Hi @rxin, could you be more specific as to how to do it?
Github user lirui-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/2176#discussion_r16883922
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -149,6 +149,13 @@ class SimpleFutureAction[T] private[spark](jobWaiter
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/2176#issuecomment-53842283
Thanks @rxin . Updated the patch accordingly.
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/2176#issuecomment-53682639
@rxin I've updated the patch.
Yes, I see these APIs are experimental. We can make Hive use them as a
workaround and change it when we have a better sol
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/2176#issuecomment-53680289
I thought these async actions were missing from the Java API, so I added all
of them from AsyncRDDActions. But sure, let me just add foreachAsync.
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/2176#issuecomment-53679584
Hi @rxin, thanks for the review! I can add an interface to SimpleFutureAction
to get the job id if we shouldn't expose JobWaiter to users.
Hive on Spark curr
GitHub user lirui-intel opened a pull request:
https://github.com/apache/spark/pull/2176
SPARK-2636: nowhere to get job identifier while submitting Spark job through
Spark API
This PR adds the async actions to the Java API. Users can call these async
actions to get the FutureAction
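The async-action pattern this PR brings to the Java API — a non-blocking counterpart of each action that returns a future the caller can track — can be sketched without Spark using `CompletableFuture`. The `MiniRdd` class here is a hypothetical stand-in, not Spark's `JavaRDD` or `FutureAction`:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical sketch of the async-action pattern: alongside a blocking
// action, expose an *Async variant returning a future, analogous in spirit
// to Spark's foreachAsync returning a FutureAction.
class MiniRdd<T> {
    private final List<T> data;
    MiniRdd(List<T> data) { this.data = data; }

    // Blocking action: runs to completion before returning.
    void foreach(Consumer<T> f) { data.forEach(f); }

    // Async counterpart: returns immediately with a future the caller can
    // join on, compose, or cancel.
    CompletableFuture<Void> foreachAsync(Consumer<T> f) {
        return CompletableFuture.runAsync(() -> data.forEach(f));
    }
}
```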
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/1645#issuecomment-50858418
Thanks @JoshRosen :)
GitHub user lirui-intel opened a pull request:
https://github.com/apache/spark/pull/1645
SPARK-2740: allow user to specify ascending and numPartitions for sortBy...
It would be more convenient if users could specify ascending and
numPartitions when calling sortByKey.
You can merge
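The convenience being proposed — passing the key function and an ascending flag in one `sortBy` call rather than keying and sorting in separate steps — can be sketched over plain Java collections. This is illustrative only, not Spark's RDD API; Spark's `sortBy` additionally takes a `numPartitions` argument, which has no analogue for a local list:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

// Illustrative sketch of sortBy(keyFn, ascending) over a plain list.
class SortBy {
    static <T, K extends Comparable<K>> List<T> sortBy(
            List<T> data, Function<T, K> key, boolean ascending) {
        List<T> out = new ArrayList<>(data);
        Comparator<T> cmp = Comparator.comparing(key);
        // The ascending flag just flips the comparator.
        out.sort(ascending ? cmp : cmp.reversed());
        return out;
    }
}
```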
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/1454#issuecomment-49958248
Thanks @mateiz
Github user lirui-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/1328#discussion_r15267187
--- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
@@ -340,6 +459,7 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/1313#issuecomment-49264857
If a TaskSet only contains no-pref tasks, there won't be delay because the
only valid level is ANY, so everything gets scheduled right away.
If a Ta
GitHub user lirui-intel opened a pull request:
https://github.com/apache/spark/pull/1454
SPARK-2277: clear host->rack info properly
Hi @mridulm, I just thought of this issue with
[#1212](https://github.com/apache/spark/pull/1212): I added FakeRackUtil to
hold the host -> rack m
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/1212#issuecomment-49248835
Thanks everybody :)
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/1212#issuecomment-49125959
Thanks @mridulm and sorry for your laptop :)
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/1212#issuecomment-48873878
Hi @mridulm, I've added some test cases to capture the scheduling behavior
of RACK_LOCAL tasks.
Let me know if I got anything wrong.
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/1313#issuecomment-48859854
This looks good to me :)
Just a reminder that when TaskSchedulerImpl calls
TaskSetManager.resourceOffer, the maxLocality (changed to preferredLocality in
this PR
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/1328#issuecomment-48421136
Thanks @sryza for the idea. I think it's OK to piggyback the communication
in a heartbeat, but we should also allow the worker to explicitly ask the
master fo
GitHub user lirui-intel opened a pull request:
https://github.com/apache/spark/pull/1328
SPARK-2387: remove stage barrier
This PR is a PoC implementation of
[SPARK-2387](https://issues.apache.org/jira/browse/SPARK-2387).
When a ShuffleMapTask finishes, DAGScheduler will
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/1212#issuecomment-48025283
Thanks @mridulm for the review!
I don't quite get your point about the test case, though; could you please
be more specific on what test case should be
GitHub user lirui-intel opened a pull request:
https://github.com/apache/spark/pull/1212
SPARK-2277: make TaskScheduler track hosts on rack
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/lirui-intel/spark trackHostOnRack
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-47051463
Thanks everybody :-)
Github user lirui-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r14061614
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -181,16 +181,14 @@ private[spark] class TaskSetManager(
var
Github user lirui-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r14059200
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -181,16 +181,14 @@ private[spark] class TaskSetManager(
var
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-46801884
Sorry about the code style, and thanks @mateiz for pointing it out. I've
updated the patch.
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45890408
Thanks @mridulm , I've updated the patch accordingly.
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45696187
I've updated the patch.
Currently, maxLocality starts from PROCESS_LOCAL (maxLocality <-
TaskLocality.values). What if we make it start from the highest valid
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45691507
Thanks for the explanation @mridulm , really appreciate it.
Github user lirui-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13601229
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -388,7 +386,7 @@ private[spark] class TaskSetManager(
val
Github user lirui-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13600111
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -388,7 +386,7 @@ private[spark] class TaskSetManager(
val
Github user lirui-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13599131
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -388,7 +386,7 @@ private[spark] class TaskSetManager(
val
Github user lirui-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13594625
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -388,7 +386,7 @@ private[spark] class TaskSetManager(
val
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45592572
@kayousterhout - I've fixed how we compute valid locality levels and added
some unit tests.
Now computeValidLocalityLevels considers a level as valid only if
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45564434
Thanks @kayousterhout , I'll fix this ASAP.
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45468581
Hi @kayousterhout, thanks for pointing this out. My understanding is that,
when TaskScheduler calls TaskSetManager.resourceOffer, it passes the parameter
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45398131
I've removed the delay for pendingTasksWithNoPrefs
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45397807
@mridulm maybe it's better to allow users to specify how many executors
they need (which I believe is not available in standalone mode)? That way they
can control t
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45397685
@mateiz - I added this waiting time to every TaskSetManager because with
dynamically resizing clusters (as you suggested earlier), we may add new
executors when new
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45299018
Sure @kayousterhout . Can we use
[SPARK-1937](https://issues.apache.org/jira/browse/SPARK-1937) for the
discussion?
Github user lirui-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13474841
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -182,15 +189,16 @@ private[spark] class TaskSetManager(
for
Github user lirui-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13474171
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -182,15 +189,16 @@ private[spark] class TaskSetManager(
for
Github user lirui-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13473499
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -54,8 +54,15 @@ private[spark] class TaskSetManager(
clock
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45293719
Hi @kayousterhout , let's consider a map stage whose tasks all have
NODE_LOCAL preference. So pendingTasksForExecutor is empty and all tasks are
add
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45207806
I've revised the patch as @mateiz suggested: tasks will be added to the
corresponding lists even when the preferred location is unavailable, in which
case it'll als
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45184290
That's true @kayousterhout, but my point is: what if all the tasks
only specify NODE_LOCAL preference (a common case when the RDD is created from
some HDFS
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45176804
Hi @mateiz , I think we should distinguish between tasks that truly have no
preference, and tasks whose preference is unavailable when picking tasks from
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45175085
Yes @mateiz, great idea. One quick question: tasks in
pendingTasksWithNoPrefs are considered PROCESS_LOCAL. Suppose we have no
tasks in
Github user lirui-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13421524
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -738,4 +739,13 @@ private[spark] class TaskSetManager
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-44394787
I've made some modifications; please take a look and see if this makes sense :)
@mridulm @rxin @kayousterhout
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-44356508
If I understand correctly, the application cannot control how many executors
to launch (at least in standalone mode). New executors can be launched for
the application
GitHub user lirui-intel opened a pull request:
https://github.com/apache/spark/pull/892
SPARK-1937: fix issue with task locality
Don't check executor/host availability when creating a TaskSetManager,
because the executors may not have been registered when the TaskSet