GitHub user LantaoJin opened a pull request:
https://github.com/apache/spark/pull/22884
[SPARK-23429][CORE][FOLLOWUP] MetricGetter should rename to
ExecutorMetricType in comments
## What changes were proposed in this pull request?
MetricGetter should be renamed to ExecutorMetricType in comments.
GitHub user LantaoJin opened a pull request:
https://github.com/apache/spark/pull/22874
[WIP][SPARK-25865][CORE] Add GC information to ExecutorMetrics
## What changes were proposed in this pull request?
This PR is opened on top of the PR for #22612 since it imports an
Github user LantaoJin commented on a diff in the pull request:
https://github.com/apache/spark/pull/22612#discussion_r228830146
--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -394,9 +394,15 @@ private[spark] object JsonProtocol
Github user LantaoJin commented on a diff in the pull request:
https://github.com/apache/spark/pull/22678#discussion_r224309582
--- Diff: dev/run-tests-jenkins.py ---
@@ -39,7 +39,8 @@ def print_err(msg):
def post_message_to_github(msg, ghprb_pull_id):
print
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22678
Sorry for mistakenly closing the conversation, @dongjoon-hyun. I will
update the documentation soon.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user LantaoJin commented on a diff in the pull request:
https://github.com/apache/spark/pull/22678#discussion_r223966833
--- Diff: dev/run-tests-jenkins.py ---
@@ -176,7 +177,8 @@ def main():
build_display_name = os.environ["BUILD_DISPLAY_NAME"]
GitHub user LantaoJin opened a pull request:
https://github.com/apache/spark/pull/22678
[SPARK-25685][BUILD] Allow regression testing in enterprise Jenkins
## What changes were proposed in this pull request?
Add some environment variables to allow regression testing in
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22595
@srowen The checkbox is what I added in this PR to show/hide the columns
that have always been hidden. These columns are on-heap memory and off-heap
memory. If we want to display them in
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22595
cc @dongjoon-hyun @srowen
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22595
If this PR could be merged, #22578 could be added as an additional column
as well.
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22595
Gently ping @jerryshao @cloud-fan . Do you have a chance to review?
GitHub user LantaoJin opened a pull request:
https://github.com/apache/spark/pull/22595
[SPARK-25577][Web UI] Add an on-off switch to display the executor
additional columns
## What changes were proposed in this pull request?
[SPARK-17019](https://issues.apache.org/jira
GitHub user LantaoJin opened a pull request:
https://github.com/apache/spark/pull/22578
[SPARK-25564][CORE] Add output bytes metrics for each Executor
## What changes were proposed in this pull request?
LiveExecutor only keeps statistics for the total input bytes, and the total output
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22411
Gently ping @cloud-fan
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22411
@cloud-fan I refactored and removed the function outputPath in
```DataWritingCommand```. Besides the unit test you can see, locally I
added the test below in ```HiveQuerySuite.scala
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22411
Using pattern matching will face a problem:
```InsertIntoHiveDirCommand```, ```CreateHiveTableAsSelectCommand``` and
```InsertIntoHiveTable``` are all in the spark-hive module. SparkPlanInfo could not
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22411
Agreed. Since this field is important to us, could I refactor it
following your advice and file a discussion in another Jira
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22411
Most of the information we want can be analyzed out of the event log,
except some executor-side metrics which are not sent to the driver via heartbeat, e.g. the RPC
count with the NameNode. Another case is #
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22411
If almost all implementations need to be added to the case statement, pattern matching
on each implementation seems weird and makes it easy to miss one when a new
implementation is added in the future
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22411
Isn't it common? I am afraid InsertIntoHadoopFsRelation is not the only one that needs to be
added to the case statement.
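The trade-off discussed in these comments (a central case statement versus a method on the `DataWritingCommand` trait) can be sketched in plain Scala. This is a hypothetical, simplified model — the names below are not the real Spark classes:

```scala
// Simplified, hypothetical model of the DataWritingCommand discussion;
// these are not the real Spark classes.
trait DataWritingCommand {
  // Trait-level accessor: every implementation reports its own output
  // path, so no central pattern match has to enumerate them all.
  def outputPath: Option[String]
}

case class InsertIntoHadoopFsRelation(path: String) extends DataWritingCommand {
  def outputPath: Option[String] = Some(path)
}

case class CreateTableAsSelect(path: String) extends DataWritingCommand {
  def outputPath: Option[String] = Some(path)
}

object PlanInfo {
  // The brittle alternative: a case statement that must list every
  // implementation, and silently misses any newly added one.
  def outputPathByMatch(cmd: DataWritingCommand): Option[String] = cmd match {
    case InsertIntoHadoopFsRelation(p) => Some(p)
    // CreateTableAsSelect was "forgotten" here -- exactly the kind of
    // omission the comment above warns about.
    case _ => None
  }
}
```

With the trait method, `Seq(InsertIntoHadoopFsRelation("/a"), CreateTableAsSelect("/b")).flatMap(_.outputPath)` yields both paths, while `outputPathByMatch` loses the second one.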
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22411
Gently ping @dongjoon-hyun @cloud-fan
Github user LantaoJin commented on a diff in the pull request:
https://github.com/apache/spark/pull/22411#discussion_r217584439
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanInfo.scala ---
@@ -18,6 +18,7 @@
package org.apache.spark.sql.execution
Github user LantaoJin commented on a diff in the pull request:
https://github.com/apache/spark/pull/22411#discussion_r217584359
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
---
@@ -440,7 +440,7 @@ case class DataSource
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22411
Gently ping @cloud-fan @dongjoon-hyun , would you please help to review?
GitHub user LantaoJin opened a pull request:
https://github.com/apache/spark/pull/22411
[SPARK-25421][SQL] Abstract an output path field in trait DataWritingCommand
## What changes were proposed in this pull request?
#22353 import a metadata field in ```SparkPlanInfo``` and
Github user LantaoJin commented on a diff in the pull request:
https://github.com/apache/spark/pull/22353#discussion_r217229063
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanInfo.scala ---
@@ -59,6 +57,12 @@ private[execution] object SparkPlanInfo
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22353
ping @cloud-fan
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22353
Thank you @cloud-fan for the reminder. We've handled the dropped-message
case. Agreed, I will push a commit tomorrow
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22353
The Spark driver log is always distributed across various client nodes and depends
on the log4j configs. In a big company, it's hard to collect them all, and I
think it's better to use
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22353
> Although the event log is in JSON format, it's mostly for internal usage, to
be loaded by the history server and used to build the Spark UI.
AFAIK, there are more and more projects that replay even
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22353
The purpose is to log meta info, like the input file path, to the event log. So
I reverted the changes to simpleString and added the metadata back to the
SparkPlanInfo interface. This change will log
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22353
Thanks @dongjoon-hyun . That would be a problem. It seems setting it to 200 or
500 causes a limited regression in the hover text.
Hard-coding it to 500 shows:
https://user
Github user LantaoJin commented on a diff in the pull request:
https://github.com/apache/spark/pull/22353#discussion_r216122293
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
---
@@ -54,7 +54,7 @@ trait DataSourceScanExec extends
Github user LantaoJin commented on a diff in the pull request:
https://github.com/apache/spark/pull/22353#discussion_r216122273
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
---
@@ -54,7 +54,7 @@ trait DataSourceScanExec extends
Github user LantaoJin commented on a diff in the pull request:
https://github.com/apache/spark/pull/22353#discussion_r216121128
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
---
@@ -54,7 +54,7 @@ trait DataSourceScanExec extends
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22353
A scenario here: after an application has completed, there is no way to know
the intact file path of the File Scan Exec if the path is longer than 100
chars
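The situation described above can be illustrated with a small sketch. `abbreviate` here is a hypothetical stand-in for the metadata truncation being discussed (the 100-char limit matches the comment; the helper name does not exist in Spark):

```scala
// Hypothetical stand-in for the metadata abbreviation discussed above:
// once a long path is cut down for display, the event log can no
// longer recover the original value.
object Abbrev {
  def abbreviate(s: String, maxWidth: Int = 100): String =
    if (s.length <= maxWidth) s
    else s.take(maxWidth - 3) + "..."
}
```

Any scan path longer than 100 characters comes out as a 100-character string ending in `...`, so the full path is unrecoverable from the logged plan.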
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22353
@wangyum @hvanhovell
Github user LantaoJin closed the pull request at:
https://github.com/apache/spark/pull/20876
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22353
@cloud-fan @gatorsmile @dongjoon-hyun , kindly help to review.
GitHub user LantaoJin opened a pull request:
https://github.com/apache/spark/pull/22353
[SPARK-25357][SQL] Abbreviated metadata in DataSourceScanExec results in
incomplete information in event log
## What changes were proposed in this pull request?
Field metadata removed
Github user LantaoJin closed the pull request at:
https://github.com/apache/spark/pull/22077
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22077
Thanks @wangyum for triggering the tests again and again. Now all tests passed,
cc @cloud-fan @gatorsmile @jerryshao
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22077
retest this please.
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22077
Seems the failures are not related to this change.
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22077
ok to test
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22066
Thank you @yucai . New PR #22077 for branch-2.3. Cc: @cloud-fan @jerryshao
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22077
Thanks @yucai , please review this.
GitHub user LantaoJin opened a pull request:
https://github.com/apache/spark/pull/22077
[SPARK-25084][SQL] "distribute by" on multiple columns (wrap in brackets) may lead to codegen issue (branch-2.3)
## What changes were proposed in this pu
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22067
Seems #22066 has changed the implementation with a similar approach. I
will close this one.
Github user LantaoJin closed the pull request at:
https://github.com/apache/spark/pull/22067
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22066
Since you refactored your code by copying from #22067, would you mind just using
that?
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22067
Add unit test with a rand() column in 'distribute by'
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22067
@jerryshao Could you help to trigger test build please?
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22067
ok to test.
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22067
@cloud-fan @jerryshao
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22066
I offered another way to fix it: #22067.
It doesn't need "input" as a global variable (in the case of distribute by random).
GitHub user LantaoJin opened a pull request:
https://github.com/apache/spark/pull/22067
[SPARK-25084][SQL] distribute by on multiple columns may lead to codegen issue
## What changes were proposed in this pull request?
"distribute by" on multiple c
Github user LantaoJin closed the pull request at:
https://github.com/apache/spark/pull/22034
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22034
Thanks @jerryshao. Close it
Github user LantaoJin commented on a diff in the pull request:
https://github.com/apache/spark/pull/22034#discussion_r208784817
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -169,6 +171,19 @@ private[spark] class Executor
GitHub user LantaoJin opened a pull request:
https://github.com/apache/spark/pull/22034
[SPARK-25054][CORE] Enable MetricsServlet sink for Executor
## What changes were proposed in this pull request?
The MetricsServlet sink is added by default as a sink in the master. But
Github user LantaoJin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21734#discussion_r203584220
--- Diff:
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala
---
@@ -193,8 +193,7 @@ object
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/20876
Could anyone else help to review this? Or should it be closed?
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/20574
Hi @abellina, I think all the tabs configured this way couldn't get
through in the community. Even with an interface opened for adding customized tabs, @srowen
thinks it isn't worth
Github user LantaoJin closed the pull request at:
https://github.com/apache/spark/pull/21396
Github user LantaoJin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21396#discussion_r190113497
--- Diff:
core/src/main/scala/org/apache/spark/deploy/security/HiveDelegationTokenProvider.scala
---
@@ -85,7 +85,10 @@ private[spark] class
Github user LantaoJin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21396#discussion_r190112872
--- Diff:
core/src/main/scala/org/apache/spark/deploy/security/HiveDelegationTokenProvider.scala
---
@@ -85,7 +85,10 @@ private[spark] class
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/21396
In our current setup, when we onboard a new cluster, the default is to
connect to the DB directly; it's much simpler than accessing the metastore. And we are
going to update to access the metastore by de
Github user LantaoJin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21396#discussion_r190110376
--- Diff:
core/src/main/scala/org/apache/spark/deploy/security/HiveDelegationTokenProvider.scala
---
@@ -85,7 +85,10 @@ private[spark] class
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/21396
Also, the reason we still need #20784 or #21343 to extend #17335 may be:
1. Some DDL operations in local mode are much faster than launching an AM in
yarn.
2. Nodes in the YARN cluster have
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/21396
@jerryshao
Simply speaking, in a secure environment, if we use JDBC to connect to
MySQL directly instead of accessing the Hive metastore, the current implementation
blocks job execution
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/21396
[#20784](https://github.com/apache/spark/pull/20784) and
[#21343](https://github.com/apache/spark/pull/21343) did the same thing, but
#21343 is much more readable. They both fix the problem
Github user LantaoJin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21396#discussion_r190100259
--- Diff:
core/src/main/scala/org/apache/spark/deploy/security/HiveDelegationTokenProvider.scala
---
@@ -85,7 +85,10 @@ private[spark] class
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/21396
Hi @vanzin @jerryshao , could you help to review this?
GitHub user LantaoJin opened a pull request:
https://github.com/apache/spark/pull/21396
[SPARK-24349][SQL] Ignore setting token if using JDBC
## What changes were proposed in this pull request?
In [SPARK-23639](https://issues.apache.org/jira/browse/SPARK-23639), use
Github user LantaoJin closed the pull request at:
https://github.com/apache/spark/pull/21343
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/21343
Cool, SPARK-23639 also works for me. Closing.
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/21343
@vanzin Seems duplicated. Let me check.
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/21343
@gatorsmile @cloud-fan Could you help to review this?
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/21343
Now the test case succeeds.
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/21343
Sorry, the test case still failed. Will change it soon
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/21343
@jerryshao @vanzin
GitHub user LantaoJin opened a pull request:
https://github.com/apache/spark/pull/21343
[SPARK-24292][SQL] Proxy user cannot connect to HiveMetastore in local mode
## What changes were proposed in this pull request?
[#17335](https://github.com/apache/spark
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/19293
Fixed in [#19795](https://github.com/apache/spark/pull/19795), close this.
Github user LantaoJin closed the pull request at:
https://github.com/apache/spark/pull/19293
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/20876
Hi, @jerryshao @cloud-fan, may I have an update?
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/20873
Closing it, since I found a workaround.
Github user LantaoJin closed the pull request at:
https://github.com/apache/spark/pull/20873
Github user LantaoJin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20873#discussion_r176647523
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -776,6 +776,9 @@ object SparkSubmit extends CommandLineUtils with
Logging
Github user LantaoJin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20873#discussion_r176647022
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -776,6 +776,9 @@ object SparkSubmit extends CommandLineUtils with
Logging
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/20876
In https://github.com/apache/spark/pull/20803, the implementation binds
the SQL text to the DataFrame. That's not good and will introduce many unexpected issues.
I opened this PR with a new implement
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/20803
Hi @jerryshao @cloud-fan @dongjoon-hyun, I would like to close this PR and
open another one, https://github.com/apache/spark/pull/20876. Would you please
move to that
Github user LantaoJin closed the pull request at:
https://github.com/apache/spark/pull/20803
GitHub user LantaoJin opened a pull request:
https://github.com/apache/spark/pull/20876
[SPARK-23653][SQL] Capture sql statements user input and show them in SQL UI
## What changes were proposed in this pull request?
[SPARK-4871](https://issues.apache.org
Github user LantaoJin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20803#discussion_r176320495
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
---
@@ -635,7 +637,8 @@ class SparkSession private(
* @since 2.0.0
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/20803
![screen shot 2018-03-21 at 23 22
07](https://user-images.githubusercontent.com/1853780/37718931-ceb341c6-2d5e-11e8-8f41-4f53a7d83d99.png
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/20803
I have decoupled the sqlText from the SQL execution. In the current implementation,
when a user invokes spark.sql(xx), it creates a new
SparkListenerSQLTextCaptured event on the listener bus. Then in
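The decoupling described in this comment can be sketched with a minimal, self-contained listener bus. Names like `SQLTextCaptured` mirror the comment but are simplified stand-ins, not Spark's real `ListenerBus`/`SparkListenerEvent` classes:

```scala
import scala.collection.mutable.ListBuffer

// Minimal sketch: the SQL text is posted as an event to a bus, so the
// execution path never has to bind the text to the Dataset itself.
sealed trait Event
case class SQLTextCaptured(sqlText: String) extends Event

class ListenerBus {
  private val listeners = ListBuffer.empty[Event => Unit]
  def addListener(l: Event => Unit): Unit = listeners += l
  def post(event: Event): Unit = listeners.foreach(_(event))
}

object Demo {
  def main(args: Array[String]): Unit = {
    val bus = new ListenerBus
    // A UI-side listener records every SQL text it sees, without the
    // caller of sql() knowing anything about the UI.
    bus.addListener {
      case SQLTextCaptured(text) => println(s"captured: $text")
      case _                     => ()
    }
    // sql() just posts the raw text instead of attaching it to the result.
    def sql(text: String): Unit = bus.post(SQLTextCaptured(text))
    sql("SELECT 1") // prints "captured: SELECT 1"
  }
}
```

Because the text travels as an event, anything downstream (a UI tab, the event log) can consume it without the query execution carrying it along.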
Github user LantaoJin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20803#discussion_r176102381
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -166,20 +168,28 @@ private[sql] object Dataset {
class Dataset[T] private
Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/20873
@rxin @jerryshao @vanzin
GitHub user LantaoJin opened a pull request:
https://github.com/apache/spark/pull/20873
[SPARK-22744][CORE] Cannot get the submit hostname of application
## What changes were proposed in this pull request?
In MapReduce, we can get the submit hostname by checking the value of