Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/21019
@squito @cloud-fan
What do you think of this change?
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/21019
Jenkins, retest this please
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/21019
Jenkins, retest this please
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/21019
Jenkins, retest this please
GitHub user jinxing64 opened a pull request:
https://github.com/apache/spark/pull/21019
[SPARK-23948] Trigger mapstage's job listener in submitMissingTasks
## What changes were proposed in this pull request?
SparkContext submitted a map stage from `submitMapStag
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19868
cc @cloud-fan @jerryshao @jiangxb1987 would you take a look at this?
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/20781
@vanzin Thanks for merging.
Github user jinxing64 closed the pull request at:
https://github.com/apache/spark/pull/20812
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/20812
@jerryshao
Understood: `Ideally different UDFs should be packaged in different jars
with different names/versions`. True, but we are faced with tons of UDF jars
migrating from another engine. I
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/20812
@jerryshao
Thanks for the comment.
Yes, this change is only for `sc.addJar`, and the jars will be named with a
prefix when the executor `updateDependencies
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20812#discussion_r175010008
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -752,11 +752,10 @@ private[spark] class Executor(
if
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/20812
@jiangxb1987
Thanks a lot for the review. I will refine it soon!
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/20812
@vanzin @zsxwing @jerryshao
What do you think about this?
GitHub user jinxing64 opened a pull request:
https://github.com/apache/spark/pull/20812
[SPARK-23669] Executors fetch jars and name the jars with md5 prefix
## What changes were proposed in this pull request?
In our cluster, there are lots of UDF jars, some of which have the
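The PR description above is truncated here, but its title says executors fetch jars and name them with an MD5 prefix, so two different jars that happen to share a file name do not collide on the executor. A minimal sketch of that naming idea (illustrative only — `md5PrefixedName` and the choice of hashing the URL rather than the file contents are assumptions, not the PR's actual code):

```scala
import java.security.MessageDigest

// Sketch only: derive a collision-resistant local file name for a fetched jar
// by prefixing it with the MD5 hex digest of its source URL.
def md5PrefixedName(jarUrl: String): String = {
  val md5Hex = MessageDigest.getInstance("MD5")
    .digest(jarUrl.getBytes("UTF-8"))
    .map("%02x".format(_))
    .mkString
  val fileName = jarUrl.split('/').last
  s"${md5Hex}_$fileName"
}
```

With this scheme, `hdfs://a/udfs.jar` and `hdfs://b/udfs.jar` land in the executor's working directory under different names even though both are called `udfs.jar`.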
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/20781
@vanzin
Thanks for review~
1. I spent some time but didn't find the reason why the same executor is
killed multiple times, and I cannot reproduce it either.
2. I found that the same comp
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/20781
@jerryshao
Thanks again for review.
It does happen in my cluster that the same container can be processed multiple
times, which will make `numExecutorsRunning` negative. I think I've
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/20781
Since the change to `YarnAllocator.killExecutor` is small, do you think
it's worth having this defense?
Thanks again for r
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/20781
@jerryshao
Thanks for the advice. I spent some time digging into why multiple `kill`
requests are sent from the driver to the AM, but didn't figure out a way to
reproduce it.
I came to find that
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/20781
@jerryshao Thanks for taking look.
Yes, it does happen. We have jobs which have already finished all their tasks
but are still holding 40~100 executors
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/20781
cc @vanzin @tgravescs @cloud-fan @djvulee
Could you please help review this?
GitHub user jinxing64 opened a pull request:
https://github.com/apache/spark/pull/20781
[SPARK-23637][YARN] YARN might allocate more resources if the same executor is
killed multiple times.
## What changes were proposed in this pull request?
`YarnAllocator` uses
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/20685
Thanks for merging!
@cloud-fan @squito @zsxwing @Ngone51
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/20685
@cloud-fan @squito
Thanks a lot!
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/20685
@squito @cloud-fan
Thank you so much for reviewing. I refined it accordingly. Please take
another look when you have time
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20685#discussion_r172492938
--- Diff:
core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala
---
@@ -352,6 +352,63 @@ class
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20685#discussion_r172492581
--- Diff:
core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala
---
@@ -352,6 +352,63 @@ class
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/20685
cc @cloud-fan @jiangxb1987
Could you please help take a look?
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/20685
Jenkins, retest this please.
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/20685
![image](https://user-images.githubusercontent.com/4058918/36822880-5f4aa9e8-1d35-11e8-8956-4081a2953d22.png)
The failed test is not related; I can pass it in my local
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/20685
Jenkins, retest this please.
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/20685
Jenkins, retest this please
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/20685
Jenkins, test this please
GitHub user jinxing64 opened a pull request:
https://github.com/apache/spark/pull/20685
[SPARK-23524] Big local shuffle blocks should not be checked for corruption.
## What changes were proposed in this pull request?
In the current code, all local blocks will be checked for
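The description is cut off, but the PR title states the intent: big local shuffle blocks should be exempt from the corruption check, since verifying a huge block means reading it fully into memory. A sketch of the gating logic under assumed names (`shouldCheckForCorruption` and the `maxBytesInFlight / 3` threshold are illustrative assumptions, not the PR's actual constants):

```scala
// Sketch only: decide whether to eagerly verify a fetched shuffle block.
// Small blocks are already buffered in memory, so checking them is cheap;
// a big local block would have to be loaded wholesale just to verify it,
// risking OOM for no benefit.
def shouldCheckForCorruption(isLocalBlock: Boolean,
                             blockSize: Long,
                             maxBytesInFlight: Long): Boolean = {
  val inMemoryThreshold = maxBytesInFlight / 3
  !(isLocalBlock && blockSize > inMemoryThreshold)
}
```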
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19330
@xxzzycq
Currently no
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/20461
@cloud-fan thanks a lot for ping. LGTM
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18171
Why is this not merged into 2.2?
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20069#discussion_r160910495
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -851,7 +851,7 @@ object PushDownPredicate extends
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19330
@maropu @kiszk
In the current change, `ordered` is excluded from `toString`,
`buildFormattedString`, and `jsonValue`; I prefer to keep `ordered` internal
and use it only when ordering.
Actually
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19868
Jenkins, retest this please.
GitHub user jinxing64 opened a pull request:
https://github.com/apache/spark/pull/19868
[SPARK-22676] Avoid iterating all partition paths when
spark.sql.hive.verifyPartitionPath=true
## What changes were proposed in this pull request?
In the current code, it will scan all
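The body is truncated, but the title describes the idea: avoid recursively listing every path under the table root when `spark.sql.hive.verifyPartitionPath=true`, and instead check only the partition paths the query actually needs. A hedged sketch of that approach (the helper name and shape are assumptions for illustration, not the PR's code):

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch only: instead of one recursive listing of the whole table directory,
// issue one existence check per required partition path. For a query that
// touches a handful of partitions of a table with thousands, this is far
// fewer NameNode operations.
def existingPartitionPaths(fs: FileSystem,
                           partitionPaths: Seq[Path]): Seq[Path] =
  partitionPaths.filter(p => fs.exists(p))
```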
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/10572
@HyukjinKwon
To merge small files, should I tune `spark.sql.files.maxPartitionBytes`?
But IIUC it only works for `FileSourceScanExec`, so when I select from a Hive
table, it doesn't
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19330
Jenkins, retest this please.
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19560
@wangyum
Makes sense.
You can also try the approach in this PR.
If there are many (tens of thousands of) ETLs in the warehouse, we cannot
afford to give that many hints or fix all the
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19652
@gatorsmile
(Very gentle ping)
Could you please give some comments when you have time :)
Thank you so much.
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19602
@gatorsmile
Thanks a lot for reviewing this PR :)
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19602
True, I restrict it to `Cast(string_typed_attr as integral type)` and
`EqualTo`. `Not(EqualTo)` is not included, since the extra burden put on the
metastore is minor
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19652#discussion_r148775355
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
---
@@ -267,6 +268,33 @@ private class
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19652#discussion_r148749862
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -1485,21 +1487,27 @@ class SparkSqlAstBuilder(conf: SQLConf
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19652#discussion_r148749276
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -1485,21 +1487,27 @@ class SparkSqlAstBuilder(conf: SQLConf
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19652#discussion_r148748292
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -1454,22 +1454,24 @@ class SparkSqlAstBuilder(conf: SQLConf
GitHub user jinxing64 opened a pull request:
https://github.com/apache/spark/pull/19652
[SPARK-22435][SQL] Support processing array and map type using script
## What changes were proposed in this pull request?
Currently, it is not supported to use a script (e.g. Python) to
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19602
@gatorsmile
Thanks again for reviewing this PR.
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19602#discussion_r147583510
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala
---
@@ -53,7 +52,7 @@ class HiveClientSuite(version: String
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19602
@gatorsmile
Thanks a lot for your help :)
>Can we just evaluate the right side CAST(2017 as STRING), since it is
foldable?
Do you mean to add a new rule? -- cast the t
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19602
Could we fix this? SQL like the above is common in my warehouse.
GitHub user jinxing64 opened a pull request:
https://github.com/apache/spark/pull/19602
[SPARK-22384][SQL] Refine partition pruning when attribute is wrapped in
Cast
## What changes were proposed in this pull request?
The SQL below will get all partitions from the metastore, which
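The example SQL is truncated, but the later discussion in this thread (restricting the rewrite to `Cast` of a string-typed attribute to an integral type under `EqualTo`) suggests the core transformation. A simplified sketch using a toy expression ADT instead of Catalyst's `Expression` tree (illustrative only; it also ignores corner cases such as `cast('05' as int) = 5`, which the string-level rewrite would miss):

```scala
// Sketch only: rewrite  Cast(stringPartCol AS INT) = intLiteral
// into the metastore-pushable  stringPartCol = "intLiteral",
// so partition pruning can happen in the metastore instead of
// fetching every partition.
sealed trait Expr
case class Attr(name: String)        extends Expr
case class IntLit(v: Int)            extends Expr
case class StrLit(v: String)         extends Expr
case class CastToInt(child: Expr)    extends Expr
case class EqualTo(l: Expr, r: Expr) extends Expr

def pushDownCast(pred: Expr): Expr = pred match {
  case EqualTo(CastToInt(a: Attr), IntLit(v)) =>
    EqualTo(a, StrLit(v.toString))
  case other => other
}
```

For example, `pushDownCast(EqualTo(CastToInt(Attr("dt")), IntLit(20171030)))` yields a predicate on the raw string column that a Hive metastore filter can evaluate.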
GitHub user jinxing64 reopened a pull request:
https://github.com/apache/spark/pull/19573
[SPARK-22350][SQL] select grouping__id from subquery
## What changes were proposed in this pull request?
Currently, the SQL below will fail:
```
SELECT cnt, k2, k3, grouping__id
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19573
Thanks a lot. I will leave it open (if that's OK). Actually, a friend of mine
from another company also suffers from this issue. Maybe people can leave some
ideas on this.
Thanks again for commenting on
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19573
@gatorsmile
Thanks for the reply.
It seems you prefer to give the alias explicitly. I will close this PR and go
by your suggestion.
But in my warehouse, there are lots of ETLs which are
Github user jinxing64 closed the pull request at:
https://github.com/apache/spark/pull/19573
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19573
@DonnyZone
Thanks for taking a look.
I think it's not quite the same.
After https://github.com/apache/spark/pull/18270, all `grouping__id`
references are transformed into `GroupingID`, which makes
GitHub user jinxing64 opened a pull request:
https://github.com/apache/spark/pull/19573
[SPARK-22350][SQL] select grouping__id from subquery
## What changes were proposed in this pull request?
Currently, the SQL below will fail:
```
SELECT cnt, k2, k3, grouping__id
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19560
>My main concern is, we'd better not to put burden on Spark to deal with
metastore failures
I think this makes sense. I was also thinking about this when proposing this
PR. I
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19560
@wzhfy
Thanks for comment;
I know your point.
In my cluster, the NameNode is under heavy pressure, and errors in stats
happen frequently. Users often do not know there's an error in
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19560
@viirya
Thanks a lot for comments.
1. In the current change, I verify the stats from the file system only when
the relation is under a join.
2. I added a warning when the size from the file system
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19560#discussion_r146449741
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -120,22 +120,41 @@ class DetermineTableStats(session
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19560
@gatorsmile @dongjoon-hyun
Thanks a lot for looking into this.
This PR aims to avoid OOM if the metastore fails to update table properties
after the data has already been produced. With the
GitHub user jinxing64 opened a pull request:
https://github.com/apache/spark/pull/19560
[SPARK-22334][SQL] Check table size from HDFS in case the size in metastore
is wrong.
## What changes were proposed in this pull request?
Currently we use table properties('tota
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19476#discussion_r144776222
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
---
@@ -1552,4 +1582,65 @@ private[spark] object BlockManager
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19476#discussion_r144772767
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
---
@@ -1552,4 +1582,65 @@ private[spark] object BlockManager
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19476#discussion_r144770017
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
---
@@ -1552,4 +1582,65 @@ private[spark] object BlockManager
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19476#discussion_r144768355
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
---
@@ -1552,4 +1582,65 @@ private[spark] object BlockManager
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19476
@jerryshao
Thanks a lot for the ping. I left comments based on my understanding. Not sure
if they're helpful :)
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19476#discussion_r144577910
--- Diff:
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -355,11 +355,21 @@ package object config {
.doc(&quo
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19476#discussion_r144586111
--- Diff:
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -355,11 +355,21 @@ package object config {
.doc(&quo
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19476#discussion_r144585860
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
---
@@ -1552,4 +1582,65 @@ private[spark] object BlockManager
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19364#discussion_r141781986
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ExchangeCoordinator.scala
---
@@ -232,7 +232,7 @@ class ExchangeCoordinator
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19330
@kiszk
Thanks a lot for the comments. Tests pass now. In the current change `ordered`
is included in `jsonValue`, but I'm not sure that is appropriate.
Thanks again for taking time lo
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19330
Conflicts resolved.
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19330
It seems the failed SparkR unit test is not related.
In the current change, I added `trait OrderSpecified`; expressions
(`BinaryComparison`, `Max`, `Min`, `SortArray`, `SortOrder`) using
ordering
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19330
Jenkins, retest this please.
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19330
@hvanhovell
Thanks a lot for comment.
I got your point. I will refine it soon.
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19330
It seems https://github.com/apache/spark/pull/15970 is no longer being worked
on. I resolved the conflicts and added some tests in this PR
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19330#discussion_r140627825
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
---
@@ -663,6 +663,18 @@ class
GitHub user jinxing64 opened a pull request:
https://github.com/apache/spark/pull/19330
Orderable MapType
## What changes were proposed in this pull request?
We can make MapType orderable, and thus usable in aggregates and joins.
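One way to make maps orderable, for illustration: compare the two maps' entry sequences after sorting by key. This sketch is a hypothetical total ordering over `Map[Int, Int]`, not the ordering this PR or Catalyst actually implements:

```scala
// Sketch only: a total ordering on maps — sort entries by key, compare
// entry-by-entry (key first, then value), and break ties by size.
def compareMaps(a: Map[Int, Int], b: Map[Int, Int]): Int = {
  val as = a.toSeq.sorted   // sorted by (key, value) via the tuple ordering
  val bs = b.toSeq.sorted
  as.zip(bs).iterator
    .map { case (x, y) => Ordering[(Int, Int)].compare(x, y) }
    .find(_ != 0)
    .getOrElse(Integer.compare(as.length, bs.length))
}
```

Because the entries are canonicalized by sorting, two maps built with different insertion orders compare equal, which is what aggregates and joins on a map-typed key require.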
## How was this patch tested
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19068
@yaooqinn
This change works well for me, thanks for the fix!
After this change, the Hive client for execution (which points to a dummy
local metastore) will never be used when running SQL in `spark-sql
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19219
It seems there are still some other places where the session state is not
guaranteed to be closed. I will update this PR soon
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19219
cc @cloud-fan @jiangxb1987
Could you please take a look at this?
GitHub user jinxing64 opened a pull request:
https://github.com/apache/spark/pull/19219
[SPARK-21993][SQL] Close sessionState in shutdown hook.
## What changes were proposed in this pull request?
In current code, `SessionState` in `SparkSQLCLIDriver` is not guaranteed to
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19086
@gatorsmile
OK and thanks a lot for review :)
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/15970
@hvanhovell Are you still working on this? I think this feature is
useful :)
Github user jinxing64 closed the pull request at:
https://github.com/apache/spark/pull/19086
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19086
@gatorsmile
I'm sorry if this PR breaks conventions of the current code, but I think the
function is a good (convenient for users) one. In our warehouse, hundreds of
ETLs are using this function
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19086
I'm from Meituan, a Chinese company
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19086
Is it not OK to follow Spark's current behavior? (It will be different from
Hive.)
I made this PR because we are migrating from Hive to Spark, and lots of our
users are using this fun
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19086
@gatorsmile Any more comments on this?
Regarding the behavior change, should we follow Spark's previous behavior or
follow Hive's? I'm OK with
Github user jinxing64 closed the pull request at:
https://github.com/apache/spark/pull/19127
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19127
Sure, I will close this then.
GitHub user jinxing64 opened a pull request:
https://github.com/apache/spark/pull/19127
[SPARK-21916][SQL] Set isolationOn=true when create hive client for
metadata.
## What changes were proposed in this pull request?
In current code, we set `isolationOn=!isCliSession
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19086
Sure, the current behavior is Hive's behavior.