GitHub user xuanyuanking opened a pull request:
https://github.com/apache/spark/pull/18760
[SPARK-21560][Core] Add hold mode for the LiveListenerBus
## What changes were proposed in this pull request?
1. Add config for hold strategy and the idle capacity.
2. Hold the post
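Since the description is truncated here, a minimal sketch of the drop-versus-hold idea may help; the class and method names below are hypothetical illustrations, not the PR's actual LiveListenerBus code:
```scala
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical sketch only (not the PR's code): a bounded event queue that can
// either drop events when full (today's behaviour) or hold the posting thread
// until space frees up, trading caller latency for zero event loss.
class BoundedEventQueue[E](capacity: Int) {
  private val queue = new LinkedBlockingQueue[E](capacity)

  // Drop strategy: returns false and discards the event if the queue is full.
  def postDropping(event: E): Boolean = queue.offer(event)

  // Hold strategy: blocks the caller until the consumer drains the queue.
  def postHolding(event: E): Unit = queue.put(event)

  // Consumer side: blocks until an event is available.
  def take(): E = queue.take()
}
```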
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/18654
ping @cloud-fan @HyukjinKwon
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/18654
retest this please
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/18654
retest this please...
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/18654
retest this please
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/18654#discussion_r127888746
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala
---
@@ -0,0 +1,43
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/18654#discussion_r127872988
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala
---
@@ -0,0 +1,52
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/18654#discussion_r127865091
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala
---
@@ -0,0 +1,52
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/18654
Yep, an empty result dir needs this metadata, otherwise it will throw the exception:
```
org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet.
It must be specified manually
```
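For context, a minimal sketch of the failure mode and the workaround, assuming a local SparkSession and a placeholder path (neither is from the PR):
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val spark = SparkSession.builder().master("local[*]").appName("empty-result-sketch").getOrCreate()

// If every empty task is skipped, the result directory ends up with no Parquet
// footers at all, so schema inference has nothing to work with and the read
// below fails with the AnalysisException quoted above:
// spark.read.parquet("/tmp/empty_result")

// Keeping one file with the schema metadata, or passing the schema explicitly,
// avoids the failure and yields an empty DataFrame:
val schema = new StructType().add("id", IntegerType).add("name", StringType)
val empty = spark.read.schema(schema).parquet("/tmp/empty_result")
```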
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/18654#discussion_r127856720
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala
---
@@ -0,0 +1,52
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/18654#discussion_r127856498
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala
---
@@ -0,0 +1,52
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/18654
retest this please
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/18650
Yep, just closing this and opening #18654
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/18654
@HyukjinKwon Thanks for your comment. As you mentioned in #18650 and
#17395, empty Parquet results can be fixed by leaving the first partition; how
about the ORC format? The ORC format error
GitHub user xuanyuanking opened a pull request:
https://github.com/apache/spark/pull/18654
[SPARK-21435][SQL] Empty files should be skipped while write to file
## What changes were proposed in this pull request?
Add EmptyDirectoryWriteTask for empty tasks while writing files
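A minimal sketch of the idea, under the assumption that a write task can be chosen per partition; the trait and helper below are simplifications for illustration, not the actual FileFormatWriter API:
```scala
// Hypothetical simplification, not the real FileFormatWriter types.
trait WriteTask[T] {
  def execute(rows: Iterator[T]): Seq[String] // paths of the files written
}

// A task with no input rows emits no part file at all, so downstream jobs do
// not see a directory full of zero-row files.
class EmptyDirectoryWriteTask[T] extends WriteTask[T] {
  override def execute(rows: Iterator[T]): Seq[String] = Seq.empty
}

def taskFor[T](rows: Iterator[T], normal: WriteTask[T]): WriteTask[T] =
  if (rows.hasNext) normal else new EmptyDirectoryWriteTask[T]
```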
Github user xuanyuanking closed the pull request at:
https://github.com/apache/spark/pull/18650
GitHub user xuanyuanking opened a pull request:
https://github.com/apache/spark/pull/18650
[SPARK-21435][SQL] Empty files should be skipped while write to file
## What changes were proposed in this pull request?
Add EmptyDirectoryWriteTask for empty tasks while writing files
Github user xuanyuanking closed the pull request at:
https://github.com/apache/spark/pull/14957
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/14957
OK, I'll close this and just use it in our internal env. Thanks everyone for the
suggestions and review work. Next we may try more complex scenarios of this.
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/17702
retest this please
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/17702
The test may have failed because of the env? `process was terminated by signal 9` in the
Jenkins log.
retest it please
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/17702#discussion_r122364493
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
---
@@ -389,6 +389,23 @@ case class DataSource
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/17702
@cloud-fan Thanks for your reply!
It's possible to consolidate them, but maybe it's not so necessary? I can
consolidate them by replacing the logic in `getGlobbedPaths` list bel
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/17702
ping @cloud-fan
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/17702
@gatorsmile @cloud-fan, do we need any other performance tests?
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/17702#discussion_r114275475
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -146,6 +146,11 @@ object SQLConf {
.longConf
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/17702
Thanks for your review. @gatorsmile @cloud-fan
`Can you show us the performance difference?`
No problem, I reproduced our online case offline as below
## Without
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/17702
ping @cloud-fan and @gatorsmile, could you have a look at this?
Thanks :)
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/17702
@HyukjinKwon Can you help me find an appropriate reviewer for this?
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/17702
cc @zsxwing @tdas, can you review this? I found the related code of yours
before. Thanks :)
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/17702
@marmbrus Can you take a look at this? Thanks :)
GitHub user xuanyuanking opened a pull request:
https://github.com/apache/spark/pull/17702
[SPARK-20408][SQL] Get the glob path in parallel to reduce resolve relation
time
## What changes were proposed in this pull request?
This PR changes the work of getting glob paths in
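A minimal sketch of the idea behind the change, assuming Hadoop's `FileSystem.globStatus` and a hypothetical helper name (this is not the actual `DataSource.getGlobbedPaths` code): resolve each glob pattern on its own thread so many patterns do not serialize on filesystem round-trips.
```scala
import java.util.concurrent.Executors

import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper, for illustration only.
def globInParallel(patterns: Seq[Path], hadoopConf: Configuration, threads: Int = 8): Seq[Path] = {
  implicit val ec: ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(threads))
  val resolved = Future.traverse(patterns) { pattern =>
    Future {
      val fs = pattern.getFileSystem(hadoopConf)
      // globStatus returns null when nothing matches, hence the Option wrapper.
      Option(fs.globStatus(pattern)).map(_.toSeq.map(_.getPath)).getOrElse(Seq.empty)
    }
  }
  Await.result(resolved, Duration.Inf).flatten
}
```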
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/16578
@mallman Thanks for letting me know. I'll try your patch and check whether #14957 takes
over or not.
I also think we need to get feedback from @liancheng; from our last
discussion, liancheng m
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/16350
Thanks!
Github user xuanyuanking closed the pull request at:
https://github.com/apache/spark/pull/16350
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/16350
Deleting the UT and metrics is done. :)
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/16135
@hvanhovell Sure, I opened a new BACKPORT-2.0.
There's a small diff in branch-2.0: the UT of this patch is based on
`HiveCatalogMetrics`, which was not added in 2.0, so I added the
GitHub user xuanyuanking opened a pull request:
https://github.com/apache/spark/pull/16350
[SPARK-18700][SQL][BACKPORT-2.0] Add StripedLock for each table's relation
in cache
## What changes were proposed in this pull request?
Backport of #16135 to branc
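A minimal sketch of the locking idea, using Guava's `Striped` locks keyed by table name; the object and helper names are hypothetical and not the patch's actual HiveMetastoreCatalog changes:
```scala
import com.google.common.util.concurrent.Striped

// Hypothetical helper: one lock per stripe, keyed by table name, so rebuilding
// the cached relation for one table does not block lookups for other tables.
object RelationLocks {
  private val locks = Striped.lazyWeakLock(64)

  def withTableLock[A](tableName: String)(body: => A): A = {
    val lock = locks.get(tableName)
    lock.lock()
    try body finally lock.unlock()
  }
}

// Usage sketch: guard the expensive cache fill per table.
// RelationLocks.withTableLock("db.table") { buildOrFetchCachedRelation() }
```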
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/16135
cc @rxin, thanks for checking.
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/16135
Thanks for ericl's review!
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/16135#discussion_r91904868
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -209,72 +221,79 @@ private[hive] class
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/16135#discussion_r91904915
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/PartitionedTablePerfStatsSuite.scala
---
@@ -352,4 +353,34 @@ class
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/16135#discussion_r91904834
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -53,6 +53,18 @@ private[hive] class HiveMetastoreCatalog
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/16135#discussion_r91879183
--- Diff:
core/src/main/scala/org/apache/spark/metrics/source/StaticSources.scala ---
@@ -105,6 +111,7 @@ object HiveCatalogMetrics extends Source
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/16135#discussion_r91849598
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/PartitionedTablePerfStatsSuite.scala
---
@@ -352,4 +353,28 @@ class
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/16135#discussion_r91849563
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -53,6 +56,18 @@ private[hive] class HiveMetastoreCatalog
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/16135#discussion_r91849565
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/PartitionedTablePerfStatsSuite.scala
---
@@ -352,4 +353,28 @@ class
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/16135#discussion_r91849561
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -53,6 +56,18 @@ private[hive] class HiveMetastoreCatalog
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/16135#discussion_r91849557
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -53,6 +56,18 @@ private[hive] class HiveMetastoreCatalog
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/16135#discussion_r91849543
--- Diff:
core/src/main/scala/org/apache/spark/metrics/source/StaticSources.scala ---
@@ -97,6 +97,12 @@ object HiveCatalogMetrics extends Source
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/16135#discussion_r91849544
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -33,6 +35,7 @@ import
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/16135
hi @ericl
This commit does the 3 things below, thanks for your check:
1. Delete the unnecessary lock use and simplify the lock operation
2. Add UT test in
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/16135
@ericl Thanks for your review.
> Is it sufficient to lock around the catalog.filterPartitions(Nil)?
Yes, this patch was ported from 1.6.2 and I missed the diff here. Fixed in the next
pa
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/16135
@rxin @liancheng
GitHub user xuanyuanking opened a pull request:
https://github.com/apache/spark/pull/16135
SPARK-18700: add ReadWriteLock for each table's relation in cache
## What changes were proposed in this pull request?
As the scenario described in
[SPARK-18700][
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r85656841
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala
---
@@ -442,6 +443,79 @@ class
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r85656729
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
---
@@ -126,4 +136,52 @@ object
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r84592883
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -661,6 +666,8 @@ private[sql] class SQLConf extends Serializable
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r84592876
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -212,6 +212,11 @@ object SQLConf {
.booleanConf
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r84592888
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
---
@@ -571,6 +571,37 @@ class
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r84592865
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
---
@@ -126,4 +136,52 @@ object
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r84592821
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
---
@@ -126,4 +136,52 @@ object
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r84592816
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
---
@@ -126,4 +136,52 @@ object
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r84592818
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
---
@@ -126,4 +136,52 @@ object
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r84592806
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
---
@@ -97,7 +99,15 @@ object FileSourceStrategy
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r84592805
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
---
@@ -97,7 +99,15 @@ object FileSourceStrategy
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/14957
@liancheng @rxin
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/14957
(Thank you for your comments; they helped me a lot :) )
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r77934764
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
---
@@ -97,7 +98,16 @@ object FileSourceStrategy
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r77934557
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
---
@@ -97,7 +98,16 @@ object FileSourceStrategy
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r77845789
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
---
@@ -97,7 +98,16 @@ object FileSourceStrategy
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r77844084
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
---
@@ -97,7 +98,16 @@ object FileSourceStrategy
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r77762907
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala ---
@@ -280,6 +280,29 @@ case class StructType(fields: Array
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r77760397
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala ---
@@ -280,6 +280,29 @@ case class StructType(fields: Array
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r77760264
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
---
@@ -571,6 +571,44 @@ class
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/14957#discussion_r77753006
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala ---
@@ -259,8 +259,23 @@ case class StructType(fields: Array
GitHub user xuanyuanking opened a pull request:
https://github.com/apache/spark/pull/14957
[SPARK-4502][SQL]Support parquet nested struct pruning and add releva…
## What changes were proposed in this pull request?
Like the description in
[SPARK-4502](https
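A minimal sketch of the schema-side pruning idea; the helper below is an illustration, not the PR's actual `StructType` changes. Given the leaf fields a query actually reads, it drops the other leaves of each nested struct so the Parquet reader can skip them:
```scala
import org.apache.spark.sql.types._

// Hypothetical helper: keep only the requested leaf paths of a nested schema.
def pruneStruct(schema: StructType, wanted: Set[Seq[String]], prefix: Seq[String] = Nil): StructType = {
  StructType(schema.fields.flatMap { field =>
    val path = prefix :+ field.name
    field.dataType match {
      case s: StructType =>
        val pruned = pruneStruct(s, wanted, path)
        if (pruned.fields.nonEmpty) Some(field.copy(dataType = pruned)) else None
      case _ =>
        if (wanted.contains(path)) Some(field) else None
    }
  })
}

// Example: the query touches only `id` and `name.first`, so `name.last` is pruned.
val full = new StructType()
  .add("id", IntegerType)
  .add("name", new StructType().add("first", StringType).add("last", StringType))
val pruned = pruneStruct(full, Set(Seq("id"), Seq("name", "first")))
```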