[GitHub] spark issue #19810: [SPARK-22599][SQL] In-Memory Table Pruning without Extra...

2018-09-19 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19810 When I contributed it back, the community was looking at something else, so I didn’t spend too much time convincing people to review, but if interest is raised again now, I am

[GitHub] spark pull request #21757: [SPARK-24797] [SQL] respect spark.sql.hive.conver...

2018-07-13 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/21757#discussion_r202417166 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -254,13 +254,15 @@ class

[GitHub] spark pull request #21757: [SPARK-24797] [SQL] respect spark.sql.hive.conver...

2018-07-13 Thread CodingCat
Github user CodingCat closed the pull request at: https://github.com/apache/spark/pull/21757 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21757: [SPARK-24797] [SQL] respect spark.sql.hive.conver...

2018-07-13 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/21757#discussion_r202414440 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -254,13 +254,15 @@ class

[GitHub] spark pull request #21757: [SQL][SPARK-24797] respect spark.sql.hive.convert...

2018-07-12 Thread CodingCat
GitHub user CodingCat opened a pull request: https://github.com/apache/spark/pull/21757 [SQL][SPARK-24797] respect spark.sql.hive.convertMetastoreOrc/Parquet when build… ## What changes were proposed in this pull request? The current code path ignores the value

[GitHub] spark issue #21757: [SPARK-24797] [SQL] respect spark.sql.hive.convertMetast...

2018-07-12 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/21757 @felixcheung

[GitHub] spark issue #20394: [SPARK-23214][SQL] cached data should not carry extra hi...

2018-01-25 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/20394 LGTM

[GitHub] spark pull request #20368: [SPARK-23195] [SQL] Keep the Hint of Cached Data

2018-01-23 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/20368#discussion_r163402210 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala --- @@ -63,7 +63,7 @@ case class InMemoryRelation

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2018-01-23 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r163401858 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala --- @@ -60,7 +62,8 @@ case class InMemoryRelation

[GitHub] spark pull request #20259: [SPARK-23066][WEB-UI] Master Page increase master...

2018-01-18 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/20259#discussion_r162423666 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -179,6 +181,7 @@ private[deploy] class Master

[GitHub] spark pull request #20259: [SPARK-23066][WEB-UI] Master Page increase master...

2018-01-16 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/20259#discussion_r161956457 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -179,6 +181,7 @@ private[deploy] class Master

[GitHub] spark pull request #20259: [SPARK-23066][WEB-UI] Master Page increase master...

2018-01-16 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/20259#discussion_r161866373 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -179,6 +181,7 @@ private[deploy] class Master

[GitHub] spark pull request #20259: [SPARK-23066][WEB-UI] Master Page increase master...

2018-01-16 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/20259#discussion_r161865952 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -179,6 +181,7 @@ private[deploy] class Master

[GitHub] spark issue #20072: [SPARK-22790][SQL] add a configurable factor to describe...

2018-01-12 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/20072 retest this please

[GitHub] spark issue #11994: [SPARK-14151] Expose metrics Source and Sink interface

2018-01-08 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/11994 I see, I didn't recognize that the same registry is used for sources as well. In this case, even if we have some way to eliminate MetricsRegistry from the API signature, haven't we still

[GitHub] spark pull request #20072: [SPARK-22790][SQL] add a configurable factor to d...

2018-01-07 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/20072#discussion_r160076999 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -263,6 +263,17 @@ object SQLConf { .booleanConf

[GitHub] spark issue #20072: [SPARK-22790][SQL] add a configurable factor to describe...

2018-01-06 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/20072 retest this please

[GitHub] spark issue #20072: [SPARK-22790][SQL] add a configurable factor to describe...

2018-01-06 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/20072 @cloud-fan @rxin @wzhfy @felixcheung @gatorsmile thanks for the review, the new parameter name and the test have been added

[GitHub] spark issue #11994: [SPARK-14151] Expose metrics Source and Sink interface

2018-01-05 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/11994 @jerryshao I mean we also need to provide BaseReporter trait ```scala trait Sink { protected val reporter: BaseReporter = createReporter() def createReporter

[GitHub] spark pull request #20072: [SPARK-22790][SQL] add a configurable factor to d...

2018-01-03 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/20072#discussion_r159474580 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFsRelation.scala --- @@ -82,7 +82,11 @@ case class HadoopFsRelation

[GitHub] spark pull request #20072: [SPARK-22790][SQL] add a configurable factor to d...

2018-01-03 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/20072#discussion_r159474598 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -261,6 +261,17 @@ object SQLConf { .booleanConf

[GitHub] spark issue #20072: [SPARK-22790][SQL] add a configurable factor to describe...

2018-01-01 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/20072 @wzhfy thanks for the review, please take a look

[GitHub] spark pull request #20072: [SPARK-22790][SQL] add a configurable factor to d...

2018-01-01 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/20072#discussion_r159171970 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -261,6 +261,17 @@ object SQLConf { .booleanConf

[GitHub] spark pull request #20072: [SPARK-22790][SQL] add a configurable factor to d...

2017-12-30 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/20072#discussion_r159133859 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -261,6 +261,17 @@ object SQLConf { .booleanConf

[GitHub] spark issue #20072: [SPARK-22790][SQL] add a configurable factor to describe...

2017-12-29 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/20072 @gatorsmile more comments?

[GitHub] spark issue #11994: [SPARK-14151] Expose metrics Source and Sink interface

2017-12-25 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/11994 If I understand correctly, the only issue here is that we exposed Codahale's MetricsRegistry in the Sink base class: https://github.com/apache/spark/pull/11994/files#diff

[GitHub] spark issue #20072: [SPARK-22790][SQL] add a configurable factor to describe...

2017-12-25 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/20072 @gatorsmile thanks for the review, Happy Christmas!

[GitHub] spark pull request #20072: [SPARK-22790][SQL] add a configurable factor to d...

2017-12-25 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/20072#discussion_r158651308 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFsRelation.scala --- @@ -82,7 +84,15 @@ case class HadoopFsRelation

[GitHub] spark pull request #20072: [SPARK-22790][SQL] add a configurable factor to d...

2017-12-24 Thread CodingCat
GitHub user CodingCat opened a pull request: https://github.com/apache/spark/pull/20072 [SPARK-22790][SQL] add a configurable factor to describe HadoopFsRelation's size ## What changes were proposed in this pull request? as per discussion in https://github.com/apache

[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...

2017-12-19 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19864 thanks

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-18 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r157653784 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala --- @@ -479,4 +485,43 @@ class

[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...

2017-12-17 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19864 retest this please

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-17 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r157381615 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala --- @@ -479,4 +485,35 @@ class

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-17 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r157381595 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala --- @@ -479,4 +485,35 @@ class

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-17 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r157381594 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala --- @@ -479,4 +485,35 @@ class

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-17 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r157381589 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala --- @@ -71,9 +74,8 @@ case class InMemoryRelation

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-14 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r157118091 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala --- @@ -80,6 +80,14 @@ class CacheManager extends Logging

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-13 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r156718763 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala --- @@ -80,6 +80,14 @@ class CacheManager extends Logging

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-13 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r156716896 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala --- @@ -71,9 +74,10 @@ case class InMemoryRelation

[GitHub] spark pull request #19926: [SPARK-22733] Split StreamExecution into MicroBat...

2017-12-12 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19926#discussion_r156527867 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -71,27 +68,29 @@ class StreamExecution

[GitHub] spark pull request #19926: [SPARK-22733] Split StreamExecution into MicroBat...

2017-12-12 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19926#discussion_r156527709 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -447,296 +384,6 @@ class StreamExecution

[GitHub] spark pull request #19926: [SPARK-22733] Split StreamExecution into MicroBat...

2017-12-12 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19926#discussion_r156527409 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala --- @@ -237,7 +237,7 @@ class StreamingQueryManager private

[GitHub] spark pull request #19926: [SPARK-22733] Split StreamExecution into MicroBat...

2017-12-12 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19926#discussion_r156458057 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala --- @@ -237,7 +237,7 @@ class StreamingQueryManager private

[GitHub] spark pull request #19926: [SPARK-22733] Split StreamExecution into MicroBat...

2017-12-12 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19926#discussion_r156451593 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -71,27 +68,29 @@ class StreamExecution

[GitHub] spark pull request #19926: [SPARK-22733] Split StreamExecution into MicroBat...

2017-12-12 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19926#discussion_r156461586 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -447,296 +384,6 @@ class StreamExecution

[GitHub] spark pull request #19926: [SPARK-22733] Split StreamExecution into MicroBat...

2017-12-12 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19926#discussion_r156451756 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -71,27 +68,29 @@ class StreamExecution

[GitHub] spark pull request #19926: [SPARK-22733] Split StreamExecution into MicroBat...

2017-12-12 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19926#discussion_r156457754 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -783,29 +430,29 @@ class StreamExecution

[GitHub] spark pull request #19926: [SPARK-22733] Split StreamExecution into MicroBat...

2017-12-12 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19926#discussion_r156458765 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala --- @@ -285,12 +285,13 @@ trait StreamTest extends QueryTest

[GitHub] spark pull request #19926: [SPARK-22733] Split StreamExecution into MicroBat...

2017-12-12 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19926#discussion_r156459771 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -71,27 +68,29 @@ class StreamExecution

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-12 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r156445522 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala --- @@ -60,7 +62,8 @@ case class InMemoryRelation

[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...

2017-12-11 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19864 @cloud-fan @viirya any more comments about this?

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-07 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r155587468 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala --- @@ -479,4 +481,32 @@ class

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-07 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r155587431 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala --- @@ -479,4 +481,32 @@ class

[GitHub] spark issue #19810: [SPARK-22599][SQL] In-Memory Table Pruning without Extra...

2017-12-06 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19810 @cloud-fan for this case, if the data has been dumped to disk or some non-local tasks are started, I/O is involved in addition to the overhead to start extra tasks. If all data is in-memory, only

[GitHub] spark pull request #19810: [SPARK-22599][SQL] In-Memory Table Pruning withou...

2017-12-06 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19810#discussion_r155392065 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala --- @@ -193,38 +195,68 @@ case class

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-06 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r155296970 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala --- @@ -37,8 +37,10 @@ object InMemoryRelation

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-06 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r155287758 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala --- @@ -94,14 +94,16 @@ class CacheManager extends Logging

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-05 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r155147478 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala --- @@ -94,14 +94,16 @@ class CacheManager extends Logging

[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...

2017-12-05 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19864 @viirya sure will add

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-05 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r155140125 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala --- @@ -94,14 +94,16 @@ class CacheManager extends Logging

[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...

2017-12-05 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19864 retest this please

[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...

2017-12-05 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19864 thanks @viirya @cloud-fan and @hvanhovell, just addressed the comments and answered the question

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-05 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r15502 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala --- @@ -94,14 +94,16 @@ class CacheManager extends Logging

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-05 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r154968066 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala --- @@ -94,14 +94,16 @@ class CacheManager extends Logging

[GitHub] spark issue #19810: [SPARK-22599][SQL] In-Memory Table Pruning without Extra...

2017-12-04 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19810 @sadikovi thanks for the review, I replied in comments

[GitHub] spark pull request #19810: [SPARK-22599][SQL] In-Memory Table Pruning withou...

2017-12-04 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19810#discussion_r154845669 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala --- @@ -193,38 +195,68 @@ case class

[GitHub] spark pull request #19810: [SPARK-22599][SQL] In-Memory Table Pruning withou...

2017-12-04 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19810#discussion_r154844579 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala --- @@ -52,6 +52,68 @@ object InMemoryRelation

[GitHub] spark issue #19810: [SPARK-22599][SQL] In-Memory Table Pruning without Extra...

2017-12-04 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19810 @cloud-fan would you mind continuing the review?

[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...

2017-12-04 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19864 @viirya any thoughts?

[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...

2017-12-02 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19864 @viirya yes, we can get more accurate stats later; however, the first stats are also important, as they enable the user to pay less for `the first run`, which writes the cache. The current

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-02 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r154501939 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala --- @@ -94,14 +94,16 @@ class CacheManager extends Logging

[GitHub] spark issue #19824: [SPARK][STREAMING] Invoke onBatchCompletion() only when ...

2017-12-02 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19824 if you just worry about > As I was using the StreamingListenerBatchCompleted to do some metadata checkpointing stuff, which should be done only when the batch succee

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-02 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19864#discussion_r154500900 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala --- @@ -71,9 +73,17 @@ case class InMemoryRelation

[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize on-di...

2017-12-01 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19864 @cloud-fan @viirya @gatorsmile @felixcheung @hvanhovell @HyukjinKwon @dongjoon-hyun @liancheng

[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-01 Thread CodingCat
GitHub user CodingCat opened a pull request: https://github.com/apache/spark/pull/19864 [SPARK-22673][SQL] InMemoryRelation should utilize on-disk table stats whenever possible ## What changes were proposed in this pull request? The current implementation

[GitHub] spark issue #19824: [SPARK][STREAMING] Invoke onBatchCompletion() only when ...

2017-12-01 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19824 One thing to note is that muting an event is a behavior change: if a user has introduced a customized listener to capture all completed batches and also extract failed job info, he/she will see

[GitHub] spark issue #19824: [SPARK][STREAMING] Invoke onBatchCompletion() only when ...

2017-12-01 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19824 #16542 has guaranteed that the failed batch can be re-executed, and I didn’t check if reverting the change in #16542 plus your new change can guarantee the same thing... Suppose

[GitHub] spark issue #19824: [SPARK][STREAMING] Invoke onBatchCompletion() only when ...

2017-11-30 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19824 `What I want to say is that if a Job is failed, we should consider the Batch as not completed.` isn't #16542 doing the same thing

[GitHub] spark issue #19824: Revert "[SPARK-18905][STREAMING] Fix the issue of removi...

2017-11-29 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19824 did I miss anything? @victor-wong , you are describing what https://github.com/apache/spark/pull/16542 does in your description, but you are reverting it in your changes

[GitHub] spark issue #19810: [SPARK-22599][SQL] In-Memory Table Pruning without Extra...

2017-11-28 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19810 Reading less data is an observation from the input metrics in the Spark UI, which include both local and remote reads in BlockManagers, and also the overhead in the BlockManager layer itself (especially

[GitHub] spark issue #19810: [SPARK-22599][SQL] In-Memory Table Pruning without Extra...

2017-11-27 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19810 Hi @cloud-fan, this PR is not only for the case where the data size is larger than the memory size. Even when all data is in memory, I observed up to 10-40% speedup because the implementation

[GitHub] spark issue #19810: [SPARK-22599][SQL] In-Memory Table Pruning without Extra...

2017-11-27 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19810 ping @cloud-fan @viirya @gatorsmile @felixcheung @hvanhovell @HyukjinKwon

[GitHub] spark issue #19810: [SPARK-22599][SQL] In-Memory Table Pruning without Extra...

2017-11-24 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19810 retest this please

[GitHub] spark issue #19810: [SPARK-22599][SQL] In-Memory Table Pruning without Extra...

2017-11-24 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19810 retest this please

[GitHub] spark issue #19810: [SPARK-22599][SQL] In-Memory Table Pruning without Extra...

2017-11-24 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19810 jenkins, retest it please

[GitHub] spark pull request #19810: Partition level pruning 2

2017-11-23 Thread CodingCat
GitHub user CodingCat opened a pull request: https://github.com/apache/spark/pull/19810 Partition level pruning 2 ## What changes were proposed in this pull request? In the current implementation of Spark, InMemoryTableExec reads all data in a cached table, filter

[GitHub] spark issue #19763: [SPARK-22537][core] Aggregation of map output statistics...

2017-11-15 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/19763 My question is: "How many times have we seen this operation of collecting statistics be the bottleneck?"

[GitHub] spark pull request #19763: [SPARK-22537][core] Aggregation of map output sta...

2017-11-15 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19763#discussion_r151322438 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -473,16 +477,41 @@ private[spark] class MapOutputTrackerMaster

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-01 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/16578 I ran a simple test in a single-node Spark environment. I used a synthetic dataset, which is generated as follows (that’s 20M): ```scala import spark.implicits._ import

[GitHub] spark pull request #19569: [SPARK-22348][SQL] The table cache providing Colu...

2017-10-24 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/19569#discussion_r146748032 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala --- @@ -201,35 +193,50 @@ case class

[GitHub] spark issue #15590: [SPARK-17949][SQL] A JVM object based aggregate operator

2017-07-07 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/15590 The fallback strategy is actually the "pure external sorter" approach, right? (https://github.com/apache/spark/blob/41439fd52dd263b9f7d92e608f027f193f461777/sql/core/src/main/scala/

[GitHub] spark issue #18410: [SPARK-20971][SS] purge metadata log in FileStreamSource

2017-06-29 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/18410 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #18410: [SPARK-20971][SS] purge metadata log in FileStreamSource

2017-06-27 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/18410 retest this please

[GitHub] spark issue #18410: [SPARK-20971][SS] purge metadata log in FileStreamSource

2017-06-26 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/18410 @HyukjinKwon thanks for the pointer. Is it fixed now?

[GitHub] spark issue #18410: [SPARK-20971][SS] purge metadata log in FileStreamSource

2017-06-24 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/18410 @zsxwing would you mind taking a look at this PR? Also, what do these pip packaging tests mean? Is it a flaky test?

[GitHub] spark issue #18410: [SPARK-20971][SS] purge metadata log in FileStreamSource

2017-06-24 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/18410 retest this please

[GitHub] spark issue #18410: [SPARK-20971][SS] purge metadata log in FileStreamSource

2017-06-24 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/18410 retest this please

[GitHub] spark issue #18410: [SPARK-20971][SS] purge metadata log in FileStreamSource

2017-06-23 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/18410 Jenkins, retest this please

[GitHub] spark issue #18410: [SPARK-20971][SS] purge metadata log in FileStreamSource

2017-06-23 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/18410 Jenkins, test it please

[GitHub] spark pull request #18410: [SS][SPARK-20971] purge metadata log in FileStrea...

2017-06-23 Thread CodingCat
GitHub user CodingCat opened a pull request: https://github.com/apache/spark/pull/18410 [SS][SPARK-20971] purge metadata log in FileStreamSource ## What changes were proposed in this pull request? Currently, there is no cleanup mechanism for FileStreamSource's metadata log
