[GitHub] [spark] dbtsai commented on issue #26085: [SPARK-29434][Core] Improve the MapStatuses Serialization Performance

2019-10-22 Thread GitBox
dbtsai commented on issue #26085: [SPARK-29434][Core] Improve the MapStatuses Serialization Performance URL: https://github.com/apache/spark/pull/26085#issuecomment-545168597 Agree, this PR already speeds up the serialization a bit, and unblocks our use-case. I was initially thinking to

[GitHub] [spark] dbtsai commented on issue #26085: [SPARK-29434][Core] Improve the MapStatuses Serialization Performance

2019-10-21 Thread GitBox
dbtsai commented on issue #26085: [SPARK-29434][Core] Improve the MapStatuses Serialization Performance URL: https://github.com/apache/spark/pull/26085#issuecomment-544750761 @tgravescs The following is the result from a run on my desktop. LZ4 is 5x faster but creates 1.6x bigger data. Wondering

[GitHub] [spark] dbtsai commented on issue #26085: [SPARK-29434][Core] Improve the MapStatuses Serialization Performance

2019-10-21 Thread GitBox
dbtsai commented on issue #26085: [SPARK-29434][Core] Improve the MapStatuses Serialization Performance URL: https://github.com/apache/spark/pull/26085#issuecomment-544611042 @tgravescs let me try lz4 quickly, and will post the result. Thanks.

[GitHub] [spark] dbtsai commented on issue #26085: [SPARK-29434][Core] Improve the MapStatuses Serialization Performance

2019-10-18 Thread GitBox
dbtsai commented on issue #26085: [SPARK-29434][Core] Improve the MapStatuses Serialization Performance URL: https://github.com/apache/spark/pull/26085#issuecomment-543981704 @dongjoon-hyun thanks. Merged into my branch.

[GitHub] [spark] dbtsai commented on issue #26085: [SPARK-29434] [Core] Improve the MapStatuses Serialization Performance

2019-10-11 Thread GitBox
dbtsai commented on issue #26085: [SPARK-29434] [Core] Improve the MapStatuses Serialization Performance URL: https://github.com/apache/spark/pull/26085#issuecomment-541256017 @tgravescs the unit is records/ms. When the number of blocks is large, the two-step and one-step approaches will have similar results,

[GitHub] [spark] dbtsai commented on issue #26085: [SPARK-29434] [Core] Improve the MapStatuses Serialization Performance

2019-10-10 Thread GitBox
dbtsai commented on issue #26085: [SPARK-29434] [Core] Improve the MapStatuses Serialization Performance URL: https://github.com/apache/spark/pull/26085#issuecomment-540829740 ping @dongjoon-hyun @holdenk @viirya This is an