dbtsai commented on issue #26085: [SPARK-29434][Core] Improve the MapStatuses
Serialization Performance
URL: https://github.com/apache/spark/pull/26085#issuecomment-545168597
Agree, this PR already speeds up the serialization a bit, and unblocks our
use-case. I was initially thinking to
dbtsai commented on issue #26085: [SPARK-29434][Core] Improve the MapStatuses
Serialization Performance
URL: https://github.com/apache/spark/pull/26085#issuecomment-544750761
@tgravescs The following is the result of a run on my desktop. LZ4 is 5x faster but
creates 1.6x bigger data. Wondering
dbtsai commented on issue #26085: [SPARK-29434][Core] Improve the MapStatuses
Serialization Performance
URL: https://github.com/apache/spark/pull/26085#issuecomment-544611042
@tgravescs let me try lz4 quickly, and will post the result. Thanks.
dbtsai commented on issue #26085: [SPARK-29434][Core] Improve the MapStatuses
Serialization Performance
URL: https://github.com/apache/spark/pull/26085#issuecomment-543981704
@dongjoon-hyun thanks. Merged into my branch.
dbtsai commented on issue #26085: [SPARK-29434] [Core] Improve the MapStatuses
Serialization Performance
URL: https://github.com/apache/spark/pull/26085#issuecomment-541256017
@tgravescs it's records / ms. When the number of blocks is large, two steps and
one step will have similar results,
dbtsai commented on issue #26085: [SPARK-29434] [Core] Improve the MapStatuses
Serialization Performance
URL: https://github.com/apache/spark/pull/26085#issuecomment-540829740
ping @dongjoon-hyun @holdenk @viirya
This is an