sunchao commented on pull request #31998:
URL: https://github.com/apache/spark/pull/31998#issuecomment-853252128
@lxian Yes. I opened #32753 to demonstrate the idea. It's about 1K LOC, but
mostly because the same code has to be duplicated in several places. This is an
existing issue in the
sunchao commented on pull request #31998:
URL: https://github.com/apache/spark/pull/31998#issuecomment-850570200
@lxian I'm thinking that the extra cost is just incrementing two indexes at
the same time, so it should be fairly cheap. You can also refer to how
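The "incrementing two indexes" idea above can be sketched roughly as follows. This is a hypothetical illustration, not Spark's actual reader code: it walks the decoded values of a batch with one cursor while a second cursor advances through a sorted array of selected row indexes, so the per-row overhead is just a comparison and an index increment.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoCursorSkip {
    // Keep only the values whose global row index appears in the sorted
    // selectedRows array. valueIdx and selIdx advance in lockstep.
    static List<Integer> select(int[] values, long[] selectedRows, long firstRowOfBatch) {
        List<Integer> out = new ArrayList<>();
        int valueIdx = 0;   // cursor over decoded values in this batch
        int selIdx = 0;     // cursor over the sorted selected row indexes
        while (valueIdx < values.length && selIdx < selectedRows.length) {
            long currentRow = firstRowOfBatch + valueIdx;
            if (currentRow == selectedRows[selIdx]) {
                out.add(values[valueIdx]);
                valueIdx++;
                selIdx++;
            } else if (currentRow < selectedRows[selIdx]) {
                valueIdx++;  // this row was filtered out; skip it
            } else {
                selIdx++;    // selected row falls before this batch
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] values = {10, 20, 30, 40, 50};   // decoded batch starting at row 0
        long[] selected = {1, 3, 4};           // row indexes that survived filtering
        System.out.println(select(values, selected, 0L));  // [20, 40, 50]
    }
}
```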
sunchao commented on pull request #31998:
URL: https://github.com/apache/spark/pull/31998#issuecomment-850040241
@lxian In the current approach we'd have to copy values from one vector to
another. I think a better and more efficient approach may be to feed the row
indexes to
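The contrast drawn above can be sketched in simplified form. This is a hedged illustration using a stand-in `decode` function, not Spark's `ColumnVector` API: the first approach decodes the whole batch into one vector and then copies survivors into a second vector, while feeding the row indexes to the decoder materializes only the selected values and never allocates the intermediate full-size vector.

```java
import java.util.Arrays;

public class RowIndexPushdown {
    // Stand-in for decoding one value at a given row from a Parquet page.
    static int decode(int[] pageData, int row) {
        return pageData[row];
    }

    // Copy approach: decode everything into a full vector, then copy the
    // surviving rows into a second vector (extra allocation and extra pass).
    static int[] decodeThenCopy(int[] pageData, boolean[] keep) {
        int[] full = new int[pageData.length];
        for (int r = 0; r < pageData.length; r++) full[r] = decode(pageData, r);
        int n = 0;
        for (boolean k : keep) if (k) n++;
        int[] out = new int[n];
        int j = 0;
        for (int r = 0; r < full.length; r++) if (keep[r]) out[j++] = full[r];
        return out;
    }

    // Pushdown approach: the decoder receives the row indexes up front and
    // only materializes the selected rows.
    static int[] decodeSelected(int[] pageData, int[] rowIndexes) {
        int[] out = new int[rowIndexes.length];
        for (int i = 0; i < rowIndexes.length; i++) out[i] = decode(pageData, rowIndexes[i]);
        return out;
    }

    public static void main(String[] args) {
        int[] page = {5, 6, 7, 8};
        // Both strategies keep rows 0 and 2; only the second avoids the copy.
        System.out.println(Arrays.toString(decodeSelected(page, new int[]{0, 2})));  // [5, 7]
    }
}
```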
sunchao commented on pull request #31998:
URL: https://github.com/apache/spark/pull/31998#issuecomment-848390519
@lxian does it mean that, without this PR, the vectorized Parquet reader may
return incorrect results?
--
This is an automated message from the Apache Git Service.
To respond to