Re: [Java][Arrow IPC] Extreme memory usage when reading feather files

Larry White Mon, 30 Jan 2023 07:19:43 -0800

If you bring the data onto the Java heap, you will use a lot of memory
even for a single file: 8 bytes * 200,000 rows * 2,000 columns is
3.2 GB, before any overhead from converting the values to Double
objects (which could double the required memory). The best approach would
be to leave the data off-heap and read the values through
DataHolders, which should let you access them with one object per
vector.
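A minimal sketch of the holder-based approach follows. It assumes the Apache Arrow Java libraries (`arrow-vector` plus an allocator implementation such as `arrow-memory-netty`) are on the classpath and that the input is an uncompressed Feather v2 (Arrow IPC file) whose columns are float8. The class name `HolderSum` and the default path `data.feather` are hypothetical. It reads one record batch at a time and reuses a single `NullableFloat8Holder` per column, so the values stay in Arrow's off-heap buffers and no `Double` objects are created.

```java
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.FieldVector;
import org.apache.arrow.vector.Float8Vector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.holders.NullableFloat8Holder;
import org.apache.arrow.vector.ipc.ArrowFileReader;

public class HolderSum {

    // Sums one float8 (double) column without boxing: a single holder is
    // reused for every row, so only one extra object exists per vector.
    static double sumColumn(Float8Vector vector) {
        NullableFloat8Holder holder = new NullableFloat8Holder();
        double sum = 0.0;
        for (int i = 0; i < vector.getValueCount(); i++) {
            vector.get(i, holder);   // copies one value out of the off-heap buffer
            if (holder.isSet == 1) { // isSet is 0 for null slots
                sum += holder.value;
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical path; assumes an uncompressed Feather v2 / Arrow IPC file.
        String path = args.length > 0 ? args[0] : "data.feather";
        try (BufferAllocator allocator = new RootAllocator();
             FileInputStream in = new FileInputStream(path);
             ArrowFileReader reader = new ArrowFileReader(in.getChannel(), allocator)) {
            VectorSchemaRoot root = reader.getVectorSchemaRoot();
            // Each call replaces the root's buffers with the next record batch.
            while (reader.loadNextBatch()) {
                for (FieldVector v : root.getFieldVectors()) {
                    if (v instanceof Float8Vector) {
                        System.out.println(v.getName() + ": " + sumColumn((Float8Vector) v));
                    }
                }
            }
        }
    }
}
```

Because the reader reuses one `VectorSchemaRoot`, only the current batch's buffers need to be resident at a time. Files written by pyarrow's `write_feather` use LZ4 compression by default when it is available; reading those from Java would additionally need a compression codec factory (for example from the `arrow-compression` module), which this sketch does not set up.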


On Mon, Jan 30, 2023 at 10:08 AM Chris Nuernberger <[email protected]>
wrote:

> TMD <https://github.com/techascent/tech.ml.dataset> supports memory
> mapped arrow files.  We don't currently support float8 but I would be
> interested in implementing that if you are interested in trying it out.
> It's Clojure, not Java, but it is still on the JVM.
>
> This is likely to be your fastest option both in terms of raw performance
> and time to final solution.
>

