Hi,
The builders can't really know the size of the buffers when nested types
are involved. The general solution would be an expensive traversal of the
entire tree of builders (e.g. struct builder of nested column types like
strings) on every append.

I suggest you leverage your domain knowledge of the data coming into the
builders to estimate the number of elements you want to append and stop
when that number of elements is reached.

From the equations defining max_buffer_size you can get the length:

Integer types are very easy: max_buffer_size = length * sizeof(int type).
Strings: max_buffer_size = length * max(sizeof(offset_type), avg string
size in bytes).
Lists: estimate the avg. list length; the child array of values then has
values_length := length * avg_list_length elements, and its buffers are
sized by applying the equations above to values_length.
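
To make the arithmetic concrete, here is a rough sketch that inverts the
equations above to get a length from a byte budget. It is plain Python
used only for illustration, not Arrow API: the function names, the 4-byte
offset default and the fixed-width child in the list case are my
assumptions, and it ignores the one extra trailing offset and the
validity bitmaps.

```python
def max_length_fixed_width(max_buffer_size, value_size):
    """Elements that fit when every value takes value_size bytes."""
    return max_buffer_size // value_size


def max_length_strings(max_buffer_size, avg_string_size, offset_size=4):
    """Elements that fit given the larger per-element cost of the
    offsets buffer and the (estimated) data buffer."""
    per_element = max(offset_size, avg_string_size)
    return int(max_buffer_size // per_element)


def max_length_lists(max_buffer_size, avg_list_length, child_value_size,
                     offset_size=4):
    """List elements that fit, assuming a fixed-width child array whose
    length is values_length = length * avg_list_length."""
    # Limit imposed by the child values buffer.
    values_length_limit = max_length_fixed_width(max_buffer_size,
                                                 child_value_size)
    from_child = int(values_length_limit // avg_list_length)
    # Limit imposed by the list's own offsets buffer.
    from_offsets = max_buffer_size // offset_size
    return min(from_child, from_offsets)


budget = 4 * 1024 * 1024  # a 4 MiB budget, picked only as an example
print(max_length_fixed_width(budget, 8))   # int64 values
print(max_length_strings(budget, 32))      # strings of about 32 bytes
print(max_length_lists(budget, 10, 8))     # lists of about 10 int64s
```

Since the averages are estimates, it is probably wise to stop a bit
before the computed length if the limit has to be conservative.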

Also make sure a chunk is always allowed to hold at least one element
(length > 0), because if a single string is bigger than X MB, you will
*have to* violate this max buffer constraint. It can only be a soft
constraint in a robust solution.
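
A minimal sketch of what the soft constraint means in practice. Plain
Python lists stand in for the builders and finished chunks here, and
split_into_chunks and size_of are hypothetical names of mine, not Arrow
API. A chunk is closed before it would go over the budget, but a chunk
always accepts its first element, so one oversized value ends up alone
in a chunk that exceeds the limit.

```python
def split_into_chunks(values, max_buffer_size, size_of):
    """Yield lists of values whose total size stays within
    max_buffer_size, except when a single value alone is larger."""
    chunk, used = [], 0
    for value in values:
        n = size_of(value)
        # Only close the chunk if it already holds something; an empty
        # chunk must accept the value even when it is over the budget.
        if chunk and used + n > max_buffer_size:
            yield chunk
            chunk, used = [], 0
        chunk.append(value)
        used += n
    if chunk:
        yield chunk


# With a 4-byte budget, the 10-byte value still gets a chunk of its own.
for c in split_into_chunks([b"aa", b"bbb", b"cccccccccc", b"d"], 4, len):
    print(c)
```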

__
Felipe

On Thu, Jul 4, 2024 at 3:12 PM Eric Jacobs <[email protected]> wrote:

> Hi,
> I would like to build a ChunkedArray but I need to limit the maximum
> size of each buffer (somewhere in the low MB's). Ending the current
> chunk and starting a new one is straightforward, but I'm having some
> difficulty detecting when the current buffer(s) are close to getting
> full. If I had the Builders I could check the length() as they are going
> along, but I'm not sure how I can get access to those as ChunkedArray is
> being built via the API.
>
> The size control doesn't have to be precise in my case; it just needs to
> be conservative as a limit (i.e. the builder cannot go over X MB)
>
>   Any advice would be appreciated.
> Thanks,
> -Eric
>
>
>
