Hi,

The builders can't really know the size of the buffers when nested types are involved. The general solution would be an expensive traversal of the entire tree of builders (e.g. a struct builder with nested column types like strings) on every append.
I suggest you leverage your domain knowledge of the data coming into the builders to estimate the number of elements you want to append, and stop when that number of elements is reached.

From the equations defining max_buffer_size you can get the length:

Integer types are very easy: max_buffer_size = length * sizeof(int type).

Strings: max_buffer_size = length * max(sizeof(offset_type), avg string size in bytes).

Lists: you need to estimate the avg. list length, and with that the length of the buffers in the child array of values: values_length := length * avg_list_length.

Also make sure you always allow length to be > 0, because if a single string is bigger than X MB, you will *have to* violate this max buffer constraint. It can only be a soft constraint in a robust solution.

__
Felipe

On Thu, Jul 4, 2024 at 3:12 PM Eric Jacobs <[email protected]> wrote:

> Hi,
> I would like to build a ChunkedArray but I need to limit the maximum
> size of each buffer (somewhere in the low MB's). Ending the current
> chunk and starting a new one is straightforward, but I'm having some
> difficulty detecting when the current buffer(s) are close to getting
> full. If I had the Builders I could check the length() as they are going
> along, but I'm not sure how I can get access to those as ChunkedArray is
> being built via the API.
>
> The size control doesn't have to be precise in my case; it just needs to
> be conservative as a limit (i.e. the builder cannot go over X MB)
>
> Any advice would be appreciated.
> Thanks,
> -Eric
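
To make the length estimates from the reply concrete, here is a minimal sketch in plain Python. It does not use any Arrow API; the function names and parameters (fixed_width_length, string_length, list_length, child_item_size) are hypothetical and only illustrate the arithmetic. The list case additionally assumes a fixed-width child type, which the reply does not spell out.

```python
def fixed_width_length(max_buffer_size: int, item_size: int) -> int:
    """Rows per chunk for a fixed-width type: max_buffer_size = length * item_size."""
    # Always allow at least one element, so the limit stays a soft constraint.
    return max(1, max_buffer_size // item_size)


def string_length(max_buffer_size: int, offset_size: int, avg_string_bytes: float) -> int:
    """Rows per chunk for strings: the larger of offsets and data per row decides."""
    per_row = max(offset_size, avg_string_bytes)
    return max(1, int(max_buffer_size // per_row))


def list_length(max_buffer_size: int, offset_size: int,
                avg_list_length: float, child_item_size: int) -> int:
    """Rows per chunk for lists of a fixed-width child type (an assumption).

    values_length = length * avg_list_length, so the child buffer grows by
    roughly avg_list_length * child_item_size bytes per row.
    """
    per_row = max(offset_size, avg_list_length * child_item_size)
    return max(1, int(max_buffer_size // per_row))


if __name__ == "__main__":
    budget = 4 * 1024 * 1024  # 4 MiB, a hypothetical "low MB" limit
    print(fixed_width_length(budget, 8))      # int64 values
    print(string_length(budget, 4, 100.0))    # 32-bit offsets, ~100-byte strings
    print(list_length(budget, 4, 10.0, 4))    # ~10 int32 values per list
    print(string_length(1024, 4, 5000.0))     # one oversized string still yields 1
```

You would then count appended elements and finish the current chunk once the estimated length is reached. Since the averages are estimates, pick them on the pessimistic side if the limit needs to be conservative.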
