Also, it appears that `allocateNew` fails to set the value count for `BaseVariableWidthVector`s. And if you set the value count after you have assigned data, it clears *only* the offset buffer, not the validity or data buffers.
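
For anyone following along, here is a rough sketch of the three buffers in question (validity, offset, data) for a variable-width column. It deliberately uses plain `java.nio` and is not the Arrow Java API: the class `VarWidthLayoutSketch` and its methods are hypothetical names made up for illustration. It assumes the layout described in the Arrow columnar format, that is, one validity bit per value (least-significant bit first), n + 1 little-endian int32 offsets, and the value bytes concatenated.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: mimics the buffer layout of a
// variable-width column with plain java.nio, not with Arrow classes.
public class VarWidthLayoutSketch {

    /** The three buffers that back one variable-width column. */
    public static final class Buffers {
        public final ByteBuffer validity; // 1 bit per value, LSB first
        public final ByteBuffer offsets;  // (n + 1) little-endian int32 values
        public final ByteBuffer data;     // concatenated value bytes

        Buffers(ByteBuffer validity, ByteBuffer offsets, ByteBuffer data) {
            this.validity = validity;
            this.offsets = offsets;
            this.data = data;
        }
    }

    /** Sizes all three buffers up front, then fills them in one pass. */
    public static Buffers build(String[] values) {
        int n = values.length;

        // Pre-calculate the exact data size, as the thread describes.
        byte[][] encoded = new byte[n][];
        int dataBytes = 0;
        for (int i = 0; i < n; i++) {
            if (values[i] != null) {
                encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
                dataBytes += encoded[i].length;
            }
        }

        ByteBuffer validity = ByteBuffer.allocate((n + 7) / 8);
        ByteBuffer offsets =
            ByteBuffer.allocate((n + 1) * 4).order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer data = ByteBuffer.allocate(dataBytes);

        int pos = 0;
        offsets.putInt(0, 0); // the first offset is always 0
        for (int i = 0; i < n; i++) {
            if (encoded[i] != null) {
                // Mark slot i as non-null in the validity bitmap.
                int byteIndex = i >> 3;
                validity.put(byteIndex,
                    (byte) (validity.get(byteIndex) | (1 << (i & 7))));
                data.put(encoded[i]);
                pos += encoded[i].length;
            }
            // A null slot repeats the previous offset (zero length).
            offsets.putInt((i + 1) * 4, pos);
        }
        data.flip();
        return new Buffers(validity, offsets, data);
    }

    /** True if slot i holds a value (its validity bit is set). */
    public static boolean isValid(Buffers b, int i) {
        return (b.validity.get(i >> 3) & (1 << (i & 7))) != 0;
    }

    /** Reads slot i back, or returns null if the slot is null. */
    public static String get(Buffers b, int i) {
        if (!isValid(b, i)) {
            return null;
        }
        int start = b.offsets.getInt(i * 4);
        int end = b.offsets.getInt((i + 1) * 4);
        return new String(b.data.array(), start, end - start,
            StandardCharsets.UTF_8);
    }
}
```

The point of the sketch is that all three buffers have to agree with each other: clearing or resizing only one of them (for example the offsets) leaves the other two describing data that can no longer be located.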
On Sun, Jul 26, 2020 at 9:01 AM Chris Nuernberger <[email protected]> wrote:

> It appears that those methods do not allocate the validity buffer *and*
> the function `allocateValidityBuffer` is private.
>
> How do you recommend allocating the validity buffer?
>
> On Sun, Jul 26, 2020 at 6:48 AM Chris Nuernberger <[email protected]>
> wrote:
>
>> Perfect, thank you. I tried setCapacity and setValueCount together and
>> this didn't have the result I was hoping for. The methods you provide are
>> what I was looking for.
>>
>> On Sat, Jul 25, 2020 at 5:22 PM Jacques Nadeau <[email protected]>
>> wrote:
>>
>>> You can allocate exactly for both fixed [1] and variable types [2].
>>>
>>> 1:
>>> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/BaseFixedWidthVector.java#L292
>>> 2:
>>> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java#L401
>>>
>>> You can then use the set method per cell or just grab the memory address
>>> (e.g. getDataBufferAddress()) and use Unsafe to bulk copy. The latter
>>> obviously is more advanced and requires you do things like set the
>>> validity buffers as well.
>>>
>>>
>>> On Sat, Jul 25, 2020 at 6:02 AM Chris Nuernberger <[email protected]>
>>> wrote:
>>>
>>>> Hey,
>>>>
>>>> I would like to have bulk methods for copying data into a vector.
>>>> Specifically, I have an existing data table so I know the desired lengths
>>>> of the columns. I can also precalculate the necessary buffer sizes for any
>>>> variable sized column.
>>>>
>>>>
>>>> What I don't see is how to pre-allocate columns of a given size. When
>>>> I use setValueCount on a column and then use the set method I get a netty
>>>> error. What I was hoping for is some allocation method, especially for
>>>> primitive data, that allocates the desired uninitialized memory for the
>>>> validity and data buffers and then hands those two buffers back to me so I can
>>>> use memcpy and friends as opposed to repeated calls to setSafe.
>>>>
>>>>
>>>> Repeated calls to setSafe are time consuming, not parallelizable, and
>>>> unnecessary when I know the data rectangle I would like to transfer into a
>>>> record batch.
>>>>
>>>>
>>>> In my case I have the data pre-cut. How would you recommend copying
>>>> bulk portions of data (that may be in java arrays or in some more abstract
>>>> interface) into a record batch?
>>>>
>>>> Thanks for any help,
>>>>
>>>> Chris
>>>>
