Liya Fan created ARROW-7213:
-------------------------------

             Summary: [Java] Represent a data element of a vector as a tree of ArrowBufPointer
                 Key: ARROW-7213
                 URL: https://issues.apache.org/jira/browse/ARROW-7213
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Java
            Reporter: Liya Fan
            Assignee: Liya Fan


For a fixed-width or variable-width vector, each data element can be 
represented as an ArrowBufPointer object, which refers to a contiguous memory 
segment. This makes many tasks easier and more efficient, because no memory 
copy is needed: calculating hash codes, comparing values, etc.
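
As a rough illustration of the idea, here is a minimal sketch of a pointer to 
one contiguous segment. It is not the Arrow API: SegmentPointer is a 
hypothetical stand-in for ArrowBufPointer, and java.nio.ByteBuffer stands in 
for ArrowBuf. Equality, hashing and ordering all read the shared memory in 
place.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in for ArrowBufPointer: a view over one contiguous byte range. */
final class SegmentPointer implements Comparable<SegmentPointer> {
  private final ByteBuffer buf; // shared backing memory, never copied
  private final int offset;     // start of the segment within buf
  private final int length;     // number of bytes in the segment

  SegmentPointer(ByteBuffer buf, int offset, int length) {
    if (offset < 0 || length < 0 || offset + length > buf.capacity()) {
      throw new IndexOutOfBoundsException("segment outside buffer");
    }
    this.buf = buf;
    this.offset = offset;
    this.length = length;
  }

  /** Two pointers are equal when their segments hold the same bytes. */
  @Override
  public boolean equals(Object o) {
    if (!(o instanceof SegmentPointer)) {
      return false;
    }
    SegmentPointer other = (SegmentPointer) o;
    if (length != other.length) {
      return false;
    }
    for (int i = 0; i < length; i++) {
      if (buf.get(offset + i) != other.buf.get(other.offset + i)) {
        return false;
      }
    }
    return true;
  }

  /** Hash computed directly from the bytes of the segment. */
  @Override
  public int hashCode() {
    int h = 1;
    for (int i = 0; i < length; i++) {
      h = 31 * h + buf.get(offset + i);
    }
    return h;
  }

  /** Lexicographic order on unsigned bytes; a shorter prefix sorts first. */
  @Override
  public int compareTo(SegmentPointer other) {
    int n = Math.min(length, other.length);
    for (int i = 0; i < n; i++) {
      int c = Integer.compare(buf.get(offset + i) & 0xFF,
                              other.buf.get(other.offset + i) & 0xFF);
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(length, other.length);
  }
}
```

For example, if a variable-width data buffer holds the bytes "abcabdabc", the 
pointers (0, 3) and (6, 3) compare equal and share a hash code, while (0, 3) 
sorts before (3, 3).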

This cannot be achieved for complex vectors, because their values often reside 
in more than one contiguous memory region. However, the contiguous memory 
regions for each data element form a tree-like structure, whose leaf nodes are 
the contiguous memory regions. For example, a data element of a struct vector 
forms a tree whose root corresponds to the struct vector, while the child 
vectors correspond to the child nodes of the root. 

In this issue, we provide a data structure that represents each data element of 
a vector as a tree, whose leaf nodes are ArrowBufPointers, representing 
contiguous memory regions for the data element. 

With this data structure, many tasks for complex vectors also become easier and 
more efficient: calculating hash codes and comparing vector elements (ordering 
& equality). In addition, we can do things that were not possible before, such 
as placing data elements into a hash table or hash set. 
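
The tree described above could look roughly like the following sketch. It is 
only an illustration under stated assumptions, not the proposed 
implementation: ElementNode, leaf and inner are hypothetical names, and a 
java.nio.ByteBuffer slice stands in for an ArrowBufPointer leaf. Only equality 
and hashing are shown; ordering is left out.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical tree node for one data element: a leaf holds one contiguous
 * memory region, an inner node holds one child per child vector.
 */
final class ElementNode {
  private final ByteBuffer region;          // non-null for leaves only
  private final List<ElementNode> children; // empty for leaves

  private ElementNode(ByteBuffer region, List<ElementNode> children) {
    this.region = region;
    this.children = children;
  }

  /** Leaf over bytes [offset, offset + length) of buf; the slice shares memory with buf. */
  static ElementNode leaf(ByteBuffer buf, int offset, int length) {
    ByteBuffer view = buf.duplicate(); // independent position/limit, same memory
    view.limit(offset + length);
    view.position(offset);
    return new ElementNode(view.slice(), Collections.emptyList());
  }

  /** Inner node, e.g. the root for an element of a struct vector. */
  static ElementNode inner(ElementNode... kids) {
    List<ElementNode> copy = new ArrayList<>(Arrays.asList(kids));
    return new ElementNode(null, Collections.unmodifiableList(copy));
  }

  /** Structural equality: same tree shape and same bytes in every leaf. */
  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ElementNode)) {
      return false;
    }
    ElementNode other = (ElementNode) o;
    return Objects.equals(region, other.region) && children.equals(other.children);
  }

  /** Hash combined from the leaf contents and the children, in order. */
  @Override
  public int hashCode() {
    return 31 * Objects.hashCode(region) + children.hashCode();
  }
}
```

Because equals and hashCode are defined structurally, such nodes can be used 
as keys in java.util.HashSet or java.util.HashMap. For a struct vector with an 
int child and a varchar child, an element would be built as 
ElementNode.inner(ElementNode.leaf(intData, 4 * i, 4), 
ElementNode.leaf(charData, start, end - start)), where intData, charData, 
start and end are assumed to come from the child vectors' buffers.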




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
