Re: Apache Arrow | Graph Algorithms & Data Structures

2023-06-29 Benson Muite
On 6/30/23 04:21, Bechir Ben Daadouch wrote: > Dear Apache Arrow Dev Community, > > My name is Bechir, I am currently working on a project that involves > implementing graph algorithms in Apache Arrow. > > The initial plan was to construct a node structure and a subsequent graph > that would

Re: Apache Arrow | Graph Algorithms & Data Structures

2023-06-29 Weston Pace
Is your use case to operate on a batch of graphs? For example, do you have hundreds or thousands of graphs that you need to run these algorithms on at once? Or is your use case to operate on a single large graph? If it's the single-graph case then how many nodes do you have? If it's one graph

Re: [VOTE][Format] Add Utf8View Arrays to Arrow Format

2023-06-29 Gang Wu
+1 (non-binding) Thanks, Gang On Thu, Jun 29, 2023 at 3:35 AM Benjamin Kietzman wrote: > Hello, > > I'd like to propose adding Utf8View arrays to the arrow format. > Previous discussion in [1], columnar format description in [2], > flatbuffers changes in [3]. > > There are implementations
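For context on what is being voted on: in the view layout described in the Arrow columnar format documentation, every Utf8View element is a fixed 16-byte view. The first 4 bytes hold the length. Strings of up to 12 bytes are stored inline in the remaining 12 bytes; longer strings instead store a 4-byte prefix, the index of a data buffer, and an offset into that buffer. A minimal sketch of that encoding in plain Python, assuming a single data buffer and little-endian integers (the helper names are hypothetical; this is not the pyarrow API):

```python
import struct

INLINE_LIMIT = 12  # strings up to 12 bytes live inside the view itself


def make_view(value: bytes, buffer_index: int = 0, offset: int = 0) -> bytes:
    """Encode one 16-byte view (little-endian int32 fields)."""
    if len(value) <= INLINE_LIMIT:
        # length, then the bytes themselves zero-padded to 12 bytes
        return struct.pack("<i12s", len(value), value)
    # length, 4-byte prefix, data buffer index, offset within that buffer
    return struct.pack("<i4sii", len(value), value[:4], buffer_index, offset)


def build_utf8view(strings):
    """Return (views, data_buffer) for a list of str, using one data buffer."""
    views, data = [], bytearray()
    for s in strings:
        raw = s.encode("utf-8")
        if len(raw) <= INLINE_LIMIT:
            views.append(make_view(raw))
        else:
            views.append(make_view(raw, buffer_index=0, offset=len(data)))
            data += raw
    return views, bytes(data)


def read_view(view: bytes, buffers) -> str:
    """Decode one view, following it into `buffers` only for long strings."""
    (length,) = struct.unpack_from("<i", view, 0)
    if length <= INLINE_LIMIT:
        return view[4:4 + length].decode("utf-8")
    _prefix, buf_idx, off = struct.unpack_from("<4sii", view, 4)
    return buffers[buf_idx][off:off + length].decode("utf-8")


views, data = build_utf8view(["short", "a considerably longer string"])
print([len(v) for v in views])  # [16, 16]
```

The inline prefix is what lets many string comparisons be decided from the 16-byte view alone, without reading the data buffer.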

Apache Arrow | Graph Algorithms & Data Structures

2023-06-29 Bechir Ben Daadouch
Dear Apache Arrow Dev Community, My name is Bechir, I am currently working on a project that involves implementing graph algorithms in Apache Arrow. The initial plan was to construct a node structure and a subsequent graph that would encompass all the nodes. However, I quickly realized that due
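Arrow arrays are immutable and columnar, so a graph is often easier to hold as a few arrays than as linked node objects. One common choice (not proposed in this thread, so treat it as an assumption) is compressed sparse row (CSR): an offsets array plus a flat array of neighbor indices, which has the same shape as an Arrow List<Int32> column whose row i lists the out-neighbors of node i. A sketch in plain Python with hypothetical function names and no pyarrow dependency, including a breadth-first search that reads neighbors straight from the two arrays:

```python
from collections import deque


def build_csr(num_nodes, edges):
    """Build CSR arrays (offsets, neighbors) from (src, dst) pairs.

    offsets has num_nodes + 1 entries; the neighbors of node i are
    neighbors[offsets[i]:offsets[i + 1]], like one row of a List column.
    """
    counts = [0] * num_nodes
    for src, _ in edges:
        counts[src] += 1
    offsets = [0] * (num_nodes + 1)
    for i in range(num_nodes):
        offsets[i + 1] = offsets[i] + counts[i]
    neighbors = [0] * len(edges)
    cursor = offsets[:-1]  # next free slot for each node (a copy)
    for src, dst in edges:
        neighbors[cursor[src]] = dst
        cursor[src] += 1
    return offsets, neighbors


def bfs_distances(offsets, neighbors, source):
    """Hop distance from source to every node (-1 if unreachable)."""
    dist = [-1] * (len(offsets) - 1)
    dist[source] = 0
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in neighbors[offsets[node]:offsets[node + 1]]:
            if dist[nbr] == -1:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]  # node 5 has no edges
offsets, neighbors = build_csr(6, edges)
print(offsets)    # [0, 2, 3, 4, 5, 5, 5]
print(neighbors)  # [1, 2, 3, 3, 4]
print(bfs_distances(offsets, neighbors, 0))  # [0, 1, 1, 2, 3, -1]
```

With pyarrow installed, `pa.ListArray.from_arrays(offsets, neighbors)` should turn the same two lists into a list column; that step is left out so the sketch runs with the standard library alone.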

Re: [C++] Dealing with third party method that raises exception

2023-06-29 Weston Pace
We do this quite a bit in the Arrow<->Parquet bridge if IIUC. There are macros defined like this: ``` #define BEGIN_PARQUET_CATCH_EXCEPTIONS try { #define END_PARQUET_CATCH_EXCEPTIONS \ }\ catch (const

Re: Question about nested columnar validity

2023-06-29 Weston Pace
>> 2. For StringView and ArrayView, if the parent has `validity = false`. >> If they have `validity = true`, their offset might point to an invalid >> position >I have no idea, but I hope not. Ben Kietzman might want to answer more >precisely here. I think, for view arrays, the offsets

Re: [C++] Dealing with third party method that raises exception

2023-06-29 Li Jin
Thanks Antoine - the examples are useful - I can use the same pattern for now. Thanks for the quick response! On Thu, Jun 29, 2023 at 10:47 AM Antoine Pitrou wrote: > > Hi Li, > > There is not currently, but it would probably be a useful small utility. > If you look for `std::exception` in the

Re: [C++] Dealing with third party method that raises exception

2023-06-29 Antoine Pitrou
Hi Li, There is not currently, but it would probably be a useful small utility. If you look for `std::exception` in the codebase, you'll find that there are a couple of places where we turn it into a Status already. Regards Antoine. On 29/06/2023 at 16:20, Li Jin wrote: Hi, IIUC, most of

[C++] Dealing with third party method that raises exception

2023-06-29 Li Jin
Hi, IIUC, most of the Arrow C++ code doesn't use exceptions. My question is: is there an Arrow utility or macro that wraps a function or code that might raise an exception and turns it into code that returns an Arrow error Status? Thanks! Li
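The pattern described in the replies (catch the third-party exception, such as `std::exception`, at the boundary and return an error Status instead of letting it propagate) can be sketched as a Python analogue. `Status` and `catch_exceptions_as_status` are hypothetical stand-ins, not part of Arrow's C++ or Python API, and `third_party_parse` is an invented example of a function that signals errors by raising:

```python
import functools
from dataclasses import dataclass


@dataclass(frozen=True)
class Status:
    """Hypothetical stand-in for an error-status type such as arrow::Status."""
    ok: bool
    message: str = ""


def catch_exceptions_as_status(func):
    """Wrap func so it returns (Status, result) instead of raising.

    Same shape as the C++ idea: try { call } catch (...) { return error Status }.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return Status(ok=True), func(*args, **kwargs)
        except Exception as exc:  # ordinary errors only; interrupts still propagate
            return Status(ok=False, message=f"{type(exc).__name__}: {exc}"), None
    return wrapper


@catch_exceptions_as_status
def third_party_parse(text):
    # Hypothetical third-party function that raises on bad input.
    return int(text)


status, value = third_party_parse("42")
bad_status, bad_value = third_party_parse("not a number")
print(status.ok, value)  # True 42
print(bad_status.ok)     # False
```

Catching `Exception` rather than `BaseException` keeps the wrapper from swallowing interpreter-level signals, in the same spirit as catching `std::exception` rather than using a catch-all.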

Re: Question about nested columnar validity

2023-06-29 Antoine Pitrou
On 29/06/2023 at 15:16, wish maple wrote: Sorry for being misleading. By a "valid" offset I mean: 1. For Binary Like [1] format, and List formats [2], even if the parent has `validity = false`, their offset should be well-defined. Yes. 2. For StringView and ArrayView, if the parent

Re: detect memory leak between java and python

2023-06-29 Antoine Pitrou
Hi, To answer precisely: 1) The exported record batch will live as long as the Python RecordBatch object is kept alive. If your script keeps the Python RecordBatch object alive until the end, then the exported record batch is kept alive until the end. 2) The rest is standard Python

Question about nested columnar validity

2023-06-29 wish maple
Sorry for being misleading. By a "valid" offset I mean: 1. For Binary Like [1] format, and List formats [2], even if the parent has `validity = false`, their offset should be well-defined. 2. For StringView and ArrayView, if the parent has `validity = false`. If they have `validity = true`,

Re: detect memory leak between java and python

2023-06-29 Wenbo Hu
Thanks for your explanation, Antoine. I figured out why I'm facing the memory leak and need to call delete explicitly. My example code may have been misleading. The key problem is that when I wrap the code that converts a Java stream to a RecordBatchReader, I generate a child allocator from the current context

Re: Question about nested columnar validity

2023-06-29 Antoine Pitrou
On 29/06/2023 at 13:42, wish maple wrote: Thanks all! So, in general: 1. For our Binary Like [1] format, and List formats [2], if the parent is not valid, the offset should still be valid. What do you mean by a "valid" offset?

RE: Question about nested columnar validity

2023-06-29 wish maple
Thanks all! So, in general: 1. For our Binary Like [1] format, and List formats [2], if the parent is not valid, the offset should still be valid. 2. For the StringView and ListView [3] types Arrow is currently working on, if the parent is not valid, the child might have valid content. Am I

Re: Question about nested columnar validity

2023-06-29 Felipe Oliveira Carvalho
Values in the `offsets` Buffer of a ListArray can’t be left undefined because the length of a valid entry before a NULL entry is the offset associated with that NULL entry minus the previous offset. The ListViewArray format I’m working on doesn’t have that restriction because all the information
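Felipe's distinction can be made concrete. In a List array the length of slot i is inferred as offsets[i + 1] - offsets[i], so every offset matters to its neighbors; in a ListView array each slot carries its own offset and size, so slots may be out of order or overlap, and no slot's bounds are inferred from another slot's entries. A plain-Python sketch (hypothetical helper, not the pyarrow API), assuming every offset/size pair, including those of null slots, stays within the child array:

```python
def list_view_values(offsets, sizes, validity, child):
    """Materialize ListView slots as Python lists (None for null slots).

    Slot i covers child[offsets[i] : offsets[i] + sizes[i]]; nothing is
    inferred from neighboring slots.
    """
    if not (len(offsets) == len(sizes) == len(validity)):
        raise ValueError("offsets, sizes and validity must have equal length")
    result = []
    for off, size, valid in zip(offsets, sizes, validity):
        # Assumption of this sketch: every pair is checked, null or not.
        if off < 0 or size < 0 or off + size > len(child):
            raise ValueError("offset/size pair outside the child array")
        result.append(list(child[off:off + size]) if valid else None)
    return result


child = [1, 2, 3, 4, 5]
# Slots are out of order and slots 2 and 3 overlap; slot 1 is null.
print(list_view_values([3, 0, 0, 1], [2, 0, 3, 2], [True, False, True, True], child))
# [[4, 5], None, [1, 2, 3], [2, 3]]
```

Because sizes are explicit, the same child values can be shared by several slots, which a plain List layout cannot express.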

Re: detect memory leak between java and python

2023-06-29 Antoine Pitrou
On 29/06/2023 at 09:50, Wenbo Hu wrote: Hi, I'm using JPype to pass streams between Java and Python back and forth. The following code works fine with its release callback ```python with child_allocator("test-allocator") as allocator: r =

Re: detect memory leak between java and python

2023-06-29 Wenbo Hu
1. For weakref types, the Cython API raises TypeError. 2. All related references need to be explicitly deleted before the allocator is closed. The following code works fine: ``` with child_allocator("test-allocator") as allocator: r = some_package.InMemoryArrowReader.create(allocator)
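The rule arrived at in this thread (every object still referencing the exported data must be dropped before the child allocator is closed) can be modelled without Java or JPype. In this plain-Python sketch, `ToyAllocator` and `ToyBuffer` are hypothetical and only imitate the bookkeeping; they are not the Arrow Java, pyarrow, or JPype APIs:

```python
class ToyAllocator:
    """Hypothetical stand-in for a Java child allocator (not Arrow Java's API).

    Counts buffers that are still referenced and refuses to close while any
    remain, which is the shape of the leak check discussed here.
    """

    def __init__(self):
        self.outstanding = 0

    def allocate(self):
        self.outstanding += 1
        return ToyBuffer(self)

    def close(self):
        if self.outstanding:
            raise RuntimeError(f"leak: {self.outstanding} buffer(s) still referenced")


class ToyBuffer:
    """Hypothetical exported object whose release step frees its memory."""

    def __init__(self, allocator):
        self._allocator = allocator
        self._released = False

    def release(self):
        # Plays the role of a release callback; safe to call more than once.
        if not self._released:
            self._released = True
            self._allocator.outstanding -= 1

    def __del__(self):
        # In CPython this runs as soon as the last reference is dropped.
        self.release()


allocator = ToyAllocator()
batch = allocator.allocate()
try:
    allocator.close()  # fails: `batch` still holds memory
except RuntimeError as err:
    print(err)  # leak: 1 buffer(s) still referenced
del batch          # drop the last reference, which triggers release()
allocator.close()  # now succeeds
```

Relying on `__del__` works here only because CPython frees objects by reference counting; calling the release step explicitly is the more predictable choice when an allocator must be closed at a known point.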

detect memory leak between java and python

2023-06-29 Wenbo Hu
Hi, I'm using JPype to pass streams between Java and Python back and forth. The following code works fine with its release callback ```python with child_allocator("test-allocator") as allocator: r = some_package.InMemoryArrowReader.create(allocator) c_stream =

Re: Question about nested columnar validity

2023-06-29 Antoine Pitrou
On 29/06/2023 at 06:07, Weston Pace wrote: When a binary array or a list array element is null, the cleanest thing to do is to set the offsets to be the same. So, for example, given a list array with 5 elements, if the second item is null, the offsets could be 0, 8, 8, 12, 20, 50. Question
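Weston's example can be checked mechanically: slot i of a list (or binary) array spans offsets[i] to offsets[i + 1], so repeating an offset gives the null slot an empty span without disturbing its neighbors. The columnar format also permits a null slot to have a positive length, which is why equal offsets are the cleanest choice rather than a requirement. A plain-Python sketch with hypothetical helper names (not the pyarrow API):

```python
def slot_lengths(offsets):
    """Length of each slot: offsets[i + 1] - offsets[i]."""
    if any(b < a for a, b in zip(offsets, offsets[1:])):
        raise ValueError("offsets must be non-decreasing")
    return [b - a for a, b in zip(offsets, offsets[1:])]


def null_slots_with_data(offsets, validity):
    """Indices of null slots whose offsets still span child values.

    Allowed by the format, but equal offsets (an empty span) are cleaner.
    """
    lengths = slot_lengths(offsets)
    if len(lengths) != len(validity):
        raise ValueError("need exactly len(validity) + 1 offsets")
    return [i for i, (valid, n) in enumerate(zip(validity, lengths))
            if not valid and n > 0]


offsets = [0, 8, 8, 12, 20, 50]             # Weston's example: 5 slots
validity = [True, False, True, True, True]  # the second item is null
print(slot_lengths(offsets))                    # [8, 0, 4, 8, 30]
print(null_slots_with_data(offsets, validity))  # []
```

If the null slot's offset were simply left undefined, the lengths of both the second and third slots would become undefined too, which is the restriction described for ListArray offsets.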