Re: ADBC - OS-level driver manager

2024-03-20 Thread Wenbo Hu
Wenbo Hu wrote on Wed, Mar 20, 2024 at 22:03: > > Hi David, > > I've been working on xDBC with Arrow for a while. I have some thoughts on > ODBC. > > We connect to the DBMS in Arrow stream using Python through four > different methods: JDBC, ADBC, ODBC, and the Python DB client

Re: ADBC - OS-level driver manager

2024-03-20 Thread Wenbo Hu
. Have there been any > discussions of ADBC having a similar system-wide driver registration paradigm > like ODBC does? -- ----- Best Regards, Wenbo Hu,

Re: dataset write stucks on ThrottledAsyncTaskSchedulerImpl

2023-07-31 Thread Wenbo Hu
in > the ticket if you are interested. In the meantime, the only workaround I > can think of is probably to slow down the data source enough that the queue > doesn't fill up. > > [1] https://github.com/apache/arrow/issues/36951 > > > On Sun, Jul 30, 2023 at 8:15 PM Wenb

Re: dataset write stucks on ThrottledAsyncTaskSchedulerImpl

2023-07-30 Thread Wenbo Hu
om_batches(schema, rb_generator(64, 32768, 100)) local_fs = pa.fs.LocalFileSystem() pa.dataset.write_dataset( reader, "/tmp/data_f", format="feather", partitioning=["bucket"], filesystem=local_fs, existing_data_be

Re: dataset write stucks on ThrottledAsyncTaskSchedulerImpl

2023-07-30 Thread Wenbo Hu
_variable that has thousands of waiters and is constantly doing a > notify_all. > > I think we will need to figure out some kind of reproducible test case. I > will try and find some time to run some experiments on Monday. Maybe I can > reproduce this by setting the backpre

Re: dataset write stucks on ThrottledAsyncTaskSchedulerImpl

2023-07-28 Thread Wenbo Hu
ugh then you should eventually get it to a point where your > supply of data is slower than your writer and I wouldn't expect memory to > accumulate. These things are solutions but might give us more clues into > what is happening. > > [1] > https://unix.stackexchange.c

dataset write stucks on ThrottledAsyncTaskSchedulerImpl

2023-07-27 Thread Wenbo Hu
skSchedulerImpl. -- - Best Regards, Wenbo Hu,

Re: how to make acero output order by batch index

2023-07-25 Thread Wenbo Hu
xt(); }); }}; ac::Declaration source{"record_batch_source", std::move(rb_source_options)}; ``` Works as expected. Wenbo Hu wrote on Wed, Jul 26, 2023 at 10:22: > > Hi, > I'll open an issue on the DeclareToReader problem. > I think the key problem is that the input stream is unordere

Re: how to make acero output order by batch index

2023-07-25 Thread Wenbo Hu
fulness to data > that is already in memory (the in-memory sources do set the batch index). > > I think your understanding of the concept is correct however. Can you > share a sample plan that is not working for you? If you use > DeclarationToTable do you get consistently ordere

Re: how to make acero output order by batch index

2023-07-25 Thread Wenbo Hu
cannot follow the input batch order? Then how does Substrait work in this scenario? Does it also produce output out of order? Wenbo Hu wrote on Tue, Jul 25, 2023 at 19:12: > > Hi, > I'm trying to zip two streams with the same order but different processes. > For example, the original stream come

how to make acero output order by batch index

2023-07-25 Thread Wenbo Hu
ail during the execution (bucket_id and classify do not specify any ordering). Then how can I make Acero produce a stream that keeps the order of the original input? -- - Best Regards, Wenbo Hu,

Re: Need help on ArrayaSpan and writing C++ udf

2023-07-17 Thread Wenbo Hu
e way to go. > > [1] > https://arrow.apache.org/docs/format/Columnar.html#buffer-listing-for-each-layout > [2] https://github.com/apache/arrow/issues/36123 > > > On Mon, Jul 17, 2023 at 4:44 PM Wenbo Hu wrote: > > > Hi, > > I'm using Acero as the strea

Need help on ArrayaSpan and writing C++ udf

2023-07-17 Thread Wenbo Hu
and the output is `arrow::fixed_size_binary(32)`, then how can I write directly to the out buffers, and what actual type should I get from `GetValues`? Maybe `auto *out_values = out->array_span_mutable()->GetValues<uint8_t *>(1);` and `memcpy(*out_values++, some_ptr, 32);`? -- ---

Re: Traffic control and cancel detect on flight do_get in python implementation

2023-07-07 Thread Wenbo Hu
Sorry, my bad. It works over time. It seems that gRPC starts with a default window size, then updates to a stable value according to the options. Wenbo Hu wrote on Fri, Jul 7, 2023 at 23:32: > > Both my server and client are implemented in Python now; a Java client > may come in the future. > Back pr

Re: Traffic control and cancel detect on flight do_get in python implementation

2023-07-07 Thread Wenbo Hu
s to tweak this, IIRC) > > On Thu, Jul 6, 2023, at 23:18, Wenbo Hu wrote: > > Hi, > > I'm using Arrow Flight to transfer data in a distributed system, but > > its lightning speed causes both client and server to face out-of-memory > > issues. > > For do_put

Traffic control and cancel detect on flight do_get in python implementation

2023-07-06 Thread Wenbo Hu
t for client download data, or is there any better way to implement that? -- - Best Regards, Wenbo Hu,

Re: detect memory leak between java and python

2023-06-30 Thread Wenbo Hu
; an exported record batch, destroying the Python RecordBatch calls the > record batch's release callback. > > Regards > > Antoine. > > > > > > > On 29/06/2023 at 15:05, Wenbo Hu wrote: > > Thanks for your explanation, Antoine. > > > > I fi

Re: detect memory leak between java and python

2023-06-29 Thread Wenbo Hu
onger lives allocator (as long as the consumer/callback), code works as expected. Antoine Pitrou wrote on Thu, Jun 29, 2023 at 17:55: > > > On 29/06/2023 at 09:50, Wenbo Hu wrote: > > Hi, > > > > I'm using JPype to pass streams between Java and Python back and forth. >

Re: detect memory leak between java and python

2023-06-29 Thread Wenbo Hu
org.apache.arrow.c.Data.exportArrayStream(allocator, r, s) with pa.RecordBatchReader._import_from_c(c_stream_ptr) as stream: for rb in stream: # type: pa.RecordBatch writer.write(rb) del rb del writer ``` Wenbo Hu

detect memory leak between java and python

2023-06-29 Thread Wenbo Hu
referenced by downstream users. Is yielding a weakref-ed `rb` a good idea? Will the weakref-ed RecordBatchReader work with other pyarrow APIs (dataset)? -- ----- Best Regards, Wenbo Hu,

Add limit and offset to ScannerOption

2023-06-02 Thread Wenbo Hu
' in ScannerOption of dataset may need to have a dedicated implementation rather than directly call compute to filter. Furthermore, Acero may also benefit from this feature for scansink. Or any other ideas for this situation? -- - Best Regards, Wenbo Hu,

Best practice on populating from VectorSchemaRoot to VectorSchemaRoot, ArrowStreamReader to ArrowStreamWriter

2023-04-03 Thread Wenbo Hu
steps making immediate ArrowRecordBatch unnecessarily? (ArrowBuf -> VectorSchemaRoot@UpstreamReader -> ArrowBuf@Loader -> VectorSchemaRoot@DownstreamWriter -> ArrowBuf) Maybe it relates to the allocator; are there any better implementations on the same allocator? -- ----- Best Regards, Wenbo Hu,

Exposing Mutual TLS to java flight server

2023-03-23 Thread Wenbo Hu
that mTLS is supported in C++/Python (https://github.com/apache/arrow/search?q=mtls). Is there any plan to expose mTLS to the Java implementation? -- - Best Regards, Wenbo Hu,

Re: [DISCUSS][FLIGHT SQL] Intentions around JDBC and/or ODBC for Flight SQL?

2021-12-14 Thread Wenbo Hu
e/arrow/pull/10906 > > [2] https://github.com/apache/arrow/pull/11507 > > [3] https://issues.apache.org/jira/browse/ARROW-7744 -- - Best Regards, Wenbo Hu,

Re: [C++] Decouple Flight RPC from GRPC

2021-09-03 Thread Wenbo Hu
it seems you want to mostly reuse the > implementation, but fake certain parts of gRPC since you're doing your own > in-process proxying/translation? If so the implementation would be different > than what we would do for a truly new transport. Also, it would effectively > mean expos

[C++] Decouple Flight RPC from GRPC

2021-09-03 Thread Wenbo Hu
Hi all, I've just posted an issue [ARROW-13889] on Jira, as below. Maybe this is the right place to discuss it. I'm trying to implement Flight RPC on an RPC framework with protobuf message support in a distributed system. However, Flight RPC is tied to gRPC. Classes from gRPC used in flight serve