Wenbo Hu wrote on Wed, Mar 20, 2024 at 22:03:
>
> Hi David,
>
> I've been working on xDBC with Arrow for a while. I have some thoughts on
> ODBC.
>
> We connect to the DBMS as Arrow streams using Python through four
> different methods: JDBC, ADBC, ODBC, and the Python DB client. Have
> there been any discussions of ADBC adopting a system-wide driver
> registration paradigm like the one ODBC has?
--
-----
Best Regards,
Wenbo Hu,
in
> the ticket if you are interested. In the meantime, the only workaround I
> can think of is probably to slow down the data source enough that the queue
> doesn't fill up.
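The "slow down the data source" workaround above can be sketched as a rate-limited generator wrapper. This is a minimal stdlib sketch; `throttle` and its parameters are hypothetical names, not an Arrow API:

```python
import time

def throttle(batches, max_per_sec):
    """Yield items no faster than max_per_sec (crude source-side backpressure)."""
    interval = 1.0 / max_per_sec
    for item in batches:
        start = time.monotonic()
        yield item
        # Sleep off whatever time remains in this item's slot.
        elapsed = time.monotonic() - start
        if elapsed < interval:
            time.sleep(interval - elapsed)

slowed = list(throttle(range(3), 1000))
```

Wrapping the real batch generator this way caps how fast the write queue can fill, at the cost of throughput.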
>
> [1] https://github.com/apache/arrow/issues/36951
>
>
> On Sun, Jul 30, 2023 at 8:15 PM Wenb
```
om_batches(schema,
    rb_generator(64, 32768, 100))
local_fs = pa.fs.LocalFileSystem()
pa.dataset.write_dataset(
    reader,
    "/tmp/data_f",
    format="feather",
    partitioning=["bucket"],
    filesystem=local_fs,
    existing_data_be
```
_variable that has thousands of waiters and is constantly doing a
> notify_all.
>
> I think we will need to figure out some kind of reproducible test case. I
> will try and find some time to run some experiments on Monday. Maybe I can
> reproduce this by setting the backpre
ugh then you should eventually get it to a point where your
> supply of data is slower than your writer and I wouldn't expect memory to
> accumulate. These things are solutions but might give us more clues into
> what is happening.
>
> [1]
> https://unix.stackexchange.c
skSchedulerImpl.
```
xt(); }); }};
ac::Declaration source{"record_batch_source", std::move(rb_source_options)};
```
Works as expected.
Wenbo Hu wrote on Wed, Jul 26, 2023 at 10:22:
>
> Hi,
> I'll open an issue on the DeclarationToReader problem.
> I think the key problem is that the input stream is unordere
fulness to data
> that is already in memory (the in-memory sources do set the batch index).
>
> I think your understanding of the concept is correct however. Can you
> share a sample plan that is not working for you? If you use
> DeclarationToTable do you get consistently ordere
cannot follow the input batch order?
Then how does Substrait work in this scenario? Does it also produce
unordered output?
Wenbo Hu wrote on Tue, Jul 25, 2023 at 19:12:
>
> Hi,
> I'm trying to zip two streams with the same order but different processes.
> For example, the original stream come
ail during the execution
(bucket_id and classify do not specify any ordering). Then how can I
make Acero produce a stream that keeps the order of the original
input?
e way to go.
>
> [1]
> https://arrow.apache.org/docs/format/Columnar.html#buffer-listing-for-each-layout
> [2] https://github.com/apache/arrow/issues/36123
>
>
> On Mon, Jul 17, 2023 at 4:44 PM Wenbo Hu wrote:
>
> > Hi,
> > I'm using Acero as the strea
and the output is `arrow::fixed_size_binary(32)`, then
how can I directly write to the out buffers, and what is the actual
type I should get from `GetValues`?
Maybe `auto *out_values =
out->array_span_mutable()->GetValues<uint8_t *>(1);` and
`memcpy(*out_values++, some_ptr, 32);`?
Sorry, my bad. It works after some time.
It seems that gRPC starts with a default window size, then updates it
to a stable value according to the options.
Wenbo Hu wrote on Fri, Jul 7, 2023 at 23:32:
>
> Both my server and client are implemented in Python now; a Java client
> may come in the future.
> Back pr
s to tweak this, IIRC)
>
> On Thu, Jul 6, 2023, at 23:18, Wenbo Hu wrote:
> > Hi,
> > I'm using Arrow Flight to transfer data in a distributed system, but
> > the lightning speed makes both client and server face out-of-memory
> > issues.
> > For do_put
t for client
download data, or is there any better way to implement that?
; an exported record batch, destroying the Python RecordBatch calls the
> record batch's release callback.
>
> Regards
>
> Antoine.
>
> On 29/06/2023 at 15:05, Wenbo Hu wrote:
> > Thanks for your explanation, Antoine.
> >
> > I fi
onger-lived allocator (living as long as the consumer/callback), the
code works as expected.
Antoine Pitrou wrote on Thu, Jun 29, 2023 at 17:55:
>
>
> On 29/06/2023 at 09:50, Wenbo Hu wrote:
> > Hi,
> >
> > I'm using JPype to pass streams between Java and Python back and forth.
```
org.apache.arrow.c.Data.exportArrayStream(allocator, r, s)
with pa.RecordBatchReader._import_from_c(c_stream_ptr) as stream:
    for rb in stream:  # type: pa.RecordBatch
        writer.write(rb)
        del rb
del writer
```
Wenbo Hu
referenced by downstream users.
Is yielding a weakref-ed `rb` a good idea? Will the weakref-ed
RecordBatchReader work with other pyarrow APIs (dataset)?
' in ScannerOption of dataset may need
to have a dedicated implementation rather than directly calling compute
to filter. Furthermore, Acero may also benefit from this feature for
scansink.
Or are there any other ideas for this situation?
steps, making the intermediate ArrowRecordBatch
unnecessary? (ArrowBuf -> VectorSchemaRoot@UpstreamReader ->
ArrowBuf@Loader -> VectorSchemaRoot@DownstreamWriter -> ArrowBuf)
Maybe it relates to the allocator; are there any better implementations
on the same allocator?
that mTLS is supported in C++/Python
(https://github.com/apache/arrow/search?q=mtls). Is there any plan to
expose mTLS in the Java implementation?
e/arrow/pull/10906
> > [2] https://github.com/apache/arrow/pull/11507
> > [3] https://issues.apache.org/jira/browse/ARROW-7744
it seems you want to mostly reuse the
> implementation, but fake certain parts of gRPC since you're doing your own
> in-process proxying/translation? If so the implementation would be different
> than what we would do for a truly new transport. Also, it would effectively
> mean expos
Hi all,
I've just posted an issue [ARROW-13889] on JIRA, as below. Maybe this is the
right place to discuss it.
I'm trying to implement Flight RPC on an RPC framework with protobuf message
support in a distributed system.
However, Flight RPC is tied to gRPC.
Classes from gRPC used in flight serve