from:"Yaron Gvili"

hashing Arrow structures

2023-07-21 Thread Yaron Gvili

Hi, What are the recommended ways to hash Arrow structures? What are the pros and cons of each approach? Looking a bit through the code, I've so far found two different hashing approaches, which I describe below. Are there any others? A first approach I found is using `Hashing32` and `Hashing6

Re: [ANNOUNCE] New Arrow PMC member: Will Jones

2023-03-13 Thread Yaron Gvili

Congrats, Will! Yaron. From: Rok Mihevc Sent: Monday, March 13, 2023 9:48 PM To: dev@arrow.apache.org Subject: Re: [ANNOUNCE] New Arrow PMC member: Will Jones Congratulations Will! Rok On Mon, Mar 13, 2023 at 8:37 PM Steph Hazlitt wrote: > Congrats Will! >

testing of back-pressure

2023-02-16 Thread Yaron Gvili

Hi, What testing of back-pressure exists in Acero? I'm mostly interested in testing of back-pressure that applies to any ExecNode, but could also learn from more specific testing. If this is not well covered, I'd look into implementing such testing. Cheers, Yaron.

Re: Build issues (Protobuf internal symbols)

2023-02-13 Thread Yaron Gvili

@Li, my understanding is that if you generate headers from the same Arrow protobuf files using the same toolchain (including protoc) used in the Arrow build, then you will be able to use these headers to correctly access protobuf objects within Arrow structures. This doesn't unhide symbols in th

Re: measuring memory usage of Arrow structures

2022-10-28 Thread Yaron Gvili

arrow/memory_pool_jemalloc.cc#L157 >> [2] >> https://github.com/apache/arrow/blob/master/cpp/src/arrow/memory_pool_test.cc >> >> On Fri, Oct 28, 2022 at 3:10 PM Yaron Gvili wrote: >> >>> Hi, >>> >>> Is there a supported/convenient

measuring memory usage of Arrow structures

2022-10-28 Thread Yaron Gvili

Hi, Is there a supported/convenient way for measuring the memory usage of Arrow structures? For my specific use case, measuring memory usage of either a record batch or an array would be sufficiently convenient. Cheers, Yaron.

Re: archery-lint unknown targets

2022-10-21 Thread Yaron Gvili

toine. Le 21/10/2022 à 17:49, Yaron Gvili a écrit : > Hi, > > I got the errors below from `archery lint --cpplint --clang-format > --clang-tidy` and I'm wondering how to figure them out. > > ninja: error: unknown target 'check-format' > ninja: error: unknown tar

archery-lint unknown targets

2022-10-21 Thread Yaron Gvili

Hi, I got the errors below from `archery lint --cpplint --clang-format --clang-tidy` and I'm wondering how to figure them out. ninja: error: unknown target 'check-format' ninja: error: unknown target 'check-clang-tidy' Note that I am working on a modified local clone of a fork of Arrow, so rep

Re: [Acero] Error handling in ExecNode

2022-10-18 Thread Yaron Gvili

Hi Li, One way I've seen (which hopefully is the right way) is invoking `ExecNode::ErrorIfNotOk(Status)`. If `WriteRecordBatch` returns a `Status` then just pass it; if it returns a `Result` then you can pass its `.status()`. Yaron. From: Li Jin Sent: Tuesday,

Re: Register custom ExecNode factories

2022-09-28 Thread Yaron Gvili

I agree with Weston about dynamically loading a shared object with initialization code for registering node factories. For custom node factories, I think this loading would best be done from a separate Python module, different than "_exec_plan.pyx", that the user would need to import for trigge

Re: unclear compilation errors with util::optional

2022-09-22 Thread Yaron Gvili

therefore removed compatibility backports such as arrow::util::optional. Now you should just use std::optional. So be sure to rebase your work on master and fix any reference to those compatibility backports in your code. Regards Antoine. Le 22/09/2022 à 10:26, Yaron Gvili a écrit : > Hi, > >

unclear compilation errors with util::optional

2022-09-22 Thread Yaron Gvili

Hi, In a PR I'm working on [1], I get compilation errors in CI jobs that I don't see the reason for. I'd appreciate help with this. For example, one job's [2] compilation complains about the util::optional symbol not being declared (this happens in other jobs too). This is unclear for a couple

Re: apparently misleading test assertion printout

2022-09-19 Thread Yaron Gvili

Surely not right away, but we'll see :) Yaron. From: Antoine Pitrou Sent: Monday, September 19, 2022 4:09 AM To: dev@arrow.apache.org Subject: Re: apparently misleading test assertion printout Le 19/09/2022 à 10:05, Yaron Gvili a écrit : > Hi Ant

Re: apparently misleading test assertion printout

2022-09-19 Thread Yaron Gvili

t know how to print out a value. Guidance to fix this at: https://github.com/google/googletest/blob/main/docs/advanced.md#teaching-googletest-how-to-print-your-values Regards Antoine. Le 19/09/2022 à 09:54, Yaron Gvili a écrit : > Hi, > > In my local code, I observed a test assertion

apparently misleading test assertion printout

2022-09-19 Thread Yaron Gvili

Hi, In my local code, I observed a test assertion printout that seems misleading. The printout looks like this: Expected equality of these values: expected_empty_segment Which is: 24-byte object <00-00 00-00 00-00 00-00 02-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00> empty_segment

Re: Integration between Flight and Acero

2022-09-13 Thread Yaron Gvili

ode::StartProducing seems quite a bit of change (also I don't think SourceNode is exposed via public header). But let me know if you think I am missing something. Li On Tue, Sep 6, 2022 at 4:57 AM Yaron Gvili wrote: > Hi Li, > > Here's my 2 cents about the Ibis/Substrait part of t

Re: PyArrow build problem

2022-09-12 Thread Yaron Gvili

t and share the URL? Or you can open a Jira issue and continue this because we can attach files on Jira. Thanks, -- kou In "Re: PyArrow build problem" on Mon, 12 Sep 2022 19:36:51 +, Yaron Gvili wrote: > Hi Kou, > > I'm attaching the cmake log files f

Re: PyArrow build problem

2022-09-12 Thread Yaron Gvili

un, 11 Sep 2022 13:57:53 +, Yaron Gvili wrote: > Hi All, > > I got the error below while running "cd python/ && python setup.py build_ext > --inplace", after successfully building and installing a recent master > version (a63e60bad89b41266d155bc496eb3837

Re: PyArrow build problem

2022-09-11 Thread Yaron Gvili

couldn't get it to build nor find instructions for doing so, and I suspect the build problem I described earlier here would get fixed with this build. Could anyone explain how the PyArrow build works now? Cheers, Yaron. From: Yaron Gvili Sent: Sunday, Septemb

PyArrow build problem

2022-09-11 Thread Yaron Gvili

Hi All, I got the error below while running "cd python/ && python setup.py build_ext --inplace", after successfully building and installing a recent master version (a63e60bad89b41266d155bc496eb383765702492) of Arrow C++ under a pyarrow-dev Conda environment, as in the Python dev doc (https://a

Re: design for ordered aggregation

2022-09-07 Thread Yaron Gvili

simplification / cleaning pass. Probably only benchmarks will really say. On Tue, Sep 6, 2022 at 10:26 AM Yaron Gvili wrote: > > Hi All, > > I'm working on a design for ordered aggregations in Arrow C++ and would like > to get some opinions about it. Ordered aggregation i

design for ordered aggregation

2022-09-06 Thread Yaron Gvili

Hi All, I'm working on a design for ordered aggregations in Arrow C++ and would like to get some opinions about it. Ordered aggregation is similar to grouped aggregation except that one column in the grouping key is (known to be) ordered. The result of both types of aggregations is the same but

Re: Integration between Flight and Acero

2022-09-06 Thread Yaron Gvili

Hi Li, Here's my 2 cents about the Ibis/Substrait part of this. An Ibis expression carries a schema. If you're planning to create an integrated Ibis/Substrait/Arrow solution, then you'll need the schema to be available to Ibis in Python. So, you'll need a Python wrapper for the C++ implementati

Re: [ANNOUNCE] New Arrow PMC member: Weston Pace

2022-09-05 Thread Yaron Gvili

Congratulations Weston! From: Raul Cumplido Dominguez Sent: Monday, September 5, 2022 10:04 AM To: dev@arrow.apache.org Subject: Re: [ANNOUNCE] New Arrow PMC member: Weston Pace Congratulations Weston! On Mon, Sep 5, 2022 at 3:37 PM Niranda Perera wrote: > Con

Re: [C++] Read Flight data source into Acero

2022-08-18 Thread Yaron Gvili

I have code in source_node.cc in a local branch adding factories for other sources in SourceNode (e.g., streams of RecordBatch, ExecBatch, or ArrayVector) which I could make a PR for, if there is interest. Yaron. From: David Li Sent: Wednesday, August 17, 2022

Re: dealing with tester timeout in a CI job

2022-08-17 Thread Yaron Gvili

ially investigate some kind of nightly (crossbow) test with a longer timeout but I don't know that we've had to resort to that yet. On Wed, Aug 17, 2022 at 3:41 AM Yaron Gvili wrote: > > It looks like the test normally takes less than a second. The gap in > running-time is not sur

Re: dealing with tester timeout in a CI job

2022-08-17 Thread Yaron Gvili

s are. Yaron. From: Li Jin Sent: Wednesday, August 17, 2022 9:04 AM To: dev@arrow.apache.org Subject: Re: dealing with tester timeout in a CI job Yaron, how long do the asof join tests normally take? On Wed, Aug 17, 2022 at 6:13 AM Yaron Gvili wrote: > Sorry, yes,

Re: dealing with tester timeout in a CI job

2022-08-17 Thread Yaron Gvili

ld you show the URL of the failed macOS related CI job? Thanks, -- kou In "dealing with tester timeout in a CI job" on Tue, 16 Aug 2022 16:34:24 +, Yaron Gvili wrote: > Hi, > > What are some acceptable ways to handle a timeout failure in a CI job for a > tester I i

dealing with tester timeout in a CI job

2022-08-16 Thread Yaron Gvili

Hi, What are some acceptable ways to handle a timeout failure in a CI job for a tester I implemented? For reference, I got such a timeout for only one MacOS related CI job, while the other CI jobs did not get such a timeout. Let's assume that I cannot (easily) make the tests run any faster. Is

Re: [C++] Clarifying the behavior of source node and executor

2022-07-25 Thread Yaron Gvili

Perhaps we can look into adding an option to Source node to ensure "sequential".. Li On Mon, Jul 25, 2022 at 11:18 AM Yaron Gvili wrote: > I've also been using source node with a generator, but observed batches in > random order (in a 1-to-2-months old version of Arrow). So

Re: [C++] Clarifying the behavior of source node and executor

2022-07-25 Thread Yaron Gvili

I've also been using source node with a generator, but observed batches in random order (in a 1-to-2-months old version of Arrow). So, I'd be surprised if ordering is guaranteed, and I'm also interested in how to obtain such a guarantee. Yaron. From: Li Jin Se

Re: [C++] Moving from -O3 to -O2 optimization level in release builds

2022-07-21 Thread Yaron Gvili

> only enable -O3 on source files selectively that can be demonstrated to > benefit from it Unfortunately, actual benefits from -O3 are application dependent. As https://www.linuxjournal.com/article/7269 explains: "Although -O3 can produce fast code, the increase in the size of the image can h

Re: ExecutionContext, batch ordering clarification

2022-07-19 Thread Yaron Gvili

Hi, I also have a related question: could you recommend a way to get the batches in order when using a source node? If necessary, a way that involves changing or wrapping the source node's code is acceptable. Yaron. From: Li Jin Sent: Tuesday, July 19, 2022 10

Re: [C++] Question about substrait dependency in C++

2022-07-18 Thread Yaron Gvili

++] Question about substrait dependency in C++ Thanks both! Let me try changing ARROW_SUBSTRAIT_URL. Should I set ARROW_SUBSTRAIT_URL just to local substrait tarball or sth else? On Mon, Jul 18, 2022 at 2:28 PM Yaron Gvili wrote: > Hi Li, > > I was just writing this. > > AFAIK, curren

Re: [C++] Question about substrait dependency in C++

2022-07-18 Thread Yaron Gvili

Hi Li, I was just writing this. AFAIK, currently the Arrow C++ build system does not take prebuilt Substrait C++ classes. The usual way is rebuilding Arrow C++ with a custom Substrait repository, which is done by setting ARROW_SUBSTRAIT_URL to a local Substrait repository. You can download thi

Re: cpp: Debugging 'plan destruction before finishing'

2022-07-15 Thread Yaron Gvili

I ran into similar issues where a bug in a node's code led to an error that caused difficult-to-debug hangs or crashes during execution. I think a common problem with diagnosing such issues is that error messages (within Status instances) during execution do not always get communicated. Perhaps

Re: accessing Substrait protobuf Python classes from PyArrow

2022-07-07 Thread Yaron Gvili

ursday, July 7, 2022 5:15 AM To: dev@arrow.apache.org Subject: Re: accessing Substrait protobuf Python classes from PyArrow Hi Yaron, Le 07/07/2022 à 10:48, Yaron Gvili a écrit : > It looks like the main decision to make is whether accessing Substrait > protobuf Python classes from

Re: accessing Substrait protobuf Python classes from PyArrow

2022-07-07 Thread Yaron Gvili

v3+ > license. Is this license acceptable for a build dependency? TMK, that should be fine for a build dependency. [1] https://github.com/apache/arrow/pull/13500 On Wed, Jul 6, 2022 at 5:07 AM Yaron Gvili wrote: > > Regarding Rope that I mentioned earlier in this thread, it has an LGPL

Re: accessing Substrait protobuf Python classes from PyArrow

2022-07-06 Thread Yaron Gvili

Regarding Rope that I mentioned earlier in this thread, it has an LGPL v3+ license. Is this license acceptable for a build dependency? Yaron. From: Yaron Gvili Sent: Wednesday, July 6, 2022 7:26 AM To: dev@arrow.apache.org Subject: Re: accessing Substrait

Re: accessing Substrait protobuf Python classes from PyArrow

2022-07-06 Thread Yaron Gvili

m using Python code > than using Cython or C++. I'm not quite certain why this requires a modification to the plan. On Tue, Jul 5, 2022 at 7:45 AM Yaron Gvili wrote: > > @Li, yes though in a new way. This came up in a data-source UDF scenario > where the implementation is a P

Re: accessing Substrait protobuf Python classes from PyArrow

2022-07-05 Thread Yaron Gvili

Mon, Jul 4, 2022 at 1:24 PM Yaron Gvili wrote: > This rewriting of the package is basically what I had in mind; the `_ep` > was just to signal a private package, which cannot be enforced, of course. > Assuming this rewriting would indeed avoid conflict with any standard > protobuf pa

Re: accessing Substrait protobuf Python classes from PyArrow

2022-07-04 Thread Yaron Gvili

f their system library. It looks like pyarrow currently only depends on numpy, which is pretty awesome... so I feel like we should keep it that way. Not sure what the best course of action is. Jeroen On Sun, 3 Jul 2022 at 22:55, Yaron Gvili wrote: > Thanks, the Google protobuf exposure concerns

Re: accessing Substrait protobuf Python classes from PyArrow

2022-07-03 Thread Yaron Gvili

> semi-regularly pushes breaking changes and Arrow currently lags behind by > several months (though I have a PR open for Substrait 0.6). I guess from > that point of view distributing the right version along with pyarrow seems > nice, but the issues of Google's protobuf implementation r

Re: accessing Substrait protobuf Python classes from PyArrow

2022-07-03 Thread Yaron Gvili

und like a reasonable approach? Yaron. From: Yaron Gvili Sent: Saturday, July 2, 2022 8:55 AM To: dev@arrow.apache.org; Phillip Cloud Subject: Re: accessing Substrait protobuf Python classes from PyArrow I'm somewhat confused by this answer because I think resolv

Re: accessing Substrait protobuf Python classes from PyArrow

2022-07-02 Thread Yaron Gvili

e I would rather prefer that the user application import both pyarrow and substrait-python independently. Perhaps @Phillip Cloud or someone from the Ibis space might have some ideas on where this might be found. -Weston On Thu, Jun 30, 2022 at 10:06 AM Yaron Gvili wrote: > > Hi, > > I

accessing Substrait protobuf Python classes from PyArrow

2022-06-30 Thread Yaron Gvili

Hi, Is there support for accessing Substrait protobuf Python classes (such as Plan) from PyArrow? If not, how should such support be added? For example, should the PyArrow build system pull in the Substrait repo as an external project and build its protobuf Python classes, in a manner similar t

Re: user-defined Python-based data-sources in Arrow

2022-06-24 Thread Yaron Gvili

be called reentrantly? In other words, can we call the function before the previous call finishes if we want to read the source in parallel? [1] https://github.com/apache/arrow/blob/86915807af6fe10f44bc881e57b2f425f97c56c7/cpp/src/arrow/record_batch.h#L219 On Wed, Jun 22, 2022 at 9:47 AM Yaron G

Re: user-defined Python-based data-sources in Arrow

2022-06-22 Thread Yaron Gvili

discussion here? On Wed, Jun 22, 2022 at 11:40 AM Yaron Gvili wrote: > Hi, > > I'd like to get the community's feedback about a design proposal > (discussed below) for integrating user-defined Python-based data-sources in > Arrow. This is part of a larger project I'm

user-defined Python-based data-sources in Arrow

2022-06-22 Thread Yaron Gvili

Hi, I'd like to get the community's feedback about a design proposal (discussed below) for integrating user-defined Python-based data-sources in Arrow. This is part of a larger project I'm working on to provide end-to-end (Ibis/Ibis-Substrait/Arrow) support for such data-sources. A user-define

problem building Arrow under pyarrow-dev in debug-mode

2022-06-07 Thread Yaron Gvili

Hi, I tried following the instructions in the Python development page and ran into a problem building Arrow under pyarrow-dev in debug-mode. What am I doing wrong? For the release-mode, which does build and run OK, I use the following commands: $ cmake -GNinja -DCMAKE_INSTALL_PREFIX=$ARROW_HOME -DCM

Re: arithmetic manipulation of PyArrow numeric arrays

2022-06-07 Thread Yaron Gvili

2:54 PM Yaron Gvili wrote: > Hi, > > This is likely a question (or two) with a simple answer that I couldn't > easily find. While working with PyArrow UDFs, I tried implementing a simple > UDF (see first function below) and noticed that it failed upon receiving a > pyarro

arithmetic manipulation of PyArrow numeric arrays

2022-06-06 Thread Yaron Gvili

Hi, This is likely a question (or two) with a simple answer that I couldn't easily find. While working with PyArrow UDFs, I tried implementing a simple UDF (see first function below) and noticed that it failed upon receiving a pyarrow.lib.DoubleArray which cannot be directly manipulated with ar

Re: data-source UDFs

2022-06-06 Thread Yaron Gvili

I find it to be more difficult for users to follow than just registering a type with a defined interface. 4. Is there a particular reason in your use case for using the function registry for this? 5. Do you imagine these UDFs would always be specific to particular users? Or would it be possible f

Re: data-source UDFs

2022-06-04 Thread Yaron Gvili

Thanks for the detailed overview, Weston. I agree with David this would be very useful to have in a public doc. Weston and David's discussion is a good one, however, I see it as separate from the discussion I brought up. The former is about facilities (like extension points) for implementing cu

data-source UDFs

2022-06-03 Thread Yaron Gvili

Hi, I'm working on support for data-source UDFs and would like to get feedback about the design I have in mind for it. By support for data-source UDFs, at a basic level, I mean enabling a user to define using PyArrow APIs a record-batch-generating function implemented in Python that would be e

Re: design for Python UDFs in an Ibis/Substrait/Arrow workflow

2022-05-25 Thread Yaron Gvili

(3) > > Deserialize the Substrait relation/expression in Arrow compute and > execute > > the UDF (either using the approach in the current Scalar UDF prototype or > > do sth else) > > (Same as the Yaron layout above). > > > > Now I think we have reasonable solutio

design for Python UDFs in an Ibis/Substrait/Arrow workflow

2022-05-15 Thread Yaron Gvili

Hi, I'm working on a Python UDFs PoC and would like to get the community's feedback on its design. The goal of this PoC is to enable a user to integrate Python UDFs in an Ibis/Substrait/Arrow workflow. The basic idea is that the user would create an Ibis expression that includes Python UDFs im

Re: PyArrow builds but fails to load pyarrow._dataset

2022-05-10 Thread Yaron Gvili

t Yaron. From: Yaron Gvili Sent: Tuesday, May 10, 2022 1:24 PM To: dev@arrow.apache.org Subject: Re: PyArrow builds but fails to load pyarrow._dataset > Does `import pyarrow` work? Yes. Also, all but one unit te

Re: PyArrow builds but fails to load pyarrow._dataset

2022-05-10 Thread Yaron Gvili

>> >> export PYARROW_WITH_DATASET=1 >> >> On Tue, May 10, 2022 at 7:07 AM Yaron Gvili wrote: >>> >>> Hello, >>> >>> I ran into a problem with running PyArrow that I locally built. The build >>> worked fine (or so it seems)

PyArrow builds but fails to load pyarrow._dataset

2022-05-10 Thread Yaron Gvili

Hello, I ran into a problem with running PyArrow that I locally built. The build worked fine (or so it seems) but then the testing procedure had a failure due to not being able to load pyarrow._dataset, which I manually confirmed. I'd appreciate any guidance on how to fix this error. Below are

Re: ExecBatch in arrow execution engine

2022-05-09 Thread Yaron Gvili

Hi Yue, From my limited experience with the execution engine, my understanding is that the API allows streaming only an ExecBatch from one node to another. A possible solution is to derive from ExecBatch your own class (say) RichExecBatch that carries any extra metadata you want. If in your

Re: RFC: Out of Process Python UDFs in Arrow Compute

2022-05-06 Thread Yaron Gvili

The general design seems reasonable to me. However, I think the multithreading issue warrants a (perhaps separate) discussion, in view of the risk that Arrow's multithreading model would end up being hard to interoperate with that of other libraries used to implement UDFs. Such interoperability

Re: [C++] output field names in Arrow Substrait

2022-04-20 Thread Yaron Gvili

nk the > size > >>> > explosion Phillip is talking about would be avoided. I *really* don't > >>> > see why anyone would want to match *generated* names in any > functional > >>> > way; that's a recipe for undefined behavior. >

Re: [C++] output field names in Arrow Substrait

2022-04-20 Thread Yaron Gvili

de that operates on the input's columns whose name starts > with the given string-name or a node that operates on an input > column whose name is given as data in another input column. In both of those cases the field names are not part of the plan itself. On Tue, Apr 19, 2022 at 9:16

Re: [C++] output field names in Arrow Substrait

2022-04-19 Thread Yaron Gvili

tainly a potential goal, and PRs to add that capability would be welcome, but I don't know if anyone working on the Arrow/Substrait integration has that goal in mind. If that is your goal I might be curious to learn more about your use cases. On Tue, Apr 19, 2022 at 6:11 AM Yaron Gvili wrote:

[C++] output field names in Arrow Substrait

2022-04-19 Thread Yaron Gvili

Hi, We ran into an issue due to the fact that, for intermediate relations, Substrait neither automatically computes output field names nor allows one to explicitly name output fields [1]. This leads to trouble when one needs to refer to these output fields by name [2]. We run into this trouble

Re: [C++] Build/Link against master / custom branch

2022-02-04 Thread Yaron Gvili

Hello, On Ubuntu, I managed to get a local project (external to Arrow) to build against a locally custom-built Arrow using the following: * Locally build Arrow and install it to a directory $ARROW_ROOT_DIR * Configure the external project build using: cmake -D

67 matches
