Re: [Python] Please help me on running unit tests

2020-11-13 Thread Bill Zhao
Please ignore my previous email. I have found the way. Thank you. On Fri, Nov 13, 2020 at 10:20 PM, Bill Zhao wrote: > Hi, > > > I am testing a change to the python parquet module on an updated Mac OS. > I have followed the instructions on >

[Python] Please help me on running unit tests

2020-11-13 Thread Bill Zhao
Hi, I am testing a change to the python parquet module on an updated Mac OS. I have followed the instructions on https://arrow.apache.org/docs/developers/python.html#building-on-linux-and-macos and created the pyarrow-dev conda env. I have also run "brew update && brew bundle

Re: [DISCUSS] Extend specification with the definition of equality?

2020-11-13 Thread Wes McKinney
I don't think that rules about "semantic" equality (i.e. two values being semantically "equal" -- like two different NaN bit patterns -- even though the memory is different) belong in the specification documents. On Fri, Nov 13, 2020 at 12:19 PM Jorge Cardoso Leitão wrote: > > Hi Wes, > > Could
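To illustrate the distinction drawn above between bitwise and "semantic" equality, here is a minimal Python sketch. The two bit patterns are assumptions chosen for illustration (both are quiet NaNs under IEEE 754 binary64), and the variable names are hypothetical; this is not code from Arrow.

```python
import math
import struct

# Two distinct 64-bit patterns that both decode to NaN (illustrative values).
QUIET_NAN_BITS = 0x7FF8000000000000
PAYLOAD_NAN_BITS = 0x7FF8000000000001

# Raw little-endian memory for each value.
raw_a = struct.pack("<Q", QUIET_NAN_BITS)
raw_b = struct.pack("<Q", PAYLOAD_NAN_BITS)

# Reinterpret the same bytes as doubles.
a = struct.unpack("<d", raw_a)[0]
b = struct.unpack("<d", raw_b)[0]

# The memory differs, so a byte-for-byte comparison says "not equal".
bitwise_equal = raw_a == raw_b

# IEEE 754 comparison also says "not equal": NaN never compares equal.
ieee_equal = a == b

# One possible "semantic" rule: treat any two NaNs as equal.
semantic_equal = math.isnan(a) and math.isnan(b)
```

Which of these three notions a library uses is a design choice, which is the point under discussion in this thread.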

Re: [DISCUSS] Extend specification with the definition of equality?

2020-11-13 Thread Jorge Cardoso Leitão
Hi Wes, Could you clarify? By the logical data type, do you mean Arrow's logical data type? The semantics of the logical data type are the only ones that could IMO justify a clarification, in particular, given a data type, how do we agree that slot i from array "a" and slot j from array "b" are equal.

Re: Patch Release 2.0.1?

2020-11-13 Thread Joris Van den Bossche
There are a few small pyarrow regressions related to fsspec filesystems (breaking some dask usage), so I would also be very much in favor of having a 2.0.1 patch release. All relevant JIRAs should be tagged with a 2.0.1 version. The one other issue that I remember but which hasn't this 2.0.1

Re: [DISCUSS] Extend specification with the definition of equality?

2020-11-13 Thread Wes McKinney
On Fri, Nov 13, 2020 at 1:19 AM Micah Kornfield wrote: > > Hi Jorge, > I think it would make sense to add some clarifications to the document per > Wes's comments. Do you want to maybe try to make a PR? > > One small edge case to consider is how NaN float values are compared. I think at the

Re: [Rust]: Architecture support

2020-11-13 Thread vertexclique vertexclique
> > In terms of what to support, I would personally recommend waiting for > someone who wants to use Rust / arrow on a specific platform (and thus has > the time to help us test). Without user input I don't have any sense of > which platforms you list below are the most important I want to use

Re: [DataFusion] Blocking async of async is not async

2020-11-13 Thread vertexclique vertexclique
About the questions: - Does anybody know a way to make such a setup work? I have written a thin wrapper using async_trait around the parquet reader in my current project. Instead of me writing it, if we exposed that from parquet and pushed chunks via channels, it would work at the consumer

Re: [DataFusion] Blocking async of async is not async

2020-11-13 Thread vertexclique vertexclique
Hi, Just a small contribution. We have made some adjustments at Signavio with Jörn to make the chunk reading from S3 streaming at least. You can find the code here. Note that for this to have a bigger effect, you should increase hyper's buffer size in its configuration (which is not included in this

Re: [DataFusion] Blocking async of async is not async

2020-11-13 Thread Andrew Lamb
My understanding of tokio is that there is exactly one global Runtime which has two thread pools: one for synchronous tasks and one for async tasks. I am fairly sure there can be only one global Runtime (because when I tried to

Re: [DISCUSS] Alternative design for KMS interaction in parquet-cpp

2020-11-13 Thread Tham Ha
Hi Ben, Can you reconsider this point: > Properties (including FileDecryptionProperties) are immutable after construction and thus can be safely shared between threads with no further synchronization. As I see it, the returned FileDecryptionProperties object from PropertiesDrivenCryptoFactory is

Re: Pandas Block Manager

2020-11-13 Thread Joris Van den Bossche
As Micah and Wes pointed out on the PR, this alignment and padding are requirements of the format specification. For reference, see here: https://arrow.apache.org/docs/format/Columnar.html#buffer-alignment-and-padding That's also the reason that I said earlier in this thread that such zero-copy
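As a rough illustration of what padding a buffer to an aligned length means, here is a minimal Python sketch. The helper name `padded_length` is hypothetical, and the default of 64 bytes is an assumption based on the multiples of 8 or 64 bytes mentioned in the linked section of the format documentation; it is not pyarrow's implementation.

```python
def padded_length(nbytes: int, alignment: int = 64) -> int:
    """Round nbytes up to the next multiple of alignment (hypothetical helper)."""
    if nbytes < 0:
        raise ValueError("nbytes must be non-negative")
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    # Integer arithmetic: add (alignment - 1), then truncate to a multiple.
    return (nbytes + alignment - 1) // alignment * alignment


# Example: a column of ten int32 values needs 40 bytes of data.
data_bytes = 10 * 4
allocated_bytes = padded_length(data_bytes)
padding_bytes = allocated_bytes - data_bytes
```

A buffer that was allocated elsewhere without this padding may not satisfy the format's expectations as-is, which is one reason a zero-copy conversion may not be possible for it.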

Re: [DataFusion] Blocking async of async is not async

2020-11-13 Thread Rémi Dettai
Hi Andrew! Thanks for your quick response and sorry it took me so long to answer back. `spawn_blocking` solves the issue: https://gist.github.com/rdettai/d2f9bc59b31785c35dce792878976a19 I am still worried by the number of thread pools and complexity it creates (1 pool for the outer runtime, 1

[NIGHTLY] Arrow Build Report for Job nightly-2020-11-13-0

2020-11-13 Thread Crossbow
Arrow Build Report for Job nightly-2020-11-13-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-11-13-0 Failed Tasks: - conda-win-vs2017-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-11-13-0-azure-conda-win-vs2017-py36 -

Re: [DISCUSS] Alternative design for KMS interaction in parquet-cpp

2020-11-13 Thread Gidon Gershinsky
Hi all, Glad to see the parquet-cpp progress on this! Can I suggest creating a googledoc for the technical discussion? The current md doc format seems to be harder for pinpointed comments. I have a few, but they are too minor for sending to the two mailing lists. Cheers, Gidon On Fri, Nov 13,