Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?

2021-09-07 Thread QP Hou
A minor note on the Rust side of things. arrow-rs has a two-week release cycle, but arrow-datafusion mostly releases on demand at the moment. Our most up-to-date release processes are documented at [1] and [2]. [1]: https://github.com/apache/arrow-rs/blob/master/dev/release/README.md [2]: https:

Re: [Question] Allocations along 64 byte cache lines

2021-09-07 Thread Yibo Cai
Thanks Jorge. I'm wondering whether the 64-byte alignment requirement is for the cache or for SIMD registers (AVX-512?). For SIMD, it looks like register-width alignment does help. E.g., _mm_load_si128 can only load 128-bit-aligned data, and it performs better than _mm_loadu_si128, which supports unaligned loads.
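Both concerns can be met by the same allocation: a 64-byte boundary is also a 16- and 32-byte boundary, so one cache-line-aligned buffer satisfies aligned 128-bit loads such as _mm_load_si128 as well as full-width AVX-512 registers. The usual trick behind an aligned allocator is to over-allocate by `alignment - 1` bytes and start at the first aligned address. The stdlib-only sketch below shows that arithmetic with ctypes; `aligned_buffer` is a hypothetical helper, not an Arrow API.

```python
import ctypes

CACHE_LINE = 64  # bytes; also a multiple of the 16-byte SSE and 32-byte AVX widths


def aligned_buffer(nbytes: int, alignment: int = CACHE_LINE):
    """Return a writable ctypes char array of nbytes whose address is aligned.

    Hypothetical helper: over-allocates by alignment - 1 bytes and returns a
    view that starts at the first aligned address inside the raw block.
    """
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    raw = ctypes.create_string_buffer(nbytes + alignment - 1)
    base = ctypes.addressof(raw)
    # Distance from base up to the next multiple of alignment (0 if already aligned)
    offset = (-base) % alignment
    # from_buffer shares raw's memory and keeps raw alive for the view's lifetime
    return (ctypes.c_char * nbytes).from_buffer(raw, offset)


buf = aligned_buffer(100)
print(ctypes.addressof(buf) % CACHE_LINE)  # 0: the view starts on a cache line
```

The same idea underlies C's `aligned_alloc`/`posix_memalign`; an allocator that guarantees 64-byte alignment gives SIMD kernels the option of using aligned load instructions without extra checks.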

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-09-07 Thread Jacques Nadeau
As Phillip mentioned, I think there is something powerful in producing a standard serialized representation of compute operations beyond just Arrow and I'd really like to create a broader community around it. This has been something I had been independently thinking about for the last several month

Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?

2021-09-07 Thread Jacob Quinn
Thanks kou. I think the TODO action list looks good. The one point I think could use some additional discussion is the release cadence: it IS desirable to be able to release more frequently than the parent repo's 3-4 month cadence. But we also haven't had the frequency of commits to necessar

Re: HDFS ORC to Arrow Dataset

2021-09-07 Thread Weston Pace
I'll just add that a PR is in progress (thanks Joris!) for adding this adapter: https://github.com/apache/arrow/pull/10991 On Tue, Sep 7, 2021 at 12:05 PM Wes McKinney wrote: > > I'm missing context but if you're talking about C++/Python, we are > currently missing a wrapper interface to the ORC

Re: HDFS ORC to Arrow Dataset

2021-09-07 Thread Wes McKinney
I'm missing context but if you're talking about C++/Python, we are currently missing a wrapper interface to the ORC reader in the Arrow datasets library https://github.com/apache/arrow/tree/master/cpp/src/arrow/dataset We have CSV, Arrow (IPC), and Parquet interfaces. But we have an HDFS filesys

Re: HTTP traffic of Arrow Flight

2021-09-07 Thread Mohamed Abdelhakem
Yes, I got it: I have to use "Decode As" and choose the HTTP2 protocol. Thanks a lot. On 2021/09/07 17:06:10, "David Li" wrote: > Yes and to be extra clear, Flight currently only supports gRPC, and hence > HTTP/2 (barring a few hypothetical configurations), it may also be that you > need to explicitly

Re: HTTP traffic of Arrow Flight

2021-09-07 Thread Mohamed Abdelhakem
I am using the Java Flight client with Arrow Flight gRPC version 5.0. On 2021/09/07 17:03:42, Nate Bauernfeind wrote: > HTTP (and HTTP/2) traffic is sent over TCP. You might need to be more > specific, or possibly do some more research on your end > > Which arrow flight client are you using in you

Re: HTTP traffic of Arrow Flight

2021-09-07 Thread David Li
Yes, and to be extra clear: Flight currently only supports gRPC, and hence HTTP/2 (barring a few hypothetical configurations). It may also be that you need to explicitly tell Wireshark which protocol is in use. -David On Tue, Sep 7, 2021, at 13:03, Nate Bauernfeind wrote: > HTTP (and HTTP/2) traffic
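One way to confirm that the "TCP" traffic is really HTTP/2 is to inspect the payload itself. Assuming a plaintext connection (grpc+tcp, not TLS), gRPC speaks HTTP/2 with prior knowledge, so the client's first bytes are the fixed connection preface from RFC 7540 section 3.5, followed by frames that each begin with a 9-byte header (24-bit length, 8-bit type, 8-bit flags, 1 reserved bit plus a 31-bit stream id). The stdlib-only helpers below are hypothetical sketches for checking bytes copied out of a capture; they are not part of Arrow or Wireshark.

```python
import struct

# Client connection preface defined in RFC 7540, section 3.5
HTTP2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"

# Frame type codes from RFC 7540, section 6
FRAME_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
    0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}


def starts_with_http2_preface(payload: bytes) -> bool:
    """True if a client-to-server TCP payload opens with the HTTP/2 preface."""
    return payload.startswith(HTTP2_PREFACE)


def parse_frame_header(data: bytes):
    """Decode the 9-byte HTTP/2 frame header at the start of data.

    Returns (payload_length, frame_type_name, flags, stream_id).
    """
    if len(data) < 9:
        raise ValueError("an HTTP/2 frame header is 9 bytes")
    # Big-endian: 1 + 2 bytes of length, type, flags, 4 bytes of stream id
    length_hi, length_lo, ftype, flags, stream = struct.unpack(">BHBBI", data[:9])
    length = (length_hi << 16) | length_lo
    stream_id = stream & 0x7FFFFFFF  # the top bit is reserved
    return length, FRAME_TYPES.get(ftype, f"UNKNOWN(0x{ftype:x})"), flags, stream_id
```

Right after the preface, a client normally sends a SETTINGS frame on stream 0; seeing that pattern in a capture is a strong sign that the dissector simply needs to be pointed at HTTP/2 via "Decode As".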

Re: HTTP traffic of Arrow Flight

2021-09-07 Thread Nate Bauernfeind
HTTP (and HTTP/2) traffic is sent over TCP. You might need to be more specific, or possibly do some more research on your end. Which Arrow Flight client are you using in your test? Java? C++? Which version? Can you provide a simple gRPC server/client example that shows up in Wireshark as you expec

HTTP traffic of Arrow Flight

2021-09-07 Thread Mohamed Abdelhakem
When I built a simple FlightServer and FlightClient, I noticed that the traffic captured by Wireshark is TCP, not HTTP/2. My question is: how do I configure Arrow Flight to use HTTP/2 protocol traffic?

Fwd: HDFS ORC to Arrow Dataset

2021-09-07 Thread Manoj Kumar
Hi Dev-Community, can anyone guide me on how to read ORC files directly from HDFS into an Arrow dataset? Thanks, Manoj

Re: [Question] Allocations along 64 byte cache lines

2021-09-07 Thread Jorge Cardoso Leitão
Thanks. I think that the alignment requirement in IPC is different from this one: we enforce 8/64-byte alignment when serializing for IPC, but we (only) recommend 64-byte alignment for in-memory addresses (at least, this is my understanding from the above link). I did test adding two arrays and the re
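The IPC side of this distinction is just padding arithmetic: each serialized buffer's length is rounded up to the next multiple of the alignment so that the following buffer starts on an aligned offset. A minimal sketch, assuming a power-of-two alignment; the helper names are hypothetical, and filling the gap with zero bytes is this sketch's choice. For example, a 10-byte buffer takes 16 bytes with 8-byte padding and 64 bytes with 64-byte padding.

```python
def padded_length(nbytes: int, alignment: int = 64) -> int:
    """Round nbytes up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    # Adding alignment - 1 and then clearing the low bits rounds up
    return (nbytes + alignment - 1) & ~(alignment - 1)


def pad_buffer(data: bytes, alignment: int = 64) -> bytes:
    """Append zero bytes so the next buffer would start on an aligned offset."""
    return data + b"\x00" * (padded_length(len(data), alignment) - len(data))


print(padded_length(10, 8), padded_length(10, 64))  # 16 64
```

In-memory alignment is a separate property: it concerns the start address of each allocation (which the allocator controls), whereas padding only controls the offsets between consecutive buffers in a serialized body.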