Re: Arrow in HPC

2021-11-02 Thread Jed Brown
"David Li" writes: > Thanks for the clarification Yibo, looking forward to the results. Even if it > is a very hacky PoC it will be interesting to see how it affects performance, > though as Keith points out there are benefits in general to UCX (or similar > library), and we can work out the

Re: [Question] Allocations along 64 byte cache lines

2021-09-09 Thread Jed Brown
Jorge Cardoso Leitão writes: > Yes, I expect aligned SIMD loads to be faster. > > My understanding is that we do not need an alignment requirement for this, > though: split the buffer in 3, [unaligned][aligned][unaligned], use aligned > loads for the middle and un-aligned (or not even SIMD) for
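
The [unaligned][aligned][unaligned] split quoted above can be sketched as plain index arithmetic. This is only an illustrative sketch, not code from Arrow or from the thread: the function name `split_for_alignment`, its parameters, and the 64-byte default are assumptions chosen to match the subject line.

```python
def split_for_alignment(address: int, length: int, alignment: int = 64):
    """Return (head, body, tail) byte counts for a buffer.

    Hypothetical helper: `address` is the buffer's start address,
    `length` its size in bytes, `alignment` a power of two.
    """
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    if length < 0:
        raise ValueError("length must be non-negative")
    # Bytes needed to reach the next aligned address (0 if already aligned),
    # capped at the buffer length for very short buffers.
    head = min(-address % alignment, length)
    # Largest multiple of `alignment` that fits after the head; this is the
    # region where aligned loads would be valid.
    body = (length - head) // alignment * alignment
    # Whatever is left over is handled with unaligned (or scalar) code.
    tail = length - head - body
    return head, body, tail


if __name__ == "__main__":
    head, body, tail = split_for_alignment(address=100, length=200)
    print(head, body, tail)
```

A kernel would process `head` and `tail` bytes with unaligned or scalar code and the `body` bytes with aligned loads, so no alignment requirement on the allocation itself is needed for correctness.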

Re: Arrow in HPC

2021-09-09 Thread Jed Brown
Yibo Cai writes: > HPC infrastructure normally leverages RDMA for fast data transfer among > storage nodes and compute nodes. Computation tasks are dispatched to > compute nodes with best fit resources. > > Concretely, we are investigating porting UCX as Flight transport layer. > UCX is a

Re: [Rust] [Discuss] proposal to redesign Arrow crate to resolve safety violations

2021-05-27 Thread Jed Brown
Andy Grove writes: > > Looking at this purely from the DataFusion/Ballista point of view, what I > would be interested in would be having a branch of DF that uses arrow2 and > once that branch has all tests passing and can run queries with performance > that is at least as good as the original

Re: [ANNOUNCE] Copying Rust components to new repositories

2021-04-18 Thread Jed Brown
Andy Grove writes: > We started looking at the documentation for git filter-branch and it > recommends not to use it. It states that "git-filter-branch is riddled with > gotchas resulting in various ways to easily corrupt repos or end up with a > mess worse than what you started with:". I've

Re: CI feedback time

2021-04-15 Thread Jed Brown
Wes McKinney writes: > I think we should take a more serious look at Buildkite for some of our CI. > > * First of all, it's very easy to connect self-hosted workers and > supports ephemeral cloud workers in a way that would be difficult or > impossible with GHA. No need to have Infra fiddle with

Re: [DISCUSS] How to describe computation on Arrow data?

2021-03-18 Thread Jed Brown
I'm interested in providing some path to make this extensible. To pick an example, suppose the user wants to compute the first k principal components. We've talked [1] about the possibility of incorporating richer communication semantics in Ballista (a la MPI sub-communicators) and numerical

RE: [C++][Discuss] Approaches for SIMD optimizations

2020-06-10 Thread Jed Brown
"Du, Frank" writes: > The PR I committed provides basic support for runtime dispatching. I > agree that the compiler should generate good vectorized code for the non-null > data part, but in fact it didn't. jedbrown pointed out that one can force the > compiler to use SIMD with some additional pragmas, something like >

Re: [DISCUSS] C-level in-process array protocol

2019-10-01 Thread Jed Brown
I'd just like to chime in with the use case of in-situ data analysis for simulations. This domain tends to be cautious with dependencies and there is a lot of C and Fortran, but the in-situ analysis tools will preferably reside in separate processes while sharing memory via shared memory

Re: [C++] Private implementations and virtual interfaces

2019-07-27 Thread Jed Brown
Wes McKinney writes: > The abstract/all-virtual base has some benefits: > > * No need to implement "forwarding" methods to the private implementation > * Do not have to declare "friend" classes in the header for some cases > where other classes need to access the methods of a private >

Re: [DISCUSS] Ongoing Travis CI service degradation

2019-06-29 Thread Jed Brown
Sutou Kouhei writes: > How about creating a mirror repository on > https://gitlab.com/ only to run CI jobs? > > This is an idea that is described in > https://issues.apache.org/jira/browse/ARROW-5673 . > > GitLab CI can attach external workers. So we can increase CI > capacity by adding our new

RE: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-05-06 Thread Jed Brown
"Malakhov, Anton" writes: > Jed, > >> From: Jed Brown [mailto:j...@jedbrown.org] >> Sent: Friday, May 3, 2019 12:41 > >> You linked to a NumPy discussion >> (https://github.com/numpy/numpy/issues/11826) that is encountering the same >> is

RE: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-05-03 Thread Jed Brown
"Malakhov, Anton" writes: >> > the library creates threads internally. It's a disaster for managing >> > oversubscription and affinity issues among groups of threads and/or >> > multiple processes (e.g., MPI). > > This is exactly what I'm talking about referring as issues with threading >

Re: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-05-03 Thread Jed Brown
Antoine Pitrou writes: > Hi Jed, > > Le 03/05/2019 à 05:47, Jed Brown a écrit : >> I would caution to please not commit to the MKL/BLAS model in which the >> library creates threads internally. It's a disaster for managing >> oversubscription and affinity iss

RE: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-05-02 Thread Jed Brown
I would caution to please not commit to the MKL/BLAS model in which the library creates threads internally. It's a disaster for managing oversubscription and affinity issues among groups of threads and/or multiple processes (e.g., MPI). For example, a composable OpenMP technique is for the

Re: Sparse matrix formats

2019-03-13 Thread Jed Brown
Kenta Murata writes: > Hi Jed, > > I'd like to describe the current status of the implementation of SparseTensor. > I hope the following explanation will help you. > > First of all, I designed the current SparseTensor format as the first > interim implementation. > At this time I used

Re: Sparse matrix formats

2019-03-11 Thread Jed Brown
relation to Arrow I don't understand (could be an explicit non-goal for all I know). Wes McKinney writes: > hi Jed, > > Would you like to submit a pull request to propose the changes or > additions you are describing? > > Thanks > Wes > > On Sat, Mar 9, 2019 at 11:32 PM

Sparse matrix formats

2019-03-09 Thread Jed Brown
Wes asked me to bring this discussion here. I'm a developer of PETSc and, with Arrow getting into the sparse representation space, would like for it to interoperate as well as possible. 1. Please guarantee support for 64-bit offsets and indices. The current spec uses "long", which is 32-bit