Re: Question about mutability

2016-02-24 Thread Leif Walsh
Here's the image that popped into my mind when I heard about this project, at best it's a motivating example, at worst it's a distraction:
1. Spark reads parquet from wherever into an arrow structure in shared memory.
2. Spark executor calls into the Python half of pyspark with a handle to this me

Re: SIMD support in Java

2016-02-24 Thread Leif Walsh
The JVM may be able to do popcount optimization but it's categorically bad at other vectorization instructions. On Wed, Feb 24, 2016 at 18:30 Taro L. Saito wrote: > Thanks for letting me know. > > If we need to embed C++ binaries (.so files) inside java, > snappy-java's approach https://github.co

Re: Getting started guide

2016-02-26 Thread Leif Walsh
Arrow doesn't seem to be ready for use yet. I think it's an aspirational project. I'd watch for announcements soon but I wouldn't try to incorporate today. On Fri, Feb 26, 2016 at 2:10 PM Slava B wrote: > Agree, also looking for such tutorial > > On Fri, Feb 26, 2016 at 11:05 AM, Vishnu Viswan

Re: Should Nullable be a nested type?

2016-02-26 Thread Leif Walsh
In the abstract (since I haven't written any code), let me see if I can make an argument for considering "nullable int" and "int" to both be worthwhile "primitive" types, as opposed to "Nullable" being a constructed type over the primitive type "int", in the C++ arena. Let's assume Arrow's use cas
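The preview above is cut off, so the following is only a rough sketch of one way to picture the distinction being argued: a "nullable int" column treated as its own primitive layout (contiguous values plus a validity bitmap) rather than a container of wrapped elements. The names `NullableInt32Column`, `Append`, `AppendNull`, and `IsNull` are hypothetical and are not taken from Arrow's C++ API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration only; not Arrow's actual classes.
// A nullable int column stored as two flat arrays: the values
// themselves and a bitmap with one validity bit per row.
struct NullableInt32Column {
  std::vector<std::int32_t> values;   // one slot per row; null rows hold 0
  std::vector<std::uint8_t> validity; // bit i set means row i is non-null

  void Append(std::int32_t v) { Push(true, v); }
  void AppendNull() { Push(false, 0); }

  bool IsNull(std::size_t i) const {
    return (validity[i / 8] & (1u << (i % 8))) == 0;
  }

  std::size_t size() const { return values.size(); }

 private:
  void Push(bool valid, std::int32_t v) {
    const std::size_t i = values.size();
    if (i % 8 == 0) validity.push_back(0);  // start a new bitmap byte
    if (valid) validity[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
    values.push_back(v);
  }
};
```

With this layout the values stay densely packed, so a scan over `values` looks the same whether or not the column is nullable; only the bitmap lookup differs.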

Re: Should Nullable be a nested type?

2016-02-26 Thread Leif Walsh
I meant "We probably don't want std::vector>" On Fri, Feb 26, 2016 at 10:50 PM Leif Walsh wrote: > In the abstract (since I haven't written any code), let me see if I can > make an argument for considering "nullable int" and "int" to both

Re: Understanding "shared" memory implications

2016-03-19 Thread Leif Walsh
Seems to me IPC/LPC/RPC focuses on the wrong distinction. I think the right one is between async message-passing (over a socket), where the receiver decides when to handle the message, and synchronous/direct memory manipulation (shared mmap, rdma), where the "client" manipulates the "server's" (rat

Re: [C++] How careful do we want to be about exceptions?

2016-06-07 Thread Leif Walsh
I agree, as a C++ library it is acceptable to let these exceptions bubble up to the client, and for the C bindings, all exceptions should be caught and translated to appropriate error codes. Most other languages that interface with it will probably use the C wrapper and will gain visibility into t
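As a minimal sketch of the pattern described above, assuming a made-up function and made-up error codes (`ParsePositive`, `demo_parse_positive`, and the `DEMO_*` values are illustrative, not part of any Arrow binding), the C++ entry point lets exceptions propagate while the C wrapper catches everything and returns a code:

```cpp
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical error codes for the C-facing API.
enum demo_status {
  DEMO_OK = 0,
  DEMO_INVALID_ARGUMENT = 1,
  DEMO_OUT_OF_MEMORY = 2,
  DEMO_UNKNOWN_ERROR = 3
};

// C++ API: exceptions bubble up to the C++ caller.
int ParsePositive(const std::string& text) {
  int value = std::stoi(text);  // may throw invalid_argument or out_of_range
  if (value <= 0) throw std::invalid_argument("value must be positive");
  return value;
}

// C binding: no exception may cross this boundary, so each one is
// caught and translated to an error code.
extern "C" int demo_parse_positive(const char* text, int* out) noexcept {
  if (text == nullptr || out == nullptr) return DEMO_INVALID_ARGUMENT;
  try {
    *out = ParsePositive(text);
    return DEMO_OK;
  } catch (const std::invalid_argument&) {
    return DEMO_INVALID_ARGUMENT;
  } catch (const std::out_of_range&) {
    return DEMO_INVALID_ARGUMENT;
  } catch (const std::bad_alloc&) {
    return DEMO_OUT_OF_MEMORY;
  } catch (...) {
    return DEMO_UNKNOWN_ERROR;
  }
}
```

The trailing `catch (...)` is the important part of the sketch: it guarantees that callers in other languages only ever see a return code, even for exception types the wrapper did not anticipate.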

Re: [JAVA] Figuring out whats shifted from Drill/Java

2016-06-07 Thread Leif Walsh
I am also interested in this. On Tue, Jun 7, 2016 at 17:37 Holden Karau wrote: > Hi Everyone, > > I'm looking to help get started with Arrow & Spark and to that end I'd like > to start with getting the Java implementation closer to the spec / C > implementation. I'm wondering what places people k

Re: Documentation hosting

2017-01-03 Thread Leif Walsh
+1 this sounds pretty sane On Fri, Dec 30, 2016 at 06:02 Uwe L. Korn wrote: > I just had a look over the Apache Calcite approach and I like it very > much. Both, from a technical and the structural (i.e. keeping the > website in the main repo). This will enable us to have the format spec > on Git

Re: [DISCUSS] C++ code sharing amongst Apache {Arrow, Kudu, Impala, Parquet}

2017-02-26 Thread Leif Walsh
I also support the idea of creating an "apache commons modern c++" style library, maybe tailored toward the needs of columnar data processing tools. I think APR is the wrong project but I think that *style* of project is the right direction to aim. I agree this adds test and release process compl

Re: [DISCUSS] C++ code sharing amongst Apache {Arrow, Kudu, Impala, Parquet}

2017-02-27 Thread Leif Walsh
aused by patch in $COMMON > > * Arrow proposes patch to $COMMON > > * ... > > > > This is the worst case scenario, of course, but I actually think it is > > good because it would indicate that the unit testing in $COMMON needs > > to be improved. Unit testing in

Re: [DISCUSS] The road from Arrow 0.5.0 to 1.0.0

2017-07-27 Thread Leif Walsh
I think Wes' idea that major versions indicate stability of the spec and minor versions indicate stability of each implementation's API makes sense. With that in mind, maybe before 1.0 of the spec we should just establish, within each of the reference language implementations, a mechanism for speci

Tensor column types in arrow

2018-04-09 Thread Leif Walsh
Hi all, I’ve been doing some work lately with Spark’s ML interfaces, which include sparse and dense Vector and Matrix types, backed on the Scala side by Breeze. Using these interfaces, you can construct DataFrames whose column types are vectors and matrices, and though the API isn’t terribly rich,

Re: Tensor column types in arrow

2018-04-09 Thread Leif Walsh
Matrix are > not "first class" types in Spark SQL. Spark ML implements them as UDT > (user-defined types) so it's not clear how to make Spark/Arrow converter > work with them. > > I wonder if Bryan and Holden have some more thoughts on that? > > Li > > On M

Re: Tensor column types in arrow

2018-04-09 Thread Leif Walsh
class" types in Spark SQL. Spark ML implements them as UDT > > (user-defined types) so it's not clear how to make Spark/Arrow converter > > work with them. > > > > I wonder if Bryan and Holden have some more thoughts on that? > > > > Li > > > >

Re: Tensor column types in arrow

2018-04-10 Thread Leif Walsh
t > > of schema metadata or a required part of the schema itself? > > > > I feel having it be required might be too restrictive for interop with > > other systems. > > > > On Mon, Apr 9, 2018 at 9:13 PM, Leif Walsh wrote: > > > >> My gut feeling is t

[jira] [Commented] (ARROW-189) C++: Use ExternalProject to build thirdparty dependencies

2016-10-08 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15558447#comment-15558447 ] Leif Walsh commented on ARROW-189: -- Do you also want to remove the ability to use the

[jira] [Commented] (ARROW-189) C++: Use ExternalProject to build thirdparty dependencies

2016-10-08 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15558446#comment-15558446 ] Leif Walsh commented on ARROW-189: -- I can take this, I have it working, just nee

[jira] [Commented] (ARROW-189) C++: Use ExternalProject to build thirdparty dependencies

2016-10-08 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15558671#comment-15558671 ] Leif Walsh commented on ARROW-189: -- What is "the right thing"? How can I

[jira] [Commented] (ARROW-189) C++: Use ExternalProject to build thirdparty dependencies

2016-10-08 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15558688#comment-15558688 ] Leif Walsh commented on ARROW-189: -- Right, so I'd delete that if I didn'

[jira] [Commented] (ARROW-189) C++: Use ExternalProject to build thirdparty dependencies

2016-10-08 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15558695#comment-15558695 ] Leif Walsh commented on ARROW-189: -- Oh, I see, you still want downstream packagers t

[jira] [Commented] (ARROW-112) [C++] Style fix for constants/enums

2016-10-08 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15558850#comment-15558850 ] Leif Walsh commented on ARROW-112: -- I've done most of this I think, did you als

[jira] [Commented] (ARROW-112) [C++] Style fix for constants/enums

2016-10-08 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15558882#comment-15558882 ] Leif Walsh commented on ARROW-112: -- Ok, cool. Glad I put them in separate commits

[jira] [Commented] (ARROW-112) [C++] Style fix for constants/enums

2016-10-10 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563959#comment-15563959 ] Leif Walsh commented on ARROW-112: -- https://github.com/apache/arrow/pull/168

[jira] [Commented] (ARROW-189) C++: Use ExternalProject to build thirdparty dependencies

2016-10-10 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563958#comment-15563958 ] Leif Walsh commented on ARROW-189: -- https://github.com/apache/arrow/pull/167 > C

[jira] [Commented] (ARROW-317) [C++] Implement zero-copy Slice method on arrow::Buffer that retains reference to parent

2016-10-11 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15567513#comment-15567513 ] Leif Walsh commented on ARROW-317: -- How does this relate to ARROW-33? > [C++] Im
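For readers unfamiliar with the idea in the issue title, a zero-copy slice that retains a reference to its parent might look roughly like the sketch below. This is an assumption-laden illustration, not the `arrow::Buffer` implementation: `DemoBuffer` and the free function `Slice` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical buffer type; not arrow::Buffer.
class DemoBuffer {
 public:
  // Owning buffer: takes over the storage.
  explicit DemoBuffer(std::vector<std::uint8_t> storage)
      : storage_(std::move(storage)),
        data_(storage_.data()),
        size_(storage_.size()) {}

  // Non-owning view: points into the parent's memory and holds a
  // shared_ptr so the parent outlives the slice.
  DemoBuffer(std::shared_ptr<DemoBuffer> parent, std::size_t offset,
             std::size_t length)
      : parent_(std::move(parent)),
        data_(parent_->data() + offset),
        size_(length) {}

  DemoBuffer(const DemoBuffer&) = delete;
  DemoBuffer& operator=(const DemoBuffer&) = delete;

  const std::uint8_t* data() const { return data_; }
  std::size_t size() const { return size_; }

 private:
  std::vector<std::uint8_t> storage_;   // empty for slices
  std::shared_ptr<DemoBuffer> parent_;  // null for owning buffers
  const std::uint8_t* data_;
  std::size_t size_;
};

// Returns a view of [offset, offset + length) without copying bytes,
// or nullptr if the range does not fit inside the parent.
std::shared_ptr<DemoBuffer> Slice(const std::shared_ptr<DemoBuffer>& parent,
                                  std::size_t offset, std::size_t length) {
  if (offset > parent->size() || length > parent->size() - offset) {
    return nullptr;
  }
  return std::make_shared<DemoBuffer>(parent, offset, length);
}
```

Because the slice holds a `shared_ptr` to its parent, the caller can drop its own handle to the parent and keep using the slice; the underlying bytes are freed only when the last slice goes away.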

[jira] [Commented] (ARROW-379) Python: Use setuptools_scm/setuptools_scm_git_archive to provide the version number

2016-12-17 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757648#comment-15757648 ] Leif Walsh commented on ARROW-379: -- Did anyone add a conda-forge package

[jira] [Commented] (ARROW-379) Python: Use setuptools_scm/setuptools_scm_git_archive to provide the version number

2016-12-17 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15758348#comment-15758348 ] Leif Walsh commented on ARROW-379: -- Perfect, thanks. > Python: Use setupto

[jira] [Commented] (ARROW-379) Python: Use setuptools_scm/setuptools_scm_git_archive to provide the version number

2016-12-17 Thread Leif Walsh (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15758361#comment-15758361 ] Leif Walsh commented on ARROW-379: -- Hmm, maybe not perfect. With the pyarrow-feeds

[jira] [Created] (ARROW-805) listing empty HDFS directory returns an error instead of returning empty list

2017-04-11 Thread Leif Walsh (JIRA)
Leif Walsh created ARROW-805:
Summary: listing empty HDFS directory returns an error instead of returning empty list
Key: ARROW-805
URL: https://issues.apache.org/jira/browse/ARROW-805
Project: Apache

[jira] [Created] (ARROW-2403) [C++] arrow::CpuInfo::model_name_ destructed twice on exit

2018-04-05 Thread Leif Walsh (JIRA)
Leif Walsh created ARROW-2403:
Summary: [C++] arrow::CpuInfo::model_name_ destructed twice on exit
Key: ARROW-2403
URL: https://issues.apache.org/jira/browse/ARROW-2403
Project: Apache Arrow