Apache Arrow at JupyterCon

2017-08-26 Thread Wes McKinney
hi all, In case folks here are interested, I gave a keynote this week at JupyterCon explaining my motivations for being involved in Apache Arrow and how I see it fitting in with the data science ecosystem long term: https://www.youtube.com/watch?v=wdmf1msbtVs I also gave an interview going a lit

Re: Apache Arrow at JupyterCon

2017-08-30 Thread Julian Hyde
Thanks for sharing. Can we tweet those videos as well? I see that https://twitter.com/apachearrow only tweeted your slides. > On Aug 26, 2017, at 1:11 PM, Wes McKinney wrote: > > hi all, > > In case folks here are interested, I gave a keynote this week at > J

Re: Apache Arrow at JupyterCon

2017-08-30 Thread Wes McKinney
Absolutely. I will do that now On Wed, Aug 30, 2017 at 3:33 PM, Julian Hyde wrote: > Thanks for sharing. Can we tweet those videos as well? I see that > https://twitter.com/apachearrow only > tweeted your slides. > >> On Aug 26, 2017, at 1:11 PM, Wes McKinney

Re: Apache Arrow at JupyterCon

2017-08-30 Thread Ivan Sadikov
Great presentation! Thank you for sharing. On Thu, 31 Aug 2017 at 8:02 AM, Wes McKinney wrote: > Absolutely. I will do that now > > On Wed, Aug 30, 2017 at 3:33 PM, Julian Hyde wrote: > > Thanks for sharing. Can we tweet those videos as well? I see that > https://twitter.com/apachearrow

Re: Apache Arrow at JupyterCon

2017-08-30 Thread Gang(Gary) Wang
Thank you for sharing the videos. We are very interested in supporting the Arrow data format and collections closely. Could you please point out which interfaces would allow Mnemonic to act as a memory provider, so that users can store and access Arrow-managed datasets? Thanks! Very truly your

Re: Apache Arrow at JupyterCon

2017-08-31 Thread Wes McKinney
hi Gary, The Java libraries are not yet capable of writing or zero-copy reads of Arrow datasets to/from shared memory or memory-mapped files: https://issues.apache.org/jira/browse/ARROW-721. We've developed quite a bit of technology on the C++ side for dealing with shared memory IPC but we need so

Re: Apache Arrow at JupyterCon

2017-08-31 Thread Gang(Gary) Wang
Hi Wes, Thank you for the explanation. The use case in https://issues.apache.org/jira/browse/ARROW-721 could be supported directly by Mnemonic through DurableBuffer and DurableChunk. DurableChunk makes use of unsafe to expose a plain memory space that Arrow can use without performance penalties. th

Re: Apache Arrow at JupyterCon

2017-09-03 Thread Wes McKinney
I think ideally we would have a Java interface that would support all of: - Memory mapped files - Anonymous shared memory segments (e.g. POSIX shm) - NVM / Mnemonic We already have the ability to do zero-copy reads from buffer-like objects in C++ and IO interfaces that support zero copy (like mem

Re: Apache Arrow at JupyterCon

2017-09-04 Thread Gonzalo Ortiz Jaureguizar
This is a very interesting feature. It's very surprising that there is no ByteBuffer implementation backed by a MappedByteBuffer. As far as I understand, it should be trivial to implement (maybe not to pool), as ByteBuf is usually backed by a ByteBuffer and MappedByteBuffer extends that. But I didn'

Re: Apache Arrow at JupyterCon

2017-09-06 Thread Wes McKinney
It should be possible to have an ArrowBuf backed by a MappedByteBuffer. Anyone reading is welcome to dig in and write a patch for this. Semantically this is what we have done in C++ -- a memory map inherits from arrow::Buffer, so we can slice and dice a memory map as we would any other Buffer obje
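
A minimal sketch of the idea discussed here, using only the standard java.nio API and not Arrow's own classes: a file is memory-mapped, and a sub-range of the mapping is exposed as a zero-copy view that shares the same underlying memory. The class name MappedSliceSketch and the helpers mapFile and sliceOf are hypothetical names chosen for illustration; this is not how ArrowBuf is implemented, and it says nothing about how such a buffer would fit into Arrow's allocator.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of Arrow or Netty.
class MappedSliceSketch {

    // Maps the first `size` bytes of an existing file read-write.
    // The mapping stays valid after the channel is closed.
    static MappedByteBuffer mapFile(Path path, int size) throws IOException {
        try (FileChannel channel = FileChannel.open(
                path, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }

    // Returns a view of [offset, offset + length) that shares memory
    // with `buffer`; no bytes are copied.
    static ByteBuffer sliceOf(ByteBuffer buffer, int offset, int length) {
        ByteBuffer duplicate = buffer.duplicate();
        duplicate.limit(offset + length);
        duplicate.position(offset);
        return duplicate.slice();
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("mapped-sketch", ".bin");
        MappedByteBuffer mapped = mapFile(path, 64);

        // Write through the mapping, then read through the slice.
        mapped.putInt(8, 42);
        ByteBuffer view = sliceOf(mapped, 8, 4);
        System.out.println(view.getInt(0));   // prints 42

        // Write through the slice, then read through the mapping.
        view.putInt(0, 7);
        System.out.println(mapped.getInt(8)); // prints 7
    }
}
```

Because the slice and the mapping share one region of memory, writes through either are visible through the other, which is the "slice and dice" behaviour described for the C++ side.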

Re: Apache Arrow at JupyterCon

2017-09-06 Thread Gary Wong
ArrowBuf inherits from AbstractByteBuf, which is defined in the Netty library. It behaves more like a memory pool than a pure buffer, which is why ArrowBuf is not currently backed by ByteBuffer. I once tried to build ArrowBuf on top of Mnemonic's DurableBuffer, but looks

Re: Apache Arrow at JupyterCon

2017-09-06 Thread Wes McKinney
Thanks Gary, that is helpful context. In light of this, it might be worth writing some kind of a proposal for how to enable the Java vector classes to be backed by some other kind of byte buffers. It might be that an alternative version of portions of the Arrow Java library (i.e. decoupled from Net

Re: Apache Arrow at JupyterCon

2017-09-06 Thread Jacques Nadeau
This is an interesting problem but also pretty complex. Arrow's Java memory management model is complex on purpose (see https://github.com/apache/arrow/blob/master/java/memory/src/main/java/org/apache/arrow/memory/README.md for more info). It is designed to reserve and share memory in multiple hiera

Re: Apache Arrow at JupyterCon

2017-09-07 Thread Gonzalo Ortiz Jaureguizar
In a library like Arrow it is also very important to have as few dynamic method calls as possible on the critical paths (gets/puts). If it is decided to support other memory systems, it is important to keep those calls to a minimum. If there is a single vector class that supports both systems (by

Re: Apache Arrow at JupyterCon

2017-09-07 Thread Jacques Nadeau
Our general goal (which hasn't always been successfully implemented) is what I'd describe as "fractured subclassing". You can see our use of this where ArrowBuf may extend various Netty classes but is interacting directly with memory addresses for all the hot path get/set operations (not delegating

Re: Apache Arrow at JupyterCon

2017-09-07 Thread Gang(Gary) Wang
Yes, performance is critical for most big data applications, and it is one of the key success factors for both Arrow and Mnemonic. A performance-oriented engineer might even go against fundamental design patterns for performance. So the problem is: how can we make their lives easier? From my poi