[jira] [Created] (ARROW-3549) Replace i64 with usize for some bit utility functions

2018-10-17 Thread Chao Sun (JIRA)
Chao Sun created ARROW-3549: --- Summary: Replace i64 with usize for some bit utility functions Key: ARROW-3549 URL: https://issues.apache.org/jira/browse/ARROW-3549 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-3548) Speed up storing small objects in the object store.

2018-10-17 Thread Robert Nishihara (JIRA)
Robert Nishihara created ARROW-3548: --- Summary: Speed up storing small objects in the object store. Key: ARROW-3548 URL: https://issues.apache.org/jira/browse/ARROW-3548 Project: Apache Arrow

Re: Making a bugfix 0.11.1 release

2018-10-17 Thread Wes McKinney
Got it, thank you for clarifying. It wasn't clear whether the bug would occur in the build environment (CentOS 5 + devtoolset-2) as well as other Linux environments. On Wed, Oct 17, 2018 at 4:16 PM Antoine Pitrou wrote: > > > On 17/10/2018 at 20:38, Wes McKinney wrote: > > hi folks, > > > > Sinc

Re: Making a bugfix 0.11.1 release

2018-10-17 Thread Antoine Pitrou
On 17/10/2018 at 20:38, Wes McKinney wrote: > hi folks, > > Since the Python wheels are being installed 10,000 times per day or > more, I don't think we should allow them to be broken for much longer. > > What additional patches need to be done before an RC can be cut? Since > I'm concerned a

[jira] [Created] (ARROW-3547) [R] Protect against Null crash when reading from RecordBatch

2018-10-17 Thread Javier Luraschi (JIRA)
Javier Luraschi created ARROW-3547: -- Summary: [R] Protect against Null crash when reading from RecordBatch Key: ARROW-3547 URL: https://issues.apache.org/jira/browse/ARROW-3547 Project: Apache Arrow

Re: Making a bugfix 0.11.1 release

2018-10-17 Thread Wes McKinney
Seems like perhaps we should create a Dockerized environment for testing the wheels. I opened https://issues.apache.org/jira/browse/ARROW-3546 On Wed, Oct 17, 2018 at 2:38 PM Wes McKinney wrote: > > hi folks, > > Since the Python wheels are being installed 10,000 times per day or > more, I don't

[jira] [Created] (ARROW-3546) [Python] Provide testing setup to verify wheel binaries work in one or more common Linux distributions

2018-10-17 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3546: --- Summary: [Python] Provide testing setup to verify wheel binaries work in one or more common Linux distributions Key: ARROW-3546 URL: https://issues.apache.org/jira/browse/ARROW-3546

Re: Making a bugfix 0.11.1 release

2018-10-17 Thread Wes McKinney
hi folks, Since the Python wheels are being installed 10,000 times per day or more, I don't think we should allow them to be broken for much longer. What additional patches need to be done before an RC can be cut? Since I'm concerned about the broken patches undermining the project's reputation,

[jira] [Created] (ARROW-3545) [C++/Python] Normalize child/field terminology with StructType

2018-10-17 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3545: --- Summary: [C++/Python] Normalize child/field terminology with StructType Key: ARROW-3545 URL: https://issues.apache.org/jira/browse/ARROW-3545 Project: Apache Arrow

[jira] [Created] (ARROW-3544) [Gandiva] Extremely long compile time for function_registry.cc in release mode on clang 6

2018-10-17 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3544: --- Summary: [Gandiva] Extremely long compile time for function_registry.cc in release mode on clang 6 Key: ARROW-3544 URL: https://issues.apache.org/jira/browse/ARROW-3544

Re: [Discuss] Monorepo vs. independent repositories for independent implementations

2018-10-17 Thread Wes McKinney
I see. This isn't a supported use case for the project -- we expect third parties to use released source or binary artifacts. On Wed, Oct 17, 2018 at 1:24 PM Francois Saint-Jacques wrote: > > Not the nesting, but pulling a lot of unused files. > > On Wed, Oct 17, 2018 at 12:39 PM Wes McKinney wro

Re: [Discuss] Monorepo vs. independent repositories for independent implementations

2018-10-17 Thread Francois Saint-Jacques
Not the nesting, but pulling a lot of unused files. On Wed, Oct 17, 2018 at 12:39 PM Wes McKinney wrote: > Why would one level of directory nesting cause awkwardness (curious)? > > On Wed, Oct 17, 2018, 12:28 PM Francois Saint-Jacques < > fsaintjacq...@networkdump.com> wrote: > >> One point towa

Re: [Discuss] Monorepo vs. independent repositories for independent implementations

2018-10-17 Thread Wes McKinney
Why would one level of directory nesting cause awkwardness (curious)? On Wed, Oct 17, 2018, 12:28 PM Francois Saint-Jacques < fsaintjacq...@networkdump.com> wrote: > One point toward separate repositories: vendoring Arrow for a C++ project > with git submodules becomes awkward if it's a multi-lang

Re: [Discuss] Monorepo vs. independent repositories for independent implementations

2018-10-17 Thread Francois Saint-Jacques
One point toward separate repositories: vendoring Arrow for a C++ project with git submodules becomes awkward if it's a multi-lang monorepo. On Tue, Oct 16, 2018 at 9:22 PM Wes McKinney wrote: > I would also add -- Krisztian's recent work Dockerizing the project is > setting us up to be able to de

Re: Arrow sync at 12:00 US/Eastern today

2018-10-17 Thread Jacques Nadeau
Notes from meeting below.

*Attendees*
Jacques: no items
Uwe: merging documentation
Wes: Bug fix release
Pearu: no items
Li: Java issues

*Merging Documentation*
Let’s look at merging things and using Sphinx across languages.

*Bug fix release*
Bug fix release to fix packages for Python. Need to ma

[jira] [Created] (ARROW-3543) crazy timestamp bug in feather?

2018-10-17 Thread Olaf (JIRA)
Olaf created ARROW-3543: --- Summary: crazy timestamp bug in feather? Key: ARROW-3543 URL: https://issues.apache.org/jira/browse/ARROW-3543 Project: Apache Arrow Issue Type: Bug Reporter: Ola

Arrow sync at 12:00 US/Eastern today

2018-10-17 Thread Wes McKinney
https://meet.google.com/vtm-teks-phx Unfortunately there is also a Parquet sync at the same time. Not sure what to do about that.

Re: Algorithmic explorations of bitmaps vs. sentinel values

2018-10-17 Thread Ted Dunning
Memory binding can be viewed as an opportunity for melding multiple aggregators. For instance, any additional aggregation comes nearly for free. Sum and count (non zero) will cost the same as either alone. Or sum and sum of squares. On Wed, Oct 17, 2018, 06:21 Francois Saint-Jacques < fsaintjacq..
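
The point about melding aggregators can be illustrated with a minimal C++ sketch. The names below (SumCount, Moments, SumAndCountNonZero, SumAndSumOfSquares) are hypothetical and do not come from the thread or from the Arrow codebase; the sketch only shows the idea of computing two aggregates in one pass, so each value is loaded from memory once. Whether the second aggregate is really "nearly free" depends on the hardware and compiler and would need to be measured.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result types, introduced only for this illustration.
struct SumCount {
  int64_t sum = 0;
  int64_t count = 0;  // number of non-zero values
};

struct Moments {
  int64_t sum = 0;
  int64_t sum_sq = 0;  // sum of squared values
};

// One pass over the data produces both the sum and the non-zero count.
SumCount SumAndCountNonZero(const int32_t* values, std::size_t length) {
  SumCount out;
  for (std::size_t i = 0; i < length; ++i) {
    out.sum += values[i];
    out.count += (values[i] != 0);
  }
  return out;
}

// One pass over the data produces both the sum and the sum of squares.
Moments SumAndSumOfSquares(const int32_t* values, std::size_t length) {
  Moments out;
  for (std::size_t i = 0; i < length; ++i) {
    const int64_t v = values[i];  // widen before squaring to avoid int32 overflow
    out.sum += v;
    out.sum_sq += v * v;
  }
  return out;
}
```

If the loop is limited by memory bandwidth, as the message suggests, the fused version should run in roughly the time of a single aggregate, because the extra arithmetic is done while waiting on loads.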

[jira] [Created] (ARROW-3542) [C++] Use unsafe appends when building array from CSV

2018-10-17 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3542: - Summary: [C++] Use unsafe appends when building array from CSV Key: ARROW-3542 URL: https://issues.apache.org/jira/browse/ARROW-3542 Project: Apache Arrow

Re: Wrapping java.nio.ByteBuffer into ArrowBuf

2018-10-17 Thread Wes McKinney
hi, Yes, this should be possible. There is some internal memory accounting that needs to be addressed https://issues.apache.org/jira/browse/ARROW-3191 see prior mailing list discussion about this for more https://lists.apache.org/thread.html/150a0ca92a958420175f1adb559e1a91e85d54d7f5c95477202ca

Wrapping java.nio.ByteBuffer into ArrowBuf

2018-10-17 Thread Shubham Chaurasia
Hi All, Is there any way to wrap java.nio.ByteBuffer into ArrowBuf? I want to use org.apache.arrow.memory.BufferAllocator at the same time. I know that netty provides a way to wrap (obtain a view of the buffer without actually copying its contents) via io.netty.buffer.Unpooled#wrappedBuffer(ByteBuff

Re: Algorithmic explorations of bitmaps vs. sentinel values

2018-10-17 Thread Francois Saint-Jacques
Don't trust that the compiler will auto-vectorize; tiny changes can have drastic impacts.

SumScalar (benchmark-native):
       │ state->total += *values++;
    42 │ add    (%rcx),%esi
    68 │ mov    %esi,(%rsp)
     3 │ add    0x4(%rcx),%esi
    36 │ mov    %esi,(%rsp)
       │ ad

[jira] [Created] (ARROW-3541) [Rust] Update BufferBuilder to allow new bit-packed BooleanArray

2018-10-17 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3541: -- Summary: [Rust] Update BufferBuilder to allow new bit-packed BooleanArray Key: ARROW-3541 URL: https://issues.apache.org/jira/browse/ARROW-3541 Project: Apache Arrow

Re: Algorithmic explorations of bitmaps vs. sentinel values

2018-10-17 Thread Wes McKinney
I'm surprised that using a stack variable has an impact, but I should probably update the benchmarks to do that (and merge with the SumState at the end of the function) for thoroughness. Thanks! On Wed, Oct 17, 2018 at 9:07 AM Francois Saint-Jacques wrote: > > It seems the code for the naive Scala

Re: Algorithmic explorations of bitmaps vs. sentinel values

2018-10-17 Thread Francois Saint-Jacques
It seems the code for the naive Scalar example is not friendly to the compiler's auto-vectorization component. If you accumulate in a local state (instead of through the SumState pointer), you'll get different results, at least with clang++ 6.0. benchmark-noavx (only SSE): BM_SumInt32Scalar
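
The two variants described in this message can be sketched roughly as follows. This is an illustration under stated assumptions, not the benchmark's actual code: the SumState struct here is a hypothetical stand-in with a single total field, and the function names SumThroughPointer and SumWithLocal are invented for the example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the benchmark's SumState; the real struct may differ.
struct SumState {
  int64_t total = 0;
};

// Variant 1: update the state through the pointer on every iteration.
void SumThroughPointer(const int32_t* values, std::size_t length,
                       SumState* state) {
  for (std::size_t i = 0; i < length; ++i) {
    state->total += values[i];
  }
}

// Variant 2: accumulate in a local variable, then merge into the state once
// at the end of the function.
void SumWithLocal(const int32_t* values, std::size_t length, SumState* state) {
  int64_t local = 0;
  for (std::size_t i = 0; i < length; ++i) {
    local += values[i];
  }
  state->total += local;
}
```

Both functions compute the same result. Whether either loop is vectorized depends on the compiler, its version, and the flags used; the thread only reports a difference with clang++ 6.0, so comparing the generated assembly or benchmark numbers for a given toolchain is the reliable check.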

[jira] [Created] (ARROW-3540) [Rust] Incorporate BooleanArray into PrimitiveArray

2018-10-17 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3540: -- Summary: [Rust] Incorporate BooleanArray into PrimitiveArray Key: ARROW-3540 URL: https://issues.apache.org/jira/browse/ARROW-3540 Project: Apache Arrow Issue Ty

[jira] [Created] (ARROW-3539) [Packaging] Update scripts to build against vendored jemalloc

2018-10-17 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-3539: -- Summary: [Packaging] Update scripts to build against vendored jemalloc Key: ARROW-3539 URL: https://issues.apache.org/jira/browse/ARROW-3539 Project: Apache Arrow