Re: [Rust] Dictionary encoding and Flight

2020-04-07 Thread Wes McKinney
As another item for consideration -- in C++ at least, the dictionary id is dealt with as an internal detail of the IPC message production process. When serializing the Schema, ids are assigned to each dictionary-encoded field in the DictionaryMemo object, see
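
To illustrate the idea in the preview above, here is a minimal sketch of assigning ids to dictionary-encoded fields while a schema is serialized. It is a toy model in plain Python, not the Arrow C++ code: `ToyDictionaryMemo`, `Field`, and `serialize_schema` are hypothetical names, and keying the memo by field name with sequential ids is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Field:
    """Hypothetical schema field; not an Arrow class."""
    name: str
    type: str
    dictionary_encoded: bool = False


class ToyDictionaryMemo:
    """Hypothetical stand-in for a dictionary memo (assumption: ids are
    handed out sequentially the first time a field is seen)."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def get_or_assign_id(self, field_name: str) -> int:
        # Reuse the id if this field was already registered.
        if field_name not in self._ids:
            self._ids[field_name] = len(self._ids)
        return self._ids[field_name]


def serialize_schema(fields: List[Field], memo: ToyDictionaryMemo) -> List[dict]:
    """Produce a plain-dict schema; ids are assigned here, so callers
    never choose a dictionary id themselves."""
    out = []
    for f in fields:
        entry = {"name": f.name, "type": f.type}
        if f.dictionary_encoded:
            entry["dictionary_id"] = memo.get_or_assign_id(f.name)
        out.append(entry)
    return out


memo = ToyDictionaryMemo()
fields = [
    Field("host", "utf8", dictionary_encoded=True),
    Field("value", "float64"),
    Field("region", "utf8", dictionary_encoded=True),
]
serialized = serialize_schema(fields, memo)
```

The point of the sketch is only that the id lives in the memo and appears as a by-product of serialization, which matches the "internal detail" framing in the message.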

[jira] [Created] (ARROW-8369) [CI] Fix crossbow R group

2020-04-07 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8369: Summary: [CI] Fix crossbow R group Key: ARROW-8369 URL: https://issues.apache.org/jira/browse/ARROW-8369 Project: Apache Arrow Issue Type: Bug

Re: C interface clarifications

2020-04-07 Thread Wes McKinney
I opened a JIRA to track a potential change or at least clarification about this use case. One major use case for the C interface will be in database clients (e.g. this question arose out of using the C interface for Kudu -- a database) and this may be a common question.

[jira] [Created] (ARROW-8368) [Format] In C interface, clarify resource management for consumers needing only a subset of child fields in ArrowArray

2020-04-07 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8368: Summary: [Format] In C interface, clarify resource management for consumers needing only a subset of child fields in ArrowArray Key: ARROW-8368 URL:

[jira] [Created] (ARROW-8367) [C++] Is FromString(..., pool) worthwhile

2020-04-07 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-8367: Summary: [C++] Is FromString(..., pool) worthwhile Key: ARROW-8367 URL: https://issues.apache.org/jira/browse/ARROW-8367 Project: Apache Arrow Issue Type:

Re: [Rust] Dictionary encoding and Flight

2020-04-07 Thread Wes McKinney
hey Paul, Take a look at how dictionaries work in the IPC protocol https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst#serialization-and-interprocess-communication-ipc Dictionaries are sent as separate messages. When a field is tagged as dictionary encoded in the schema,
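
As a rough illustration of "dictionaries are sent as separate messages", the sketch below models a stream as a list of plain Python dicts: a schema message that tags the field with a dictionary id, a dictionary batch carrying the distinct values, and a record batch carrying only integer indices. This is a toy model, not the Arrow IPC wire format or any Arrow library API; `dictionary_encode`, `toy_stream`, `decode`, and the message layout are all hypothetical.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Split values into (distinct values in first-seen order, indices)."""
    dictionary: List[str] = []
    index_of: Dict[str, int] = {}
    indices: List[int] = []
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        indices.append(index_of[v])
    return dictionary, indices


def toy_stream(column: str, values: List[str], dictionary_id: int = 0) -> List[dict]:
    """Build a toy message sequence: schema, then dictionary, then data."""
    dictionary, indices = dictionary_encode(values)
    return [
        {"kind": "schema",
         "fields": [{"name": column, "dictionary_id": dictionary_id}]},
        {"kind": "dictionary_batch", "id": dictionary_id, "values": dictionary},
        {"kind": "record_batch", "columns": {column: indices}},
    ]


def decode(messages: List[dict]) -> Dict[str, List[str]]:
    """Reassemble the original values by looking indices up by dictionary id."""
    field_ids: Dict[str, int] = {}
    dictionaries: Dict[int, List[str]] = {}
    rows: Dict[str, List[str]] = {}
    for msg in messages:
        if msg["kind"] == "schema":
            field_ids = {f["name"]: f["dictionary_id"] for f in msg["fields"]}
        elif msg["kind"] == "dictionary_batch":
            dictionaries[msg["id"]] = msg["values"]
        else:  # record_batch
            for name, idx in msg["columns"].items():
                rows[name] = [dictionaries[field_ids[name]][i] for i in idx]
    return rows


messages = toy_stream("host", ["a", "b", "a"])
decoded = decode(messages)
```

The receiver can only interpret the record batch because the schema told it which dictionary id belongs to the field and the dictionary batch arrived first.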

Re: C interface clarifications

2020-04-07 Thread Antoine Pitrou
On 07/04/2020 at 19:39, Wes McKinney wrote: > Re-orienting the discussion on something more concrete, suppose that an ArrowArray is used to convey a result set from a database query, and suppose that the resources associated with each column in the result set are independent of the

Re: C interface clarifications

2020-04-07 Thread Wes McKinney
On Tue, Apr 7, 2020, 12:04 PM Antoine Pitrou wrote: > On 07/04/2020 at 18:49, Todd Lipcon wrote: > >> Hmm, the spec may not be clear enough on this, but if you move a child and release the parent, then the other children are not usable anymore. > >> In your case, you don't

Re: C interface clarifications

2020-04-07 Thread Antoine Pitrou
On 07/04/2020 at 18:49, Todd Lipcon wrote: >> Hmm, the spec may not be clear enough on this, but if you move a child and release the parent, then the other children are not usable anymore. >> In your case, you don't call release() on every child. You just call release() on the

Re: C interface clarifications

2020-04-07 Thread Todd Lipcon
On Tue, Apr 7, 2020 at 2:40 AM Antoine Pitrou wrote: > On 06/04/2020 at 19:22, Todd Lipcon wrote: > > The spec should also probably cover thread-safety: if the consumer gets an ArrowArray, is it safe to pass off the children to multiple threads and have them call release()

Re: [C++] Compute: Datum and "ChunkedArray&" inputs

2020-04-07 Thread Uwe L. Korn
I did a bit more research on JIRA and we seem to have this open topic there as well: https://issues.apache.org/jira/browse/ARROW-6959 covers a similar topic to my mail, and in https://issues.apache.org/jira/browse/ARROW-7009 we wanted to remove some of the interfaces with

[Rust] Dictionary encoding and Flight

2020-04-07 Thread Paul Dix
Hello, I'm trying to build a Rust-based Flight server and I'd like to use Dictionary encoding for a number of string columns in my data. I've seen that StringDictionary was recently added to Rust here:

[jira] [Created] (ARROW-8366) [Rust] Need to revert recent arrow-flight build change

2020-04-07 Thread Andy Grove (Jira)
Andy Grove created ARROW-8366: Summary: [Rust] Need to revert recent arrow-flight build change Key: ARROW-8366 URL: https://issues.apache.org/jira/browse/ARROW-8366 Project: Apache Arrow Issue

[jira] [Created] (ARROW-8365) arrow-cpp: Error when writing files to S3 larger than 5 GB

2020-04-07 Thread Juan Galvez (Jira)
Juan Galvez created ARROW-8365: Summary: arrow-cpp: Error when writing files to S3 larger than 5 GB Key: ARROW-8365 URL: https://issues.apache.org/jira/browse/ARROW-8365 Project: Apache Arrow

[jira] [Created] (ARROW-8364) Get Access to the type_to_type_id dictionary

2020-04-07 Thread Or (Jira)
Or created ARROW-8364: Summary: Get Access to the type_to_type_id dictionary Key: ARROW-8364 URL: https://issues.apache.org/jira/browse/ARROW-8364 Project: Apache Arrow Issue Type: Improvement

[jira] [Created] (ARROW-8363) [Archery] Comment bot should report any errors happening during crossbow submit

2020-04-07 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8363: Summary: [Archery] Comment bot should report any errors happening during crossbow submit Key: ARROW-8363 URL: https://issues.apache.org/jira/browse/ARROW-8363

[jira] [Created] (ARROW-8362) [Crossbow] Ensure that the locally generated version is used in the docker tasks

2020-04-07 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8362: Summary: [Crossbow] Ensure that the locally generated version is used in the docker tasks Key: ARROW-8362 URL: https://issues.apache.org/jira/browse/ARROW-8362

[C++] Compute: Datum and "ChunkedArray&" inputs

2020-04-07 Thread Uwe L. Korn
Hello all, I'm in the process of changing the implementation of the Take kernel to work on ChunkedArrays without concatenating them into a single Array first. While working on the implementation, I realised that we often switch between Datum and the specifically typed parameters. This works quite

Re: Attn: Wes, Re: Masked Arrays

2020-04-07 Thread Felix Benning
I guess it would be helpful, when trying to achieve zero-modification between R and another language, if the standard used for communication would allow for that. Or when setting all nulls to zero for an algorithm and then saving it to a database for later use. But at the same time, I only

Re: C interface clarifications

2020-04-07 Thread Antoine Pitrou
On 06/04/2020 at 19:22, Todd Lipcon wrote: > The spec should also probably cover thread-safety: if the consumer gets an ArrowArray, is it safe to pass off the children to multiple threads and have them call release() concurrently? In other words, do I need to use a thread-safe

[jira] [Created] (ARROW-8361) [C++] Add Result APIs to Buffer methods and functions

2020-04-07 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8361: Summary: [C++] Add Result APIs to Buffer methods and functions Key: ARROW-8361 URL: https://issues.apache.org/jira/browse/ARROW-8361 Project: Apache Arrow

Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-04-05-0

2020-04-07 Thread Prudhvi Porandla
With https://github.com/apache/arrow/pull/6850, the gandiva-jar-osx build will succeed. gandiva-jar-trusty is failing because the deployment step is taking too long. The build is not failing on our env

[jira] [Created] (ARROW-8360) Fixes date32 support for date/time functions

2020-04-07 Thread Yuan Zhou (Jira)
Yuan Zhou created ARROW-8360: Summary: Fixes date32 support for date/time functions Key: ARROW-8360 URL: https://issues.apache.org/jira/browse/ARROW-8360 Project: Apache Arrow Issue Type: Bug