Kouhei Sutou created ARROW-7260:
---------------------------------
Summary: [CI] Ubuntu 14.04 test fails due to user-defined literal
Key: ARROW-7260
URL: https://issues.apache.org/jira/browse/ARROW-7260
Project: Apache Arrow
I
OK. I submitted a pull request: https://github.com/apache/arrow/pull/5901
In
"Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-11-25-0" on Mon, 25
Nov 2019 21:23:34 -0600,
Wes McKinney wrote:
> I'd be interested in maintaining gcc 4.8 support for a time yet, but I'm
> interested in the op
Joris Van den Bossche created ARROW-7261:
Summary: [Python] Python support for fixed size list type
Key: ARROW-7261
URL: https://issues.apache.org/jira/browse/ARROW-7261
Project: Apache Arrow
Projjal Chanda created ARROW-7262:
---------------------------------
Summary: [C++][Gandiva] Implement replace function in Gandiva
Key: ARROW-7262
URL: https://issues.apache.org/jira/browse/ARROW-7262
Project: Apache Arrow
I
Projjal Chanda created ARROW-7263:
---------------------------------
Summary: [C++][Gandiva] Implement locate and position functions
Key: ARROW-7263
URL: https://issues.apache.org/jira/browse/ARROW-7263
Project: Apache Arrow
+1 (binding)
In
"[VOTE] Clarifications and forward compatibility changes for Dictionary
Encoding (second iteration)" on Wed, 20 Nov 2019 20:41:57 -0800,
Micah Kornfield wrote:
> Hello,
> As discussed on [1], I've proposed clarifications in a PR [2] that
> clarifies:
>
> 1. It is not requ
Hi Micah,
On 26/11/2019 at 05:52, Micah Kornfield wrote:
>
> After going through this exercise I put together a list of pros and cons
> below.
>
> I would like to hear from other devs:
> 1. Their opinions on setting this up as an alternative system (I'm willing
> to invest some more time in
Ji Liu created ARROW-7264:
---------------------------------
Summary: [Java] RangeEqualsVisitor type check is not correct
Key: ARROW-7264
URL: https://issues.apache.org/jira/browse/ARROW-7264
Project: Apache Arrow
Issue Type: Bug
Hi all,
Recently the datasets API has been improved a lot, and I find some of the new
features very useful in my own work. For example, an important one for me is
the fix of ARROW-6952 [1]. As I currently work on Java/Scala projects like
Spark, I am now investigating a way to call some of
Arrow Build Report for Job nightly-2019-11-26-0
All tasks:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-26-0
Failed Tasks:
- test-conda-python-2.7-pandas-master:
URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-26-0-circle-test-conda-
Hi Arrow devs,
Small intro: I'm the main Vaex developer; Vaex is an out-of-core dataframe
library for Python - https://github.com/vaexio/vaex - and we're
looking into moving Vaex to use Apache Arrow for the data structure.
At the beginning of this year, we added string support in Vaex, which
required 64
Francois Saint-Jacques created ARROW-7265:
---------------------------------
Summary: [Format][C++] Clarify the usage of typeIds in Union type
documentation
Key: ARROW-7265
URL: https://issues.apache.org/jira/browse/ARROW-7265
It seems that array_union_test.cc does the latter; look at how
`expected_types` is constructed. I opened
https://issues.apache.org/jira/browse/ARROW-7265 .
Wes, is the intended usage of type_ids to allow a producer to pass a
subset of the columns of unions without modifying the type codes?
François
hi Maarten
I opened https://issues.apache.org/jira/browse/ARROW-7245 in part based on this.
I think that normalizing to a common type (which would require casting
the offsets buffer, but not the data -- which can be shared -- so not
too wasteful) during concatenation would be the approach I would
Hi,
In https://github.com/dask/dask/issues/5526, we're seeing an issue stemming
from a hack to ensure compatibility for Pyarrow. The details aren't too
important. The core of the issue is that the Pyarrow parquet writer makes a
couple of checks for `FileSystem._isfilestore` via `_mkdir_if_not_exists`
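As a hypothetical sketch of the kind of shim the dask issue describes (the wrapper class name is made up): pyarrow's parquet writer probes the private `_isfilestore` hook to decide whether it may create directories, so a wrapper can answer False to make the mkdir step a no-op.

```python
# Hypothetical compatibility shim: expose _isfilestore() on a wrapped
# filesystem so pyarrow's _mkdir_if_not_exists check skips mkdir calls.
class CompatFileSystem:
    def __init__(self, fs):
        self._fs = fs

    def _isfilestore(self):
        # Report "not a plain file store"; directory creation is skipped.
        return False

    def __getattr__(self, name):
        # Delegate everything else to the wrapped filesystem.
        return getattr(self._fs, name)
```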
Adam Hooper created ARROW-7266:
---------------------------------
Summary: dictionary_encode() of a slice gives wrong result
Key: ARROW-7266
URL: https://issues.apache.org/jira/browse/ARROW-7266
Project: Apache Arrow
Issue Type
On Tue, 26 Nov 2019 at 15:02, Wes McKinney wrote:
> hi Maarten
>
> I opened https://issues.apache.org/jira/browse/ARROW-7245 in part based
> on this.
>
> I think that normalizing to a common type (which would require casting
> the offsets buffer, but not the data -- which can be shared -- so not
In vaex I always write the data to hdf5 as 1 large chunk (per column).
The reason is that it allows the mmapped columns to be exposed as a
single numpy array (talking numerical data only for now), which many
people are quite comfortable with.
The strategy for vaex to write unchunked data is to fi
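The "one large chunk per column" idea can be sketched with numpy's memmap (file name and dtype are illustrative): write a column as one contiguous block, then expose it later as a single mmapped numpy array.

```python
import numpy as np
import tempfile, os

# Write a column as one contiguous block...
path = os.path.join(tempfile.mkdtemp(), "column.bin")
col = np.arange(1000, dtype=np.float64)
col.tofile(path)

# ...then expose it as a single mmapped numpy array (zero-copy view).
view = np.memmap(path, dtype=np.float64, mode="r")
print(view.shape)       # (1000,)
print(float(view[42]))  # 42.0
```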
I'd rather drop 14.04 than spend time maintaining kludges
for old compilers.
Regards
Antoine.
On Tue, 26 Nov 2019 17:24:58 +0900 (JST)
Sutou Kouhei wrote:
> OK. I submitted a pull request: https://github.com/apache/arrow/pull/5901
>
> In
> "Re: [NIGHTLY] Arrow Build Report f
Hello Maarten,
In theory, you could provide a custom mmap-allocator and use the
builder facility. Since the array is still in "build-phase" and not
sealed, it should be fine if mremap changes the pointer address. This
might fail in practice since the allocator is also used for auxiliary
data, e.g.
Generally speaking, this API is obsolete (though not formally deprecated
yet), so we don't envision changing it significantly in the future.
We hope that in the near future the new pyarrow FileSystem API will be
usable directly from pyarrow.parquet.
Regards
Antoine.
On 26/11/2019 at 15:34, Tom
Antoine Pitrou created ARROW-7267:
---------------------------------
Summary: [CI] [C++] Tests not run on "AMD64 Windows 2019 C++"
Key: ARROW-7267
URL: https://issues.apache.org/jira/browse/ARROW-7267
Project: Apache Arrow
I
Thanks for all the answers. The assumptions about union types in C++
code are fixed in https://github.com/apache/arrow/pull/5892
Regards
Antoine.
On 25/11/2019 at 16:41, Wes McKinney wrote:
> On Mon, Nov 25, 2019 at 9:25 AM Antoine Pitrou wrote:
>>
>> On Mon, 25 Nov 2019 09:12:21 -0600
>>
hi Hongze,
The Datasets functionality is indeed extremely useful, and it may make
sense to have it available in many languages eventually. With Java, I
would raise the issue that things are comparatively weaker there when
it comes to actually reading the files themselves. Whereas we have
reasonabl
OK, so the proposal is not only to drop support for Ubuntu 14.04 but
also to stop supporting gcc < 4.9, is that right? Since manylinux1 uses
gcc 4.8.5, as long as the _libraries_ build then that is okay. I don't
know what the implications of dropping manylinux1 (in favor of
manylinux2010) would be
On
Martin Grund created ARROW-7268:
---------------------------------
Summary: Propagate `custom_metadata` field from IPC message
Key: ARROW-7268
URL: https://issues.apache.org/jira/browse/ARROW-7268
Project: Apache Arrow
Issue T
Hi Antoine,
For Java, the physical child id is the same as the logical type code, as
the index of each child vector is the code (ordinal) of the vector's minor
type.
This leads to a problem: only a single vector for each type can exist
in a union vector, so strictly speaking, the Java impleme
Hi Hongze,
To add to Wes's point, there are already some efforts to do JNI for ORC
(which needs to be integrated with CI) and some open PRs for Parquet in the
project. However, given that you are using Spark I would expect there is
already dataset functionality that is equivalent to the dataset AP
Hi Antoine,
> My question would be: what happens after the PR is merged? Are
> developers supposed to keep the Bazel setup working in addition to
> CMake? Or is there a dedicated maintainer (you? :-)) to fix regressions
> when they happen?
In the short term, I would be willing to be a dedicated m
The vote carries with 3 binding +1 votes, 1 non-binding +1 vote, and
1 non-binding +0.5 vote.
To follow-up I will:
1. Open up JIRAs for work items in reference implementations (c++/java)
2. Merge the pull request containing the specification changes.
Thanks,
Micah
On Tue, Nov 26, 2019 at
Hi Wes and Micah,
Thanks for your kindly reply.
Micah: We don't use the Spark (vectorized) Parquet reader because it is a pure Java
implementation. Performance could be worse than doing similar work
natively. Another reason is we may need to
integrate some other specific data sources with Arr