Just wanted to give some updates on dispatching.
We now have workable runtime functionality, including the dispatch mechanism[1][2] and
a build framework for both the compute kernels and the other parts of the C++ code.
There is still some statically compiled SIMD code in the code base that I will try to
work through.
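The runtime-dispatch idea described above can be sketched as follows (in Python, purely for illustration; the feature names and the registry are hypothetical stand-ins, not Arrow's actual implementation): register several implementations of a kernel, each tagged with the CPU feature level it requires, and bind the best supported one at startup.

```python
# Conceptual sketch of runtime dispatch. Kernels are ordered from most to
# least specialized; the portable fallback comes last.
KERNELS = [
    ("avx512", lambda xs: sum(xs)),   # stand-in for an AVX512 kernel
    ("avx2",   lambda xs: sum(xs)),   # stand-in for an AVX2 kernel
    ("scalar", lambda xs: sum(xs)),   # portable fallback
]

def select_kernel(supported_features):
    """Pick the first registered kernel whose required feature is supported."""
    for feature, fn in KERNELS:
        if feature in supported_features:
            return feature, fn
    raise RuntimeError("no usable kernel")  # "scalar" should always match

# Example: a CPU reporting AVX2 but not AVX512 gets the AVX2 kernel.
name, kernel = select_kernel({"scalar", "avx2"})
```

The point is that the choice happens once at runtime, so a single binary can ship all specializations and still run on older CPUs.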
With regard to scale, my colleague discovered some inconsistencies and filed a
JIRA with a proposed fix (a PR should be attached shortly).
I think this is an edge case that should be fixed, but if someone with more
historical context has opinions, I'd like to hear them.
[1]
Hi Radu,
This is a conversation best had on dev@parquet. It came up recently [1]
and I cross-posted there as well.
[1]
https://lists.apache.org/thread.html/re4fe4bc80c9eadd446761588f9b03d827193f91269a7c14ce0c444dd%40%3Cdev.arrow.apache.org%3E
On Thu, Sep 3, 2020 at 3:20 PM Radu Teodorescu
Hello,
What is the current thinking around allowing the logical content of a parquet
file to be split across multiple files?
I see that in theory there is support for reading files where different row
groups are in separate files but I cannot see any features that allow that for
writing.
On a
I am working on an engine for processing timeseries data. Unsurprisingly
for such a system, values of timestamp type feature prominently and we need
basic support for them in DataFusion.
Initially, we want to use DataFusion with predicates such as '=', '<', '>',
etc on timestamp columns and
It would be useful to outsiders to spell out what those two API levels
are, and what usage each corresponds to.
Is Parquet encryption used only with Spark? While Spark
interoperability is important, Parquet files are more ubiquitous than that.
Regards
Antoine.
On 03/09/2020 at 22:31, Gidon
Why would the low-level API be exposed directly? That would break the
interop between the two analytics ecosystems down the road.
Again, let me suggest leveraging the high level interface, based on the
PropertiesDrivenCryptoFactory.
It should address your technical requirements; if it doesn't, we
Hi Itamar,
I implemented some Python wrappers for the low-level API and would be happy to
collaborate on that. The reason I didn't push this forward yet is what Gidon
mentioned: the API to expose to Python users needs to be finalized first, and it
must include the key tools API for interop with
On Thu, Sep 3, 2020, at 11:01 AM, Antoine Pitrou wrote:
>
> Hi Gidon,
>
> On 03/09/2020 at 16:53, Gidon Gershinsky wrote:
> > Hi Itamar,
> >
> > My suggestion would be to wrap a different API in Python - the high-level
> > encryption interface of
> > https://github.com/apache/arrow/pull/8023
>
Hi Antoine,
Sounds good to me. This PR is already being actively reviewed, and it'd be
good to have Itamar's assessment.
Cheers, Gidon
On Thu, Sep 3, 2020 at 6:01 PM Antoine Pitrou wrote:
>
> Hi Gidon,
>
> On 03/09/2020 at 16:53, Gidon Gershinsky wrote:
> > Hi Itamar,
> >
> > My
Hi Gidon,
On 03/09/2020 at 16:53, Gidon Gershinsky wrote:
> Hi Itamar,
>
> My suggestion would be to wrap a different API in Python - the high-level
> encryption interface of
> https://github.com/apache/arrow/pull/8023
We need a strategy for reviewing those changes. The PR is quite large,
Hi Itamar,
My suggestion would be to wrap a different API in Python - the high-level
encryption interface of
https://github.com/apache/arrow/pull/8023
This will enable interoperability with Apache Spark (and other frameworks),
where we don't expose the low level parquet encryption API.
If such a
There are various open source columnar database engines you could look
at to get inspiration for a varargs variant of sort_indices.
On Thu, Sep 3, 2020 at 9:26 AM Ben Kietzman wrote:
>
> Hi Rares,
>
> The arrow API does not currently support sorting against multiple columns.
> We'd welcome a
Hi Rares,
The arrow API does not currently support sorting against multiple columns.
We'd welcome a JIRA/PR to add that support.
One potential workaround is storing the tuple as a single column of
fixed_size_list(int32, 2), which could then be viewed [1] as int64 (for
which sorting is
Hi,
I'm looking into implementing this, and it seems like there are two parts:
packaging, but also wrapping the APIs in Python. Is the latter item accurate?
If so, any examples of similar existing wrapped APIs, or should I just come up
with something on my own?
Context:
The C++/Python authentication implementation is entirely different
(because the C++/Python/Java gRPC APIs are in turn entirely
different). In particular, gRPC middleware in C++ is still
experimental (compared to Java) and much more limited (unless recent
versions changed this). C++/Python might
Thanks for sharing! It's cool to see the new PyFileSystem directly being
used ;)
Note that there is also an fsspec-compatible Azure filesystem
implementation that should support Data Lake Gen2
(https://github.com/dask/adlfs), as another Python-based implementation,
which can be used with
Hello,
I have a set of integer tuples that need to be collected and sorted at a
coordinator. Here is an example with tuples of length 2:
[(1, 10),
(1, 15),
(2, 10),
(2, 15)]
I am considering storing each column in an Arrow array, e.g., [1, 1, 2, 2]
and [10, 15, 10, 15], and have the Arrow
Arrow Build Report for Job nightly-2020-09-03-0
All tasks:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-09-03-0
Failed Tasks:
- test-conda-python-3.7-hdfs-2.9.2:
URL: