Hi Antoine,
I think Liya Fan raised some good points in his reply but I'd like to
answer your questions directly.
> So the question is whether this really needs to be in the in-memory
> format, i.e. is it desired to operate directly on this compressed
> format, or is it solely for transport?
I
Hi Razvan,
I'm not sure about plans around tensors. However, depending on how you are
trying to transfer the data and consume it, you might consider using an
extension type [1]. For the physical representation you could model it as
something like:
{
RowLabel : Date32/64
ColumnLabels :
Sutou Kouhei created ARROW-5943:
---
Summary: [GLib][Gandiva] Add support for function aliases
Key: ARROW-5943
URL: https://issues.apache.org/jira/browse/ARROW-5943
Project: Apache Arrow
Issue
Todd Hay created ARROW-5942:
---
Summary: [JS] Implement Tensor Type
Key: ARROW-5942
URL: https://issues.apache.org/jira/browse/ARROW-5942
Project: Apache Arrow
Issue Type: New Feature
Hi,
I've created pull requests that were used to release 0.14.0:
ARROW-5937: [Release] Stop parallel binary upload
https://github.com/apache/arrow/pull/4868
ARROW-5938: [Release] Create branch for adding release note automatically
https://github.com/apache/arrow/pull/4869
ARROW-5939: [Release]
Sutou Kouhei created ARROW-5941:
---
Summary: [Release] Avoid re-uploading already uploaded binary
artifacts
Key: ARROW-5941
URL: https://issues.apache.org/jira/browse/ARROW-5941
Project: Apache Arrow
Sutou Kouhei created ARROW-5940:
---
Summary: [Release] Add support for re-uploading sign/checksum for
binary artifacts
Key: ARROW-5940
URL: https://issues.apache.org/jira/browse/ARROW-5940
Project:
Sutou Kouhei created ARROW-5939:
---
Summary: [Release] Add support for generating vote email template
separately
Key: ARROW-5939
URL: https://issues.apache.org/jira/browse/ARROW-5939
Project: Apache
Hi, folks
We were discussing improvements for the threading engine back in May and agreed
to implement benchmarks (sorry, I've lost the original mail thread, here is the
link:
Sutou Kouhei created ARROW-5938:
---
Summary: [Release] Create branch for adding release note
automatically
Key: ARROW-5938
URL: https://issues.apache.org/jira/browse/ARROW-5938
Project: Apache Arrow
Sutou Kouhei created ARROW-5937:
---
Summary: [Release] Stop parallel binary upload
Key: ARROW-5937
URL: https://issues.apache.org/jira/browse/ARROW-5937
Project: Apache Arrow
Issue Type:
Sure. I'd like to bundle an M x N shaped tensor along with the M row labels
(dates) and N column labels (string identifiers) in one response.
Razvan
On Fri, Jul 12, 2019, 6:53 PM Wes McKinney wrote:
> hi Razvan -- can you clarify what "together with a row and a column
> index" means?
>
> On
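A minimal sketch of the bundle Razvan describes above, assuming plain Python containers rather than any Arrow API: M row labels (dates), N column labels (string identifiers), and the M x N values stored flat in row-major order. The names `LabeledMatrix`, `shape` and `at` are hypothetical and chosen only for illustration; nothing here reflects how Arrow's IPC format represents tensors.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class LabeledMatrix:
    """Hypothetical container: an M x N matrix plus its row and column labels."""

    row_labels: List[date]    # M dates, one per row
    column_labels: List[str]  # N string identifiers, one per column
    values: List[float]       # M * N values in row-major order

    def __post_init__(self) -> None:
        # Reject payloads whose value count does not match the labels.
        expected = len(self.row_labels) * len(self.column_labels)
        if len(self.values) != expected:
            raise ValueError(
                f"expected {expected} values, got {len(self.values)}"
            )

    @property
    def shape(self) -> Tuple[int, int]:
        return (len(self.row_labels), len(self.column_labels))

    def at(self, row: date, column: str) -> float:
        # Translate the labels to positions, then index the flat buffer.
        i = self.row_labels.index(row)
        j = self.column_labels.index(column)
        return self.values[i * len(self.column_labels) + j]
```

Keeping the values in one flat buffer next to two label sequences mirrors the "one response" requirement: all three pieces travel together and the labels are enough to recover the shape.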
Benjamin Kietzman created ARROW-5936:
Summary: [C++] [FlightRPC] user_metadata is not present in fields
read from flight
Key: ARROW-5936
URL: https://issues.apache.org/jira/browse/ARROW-5936
Benjamin Kietzman created ARROW-5935:
Summary: [C++] ArrayBuilders with mutable type are not robustly
supported
Key: ARROW-5935
URL: https://issues.apache.org/jira/browse/ARROW-5935
Project:
Krisztian Szucs created ARROW-5934:
--
Summary: [Python] Bundle arrow's LICENSE with the wheels
Key: ARROW-5934
URL: https://issues.apache.org/jira/browse/ARROW-5934
Project: Apache Arrow
Thanks all, this is helpful and I've added
https://issues.apache.org/jira/browse/ARROW-5933 to improve the
documentation for future developers.
On Wed, Jul 10, 2019 at 11:09 PM Jacques Nadeau wrote:
> I was also supportive of this pattern. We definitely have used it before to
> optimize in
Benjamin Kietzman created ARROW-5933:
Summary: [C++] [Documentation] add discussion of Union.typeIds to
Layout.rst
Key: ARROW-5933
URL: https://issues.apache.org/jira/browse/ARROW-5933
Project:
Thanks for collecting them!
We should also run the packaging tasks on them before cutting RC0.
On Fri, Jul 12, 2019 at 8:28 PM Wes McKinney wrote:
> I updated https://gist.github.com/wesm/1e4ac14baaa8b27bf13b071d2d715014
> to include all the cited patches, as well as the Parquet forward
>
I updated https://gist.github.com/wesm/1e4ac14baaa8b27bf13b071d2d715014
to include all the cited patches, as well as the Parquet forward
compatibility fix.
I'm waiting on CI to be able to pass ARROW-5921 (fuzzing-discovered
IPC crash) and the ARROW-5889 (Parquet backwards compatibility with
0.13)
Cong Ding created ARROW-5932:
Summary: undefined reference to
`__cxa_init_primary_exception@CXXABI_1.3.11'
Key: ARROW-5932
URL: https://issues.apache.org/jira/browse/ARROW-5932
Project: Apache Arrow
Wes McKinney created ARROW-5931:
---
Summary: [C++] Extend extension types facility to provide for
serialization and deserialization in IPC roundtrips
Key: ARROW-5931
URL:
hi Razvan -- can you clarify what "together with a row and a column
index" means?
On Fri, Jul 12, 2019 at 11:17 AM Razvan Chitu wrote:
>
> Hi,
>
> Does the IPC format currently support streaming a tensor together with a
> row and a column index? If not, are there any plans for this to be
>
There's also ARROW-5921 (I tagged it 0.14.1) if it passes travis. This
one fixes a segfault found via fuzzing.
François
On Fri, Jul 12, 2019 at 6:54 AM Krisztián Szűcs
wrote:
>
> PRs touching the wheel packaging scripts:
> - https://github.com/apache/arrow/pull/4828 (lz4)
> -
lidavidm created ARROW-5930:
---
Summary: [FlightRPC] [Python] Flight CI tests are failing
Key: ARROW-5930
URL: https://issues.apache.org/jira/browse/ARROW-5930
Project: Apache Arrow
Issue Type: Bug
Hi,
Does the IPC format currently support streaming a tensor together with a
row and a column index? If not, are there any plans for this to be
supported? It'd be quite useful for matrices that could have 10s of
thousands of either rows, columns or both. For my use case I am currently
hi Liya -- yes, it seems reasonable to defer the conversion from your
pointer-based extension representation to a proper VarCharVector until
you need to send over IPC.
Note that there is no mechanism yet in Java with extension types to
cause a conversion to take place when the IPC step is
Wes McKinney created ARROW-5929:
---
Summary: [Java] Define API for ExtensionVector whose data must be
serialized prior to being sent via IPC
Key: ARROW-5929
URL: https://issues.apache.org/jira/browse/ARROW-5929
Wes McKinney created ARROW-5928:
---
Summary: [JS] Test fuzzer inputs
Key: ARROW-5928
URL: https://issues.apache.org/jira/browse/ARROW-5928
Project: Apache Arrow
Issue Type: Improvement
Wes McKinney created ARROW-5927:
---
Summary: [Go] Test fuzzer inputs
Key: ARROW-5927
URL: https://issues.apache.org/jira/browse/ARROW-5927
Project: Apache Arrow
Issue Type: Improvement
Wes McKinney created ARROW-5926:
---
Summary: [Java] Test fuzzer inputs
Key: ARROW-5926
URL: https://issues.apache.org/jira/browse/ARROW-5926
Project: Apache Arrow
Issue Type: Improvement
Pindikura Ravindra created ARROW-5925:
-
Summary: [Gandiva][C++] cast decimal to int should round up
Key: ARROW-5925
URL: https://issues.apache.org/jira/browse/ARROW-5925
Project: Apache Arrow
shengjun.li created ARROW-5924:
--
Summary: [C++][Plasma] It is not convenient to release a GPU object
Key: ARROW-5924
URL: https://issues.apache.org/jira/browse/ARROW-5924
Project: Apache Arrow
Francois Saint-Jacques created ARROW-5923:
-
Summary: [C++] Fix int96 comment
Key: ARROW-5923
URL: https://issues.apache.org/jira/browse/ARROW-5923
Project: Apache Arrow
Issue Type:
Saurabh Bajaj created ARROW-5922:
Summary: Unable to connect to HDFS from a worker/data node on a
Kerberized cluster using pyarrow's hdfs API
Key: ARROW-5922
URL: https://issues.apache.org/jira/browse/ARROW-5922
Marco Neumann created ARROW-5921:
Summary: [C++][Fuzzing] Missing nullptr checks in IPC
Key: ARROW-5921
URL: https://issues.apache.org/jira/browse/ARROW-5921
Project: Apache Arrow
Issue
PRs touching the wheel packaging scripts:
- https://github.com/apache/arrow/pull/4828 (lz4)
- https://github.com/apache/arrow/pull/4833 (uriparser - only if
https://github.com/apache/arrow/commit/88fcb096c4f24861bc7f8181cba1ad8be0e4048a
is cherry picked as well)
-
@Antoine Pitrou,
Good question. I think the answer depends on the concrete encoding scheme.
For some encoding schemes, it is not a good idea to use them for in-memory
data compression.
For others, it is beneficial to operate directly on the compressed data.
For example, it is beneficial to
Liya Fan created ARROW-5920:
---
Summary: [Java] Support sort & compare for all variable width
vectors
Key: ARROW-5920
URL: https://issues.apache.org/jira/browse/ARROW-5920
Project: Apache Arrow
Thanks François, I closed PARQUET-1623 this morning. It would be nice to
include the PR in the patch release:
https://github.com/apache/arrow/pull/4857
This bug has been around for a few releases but I think it should be a low risk
change to include.
Hatem
On 7/12/19, 2:27 AM, "Francois
Le 12/07/2019 à 11:39, Uwe L. Korn a écrit :
> Actually the most pragmatic way I have thought of yet would be to use conda
> and build all our dependencies. Instead of using the compilers defaults and
> conda-forge use, we should build the dependencies in the manylinux image
> and then
Uwe L. Korn created ARROW-5919:
--
Summary: [R] Add nightly tests for building r-arrow with
dependencies from conda-forge
Key: ARROW-5919
URL: https://issues.apache.org/jira/browse/ARROW-5919
Project:
Hallo,
On Thu, Jul 11, 2019, at 9:51 PM, Wes McKinney wrote:
> On Thu, Jul 11, 2019 at 11:26 AM Antoine Pitrou wrote:
> >
> >
> > Le 11/07/2019 à 17:52, Krisztián Szűcs a écrit :
> > > Hi All,
> > >
> > > I have a couple of questions about the wheel packaging:
> > > - why do we build an arrow
Le 12/07/2019 à 09:56, Micah Kornfield a écrit :
> Per Antoine's recommendation. I'm splitting off the discussion about data
> integrity from the previous e-mail thread about the format additions [1].
> To re-cap I made a proposal including data integrity [2] by adding a new
> message type to
Le 12/07/2019 à 10:08, Micah Kornfield a écrit :
> OK, I've created a separate thread for data integrity/digests [1], and
> retitled this thread to continue the discussion on compression and
> encodings. As a reminder the PR for the format additions [2] suggested a
> new SparseRecordBatch that
Liya Fan created ARROW-5918:
---
Summary: [Java] Revise the BaseIntVector interface
Key: ARROW-5918
URL: https://issues.apache.org/jira/browse/ARROW-5918
Project: Apache Arrow
Issue Type: Improvement
Liya Fan created ARROW-5917:
---
Summary: [Java] Redesign the dictionary encoder
Key: ARROW-5917
URL: https://issues.apache.org/jira/browse/ARROW-5917
Project: Apache Arrow
Issue Type: New Feature
OK, I've created a separate thread for data integrity/digests [1], and
retitled this thread to continue the discussion on compression and
encodings. As a reminder the PR for the format additions [2] suggested a
new SparseRecordBatch that would allow for the following features:
1. Different data
I think it would be worthwhile to split the discussion into two separate
threads. One thread for compression & encodings (which are related or
even the same topic), one thread for data integrity.
Regards
Antoine.
Le 08/07/2019 à 07:22, Micah Kornfield a écrit :
>
> - Compression:
>*