Hi there,
Just to let you know CERN has been accepted as a GSoC organization this
year.
As such, I have submitted a proposal that's loosely connected to Apache
Arrow (and Go.)
Here's the proposal:
https://hepsoftwarefoundation.org/gsoc/2019/proposal_GoHEPgroot.html
It's mostly about *using* Arr
Doing some light research, it looks like xxhash has better cross-platform support
and is faster than a vanilla implementation of crc32 [1]. However, crc32c
(a slightly different crc32 algorithm) is hardware accelerated on newer
(circa 2016) Intel CPUs [2] and is potentially faster.
[1] https://cyan4973.
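To make the crc32 vs. crc32c distinction concrete, here is a minimal sketch in Python. It assumes only the standard library: `zlib.crc32` provides the plain crc32, while the `crc32c` function below is a hypothetical, table-driven pure-Python helper written for illustration. It is not hardware accelerated and is not taken from Arrow or plasma. The two algorithms differ only in the polynomial, so they produce different checksums for the same bytes.

```python
import zlib

# Reflected form of the Castagnoli polynomial used by CRC32C.
CASTAGNOLI_POLY = 0x82F63B78


def _build_table(poly):
    """Precompute the 256-entry lookup table for a reflected CRC-32."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _build_table(CASTAGNOLI_POLY)


def crc32c(data, crc=0):
    """Illustrative pure-Python CRC32C; pass a previous result to continue a stream."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = _CRC32C_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


payload = b"123456789"
crc32_value = zlib.crc32(payload)   # plain crc32 from the standard library
crc32c_value = crc32c(payload)      # Castagnoli variant, different result

print(f"crc32:  {crc32_value:#010x}")
print(f"crc32c: {crc32c_value:#010x}")
```

Because the helper accepts a running value, a buffer can be checksummed chunk by chunk, e.g. `crc32c(b"6789", crc32c(b"12345"))` gives the same result as hashing the whole payload at once.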
Thanks Philipp,
Yeah, I probably shouldn't have said SHA1 either :) I'm not too
concerned with a particular hash/checksum implementation. It would be good
to have at least 1 or 2 well supported ones, and a migration path to
support more if necessary without breaking file/streaming formats for
Hey Micah,
in plasma, we are using xxhash to compute a hash/checksum [1] (it is
computed in parallel using multiple threads) and have good experience with
it -- all data in Ray is checksummed this way. Initially there were
problems with uninitialized bits in the arrow representation, but that has
(I meant to say SHA256 instead of SHA1)
On Tue, Mar 5, 2019 at 9:45 PM Philipp Moritz wrote:
> Hey Micah,
>
> in plasma, we are using xxhash to compute a hash/checksum [1] (it is
> computed in parallel using multiple threads) and have good experience with
> it -- all data in Ray is checksummed t
Hi Arrow Dev,
As we expand the use-cases for Arrow to move it more across system
boundaries (Flight) and make it live longer (e.g. in the file format), it
seems to make sense to build in a mechanism for data integrity verification
(e.g. a checksum like CRC32 or in some cases a cryptographic hash li
Micah Kornfield created ARROW-4784:
--
Summary: [C++][CI] Re-enable flaky mingw tests.
Key: ARROW-4784
URL: https://issues.apache.org/jira/browse/ARROW-4784
Project: Apache Arrow
Issue Type: B
Micah Kornfield created ARROW-4783:
--
Summary: [C++][CI] Mingw32 builds sometimes timeout
Key: ARROW-4783
URL: https://issues.apache.org/jira/browse/ARROW-4783
Project: Apache Arrow
Issue Typ
Wes McKinney created ARROW-4782:
---
Summary: [C++] Prototype scalar and array expression types for
developing deferred operator algebra
Key: ARROW-4782
URL: https://issues.apache.org/jira/browse/ARROW-4782
Paul Taylor created ARROW-4781:
--
Summary: [JS] Ensure empty data initializes empty typed arrays
Key: ARROW-4781
URL: https://issues.apache.org/jira/browse/ARROW-4781
Project: Apache Arrow
Issue
Paul Taylor created ARROW-4780:
--
Summary: [JS] Package sourcemap files, update default package JS
version
Key: ARROW-4780
URL: https://issues.apache.org/jira/browse/ARROW-4780
Project: Apache Arrow
Francois Saint-Jacques created ARROW-4779:
-
Summary: [CI] AppVeyor link failure
Key: ARROW-4779
URL: https://issues.apache.org/jira/browse/ARROW-4779
Project: Apache Arrow
Issue Type:
I am OK with that, but if we find ourselves making compromises that
affect performance or memory efficiency (where possibly invasive
refactoring may be required) perhaps we should reconsider option #3.
On Tue, Mar 5, 2019 at 11:29 AM Uwe L. Korn wrote:
>
> I'm leaning a bit towards 1) but I would
Uwe L. Korn created ARROW-4778:
--
Summary: [C++/Python] manylinux1: Update Thrift to 0.12.0
Key: ARROW-4778
URL: https://issues.apache.org/jira/browse/ARROW-4778
Project: Apache Arrow
Issue Type:
Uwe L. Korn created ARROW-4777:
--
Summary: [C++/Python] manylinux1: Update lz4 to 1.8.3
Key: ARROW-4777
URL: https://issues.apache.org/jira/browse/ARROW-4777
Project: Apache Arrow
Issue Type: Imp
+1 from me. Thanks for driving this discussion so we have the
rationale documented
On Tue, Mar 5, 2019 at 12:16 AM Micah Kornfield wrote:
>
> OK to summarize my understanding of the thoughts expressed:
> 1. People really shouldn't be trying to do things like grouping and
> joining on double valu
I'm leaning a bit towards 1) but I would love to get some input from the Avro
community as 1) depends also on their side as we will submit some patches
upstream that need to be reviewed and someday also released.
Are Avro committers subscribed here, or should we reach out to them on their ML?
Gi
I'd be +0.5 in favor of forking in this particular case. Since Avro is
not vectorized (unlike Parquet and ORC) I suspect it may be more
difficult to get the best performance using a general purpose API
versus one that is more specialized to producing Arrow record batches.
Given that has been relati
Francois Saint-Jacques created ARROW-4776:
-
Summary: [C++] DictionaryBuilder should support bootstrapping from
an existing dict type
Key: ARROW-4776
URL: https://issues.apache.org/jira/browse/ARROW-4776
Kenta Murata created ARROW-4775:
---
Summary: [Website] Site navbar cannot be expanded
Key: ARROW-4775
URL: https://issues.apache.org/jira/browse/ARROW-4775
Project: Apache Arrow
Issue Type: Bug
Stephen Gallagher created ARROW-4774:
Summary: Python crash writing nested array to parquet
Key: ARROW-4774
URL: https://issues.apache.org/jira/browse/ARROW-4774
Project: Apache Arrow
Iss