Hi François,
Thanks so much for the very detailed explanation, and that makes sense to
me. I will check out the links for more information.
@Wes,
ARROW-8250 is very useful to me as well and I will keep an eye on it.
Thanks.
On Wed, Jun 24, 2020 at 11:08 PM Wes McKinney wrote:
> See also this J
On Wed, Jun 24, 2020 at 9:48 PM Micah Kornfield wrote:
>
> In that case I would propose the following:
> 1. Standardize on clang for generating performance numbers for
> performance-related PRs
> 2. Adjust our binary artifact builds to use clang where feasible (I think
> this should wait until after
I've updated the PR. More feedback welcome, I'd like to start a vote by
end-of-week if possible.
On Wed, Jun 24, 2020 at 12:48 PM Micah Kornfield
wrote:
> I agree flight might need to encode this data slightly differently for
> negotiation purposes. I will update the enum to use power of 2 val
In that case I would propose the following:
1. Standardize on clang for generating performance numbers for
performance-related PRs
2. Adjust our binary artifact builds to use clang where feasible (I think
this should wait until after our next release).
3. Add to the contributors guide summarizing the
hi folks,
This has come up in some other contexts, but I believe it would be a
good idea to increment the version number in Schema.fbs starting with
1.0.0 to separate the pre-1.0 and post-1.0 worlds
https://github.com/apache/arrow/blob/master/format/Schema.fbs#L22
Given that we are contemplating
I drafted the specification changes that would be associated with the
union changes
https://github.com/apache/arrow/pull/7535
I'll start a separate discussion about incrementing the
MetadataVersion since that must be discussed independently.
Please take a look
On Wed, Jun 24, 2020 at 3:50 PM We
hi folks,
(cross-posting to dev@arrow and dev@parquet since there are
stakeholders in both places)
It seems there are still problems at least with the C++ implementation
of LZ4 compression in Parquet files
https://issues.apache.org/jira/browse/PARQUET-1241
https://issues.apache.org/jira/browse/P
The call clashed with the Spark AI Summit keynote as well, so that may have
been a contributing factor.
On Wed, Jun 24, 2020, 10:11 AM Neal Richardson
wrote:
> Attendees:
> Prudhvi Porandla
> Neal Richardson
>
> Discussion:
> * Everyone must be so focused on getting things done for the 1.0 relea
I should also add that we could (with some effort) use the
MetadataVersion V4/V5 indicator to offer backward compatibility for
old serialized union data
In any case, if there is consensus about this, we would need to have a
vote and get busy with implementing and testing the changes. I could
assis
I agree flight might need to encode this data slightly differently for
negotiation purposes. I will update the enum to use power of 2 values so
this isn't precluded, but I think for parsing in the schema, it is clearer
to model this as a list of enums.
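To illustrate why power-of-2 values keep both options open, here is a minimal sketch. The feature names below are hypothetical placeholders, not the actual enum members, and the helper functions are my own invention for illustration. The same enum can be carried as a list (as proposed for the schema) or folded into a single integer bitmask (which a negotiation step might prefer).

```python
from enum import IntFlag


class Feature(IntFlag):
    # Hypothetical feature names; each value is a distinct power of two
    # so that any combination maps to a unique integer.
    FEATURE_A = 1
    FEATURE_B = 2
    FEATURE_C = 4


def to_bitmask(features):
    """Fold a list of enum members into one integer bitmask."""
    mask = 0
    for feature in features:
        mask |= int(feature)
    return mask


def from_bitmask(mask):
    """Expand an integer bitmask back into a list of enum members."""
    return [feature for feature in Feature if mask & feature]


declared = [Feature.FEATURE_A, Feature.FEATURE_C]
mask = to_bitmask(declared)
roundtrip = from_bitmask(mask)
```

Because the values do not overlap bitwise, the list form and the bitmask form carry the same information and convert into each other without loss.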
Any other thoughts?
Thanks,
Micah
On Tue
On Wed, Jun 24, 2020 at 1:07 PM Francois Saint-Jacques
wrote:
>
> OTOH,
>
> how do we handle NullType -> UnionType cast conversion? Do we
> require some convention like the first child's ArrayData null bitmap
> to be set and all tags set to 0?
Sure, that sounds like a reasonable implementation s
OTOH,
how do we handle NullType -> UnionType cast conversion? Do we
require some convention like the first child's ArrayData null bitmap
to be set and all tags set to 0?
François
On Wed, Jun 24, 2020 at 1:09 PM Antoine Pitrou wrote:
>
>
> Le 24/06/2020 à 18:34, Wes McKinney a écrit :
> > On We
+1 (binding)
+1 (binding)
Le 23/06/2020 à 20:35, Wes McKinney a écrit :
> Hi,
>
> As discussed on the mailing list [1] I would like to add a "bit width"
> field to our Decimal metadata to allow for supporting different
> Decimal physical sizes other than 128-bit (where 32- and 64-bit
> representations are re
Le 24/06/2020 à 18:34, Wes McKinney a écrit :
> On Wed, Jun 24, 2020 at 11:08 AM Antoine Pitrou wrote:
>>
>>
>> Le 24/06/2020 à 16:57, Wes McKinney a écrit :
>>> hi folks,
>>>
>>> As discussed on the recent GitHub PR [1], as a means of reconciling
>>> the long-standing cross-implementation incom
On Wed, Jun 24, 2020 at 11:08 AM Antoine Pitrou wrote:
>
>
> Le 24/06/2020 à 16:57, Wes McKinney a écrit :
> > hi folks,
> >
> > As discussed on the recent GitHub PR [1], as a means of reconciling
> > the long-standing cross-implementation incompatibilities with Union
> > types, it's been proposed
Attendees:
Prudhvi Porandla
Neal Richardson
Discussion:
* Everyone must be so focused on getting things done for the 1.0 release
that they didn't have time to join the call :shrug:
On Wed, Jun 24, 2020 at 9:01 AM Neal Richardson
wrote:
> Hi all,
> Last minute reminder that our biweekly call is
Le 24/06/2020 à 16:57, Wes McKinney a écrit :
> hi folks,
>
> As discussed on the recent GitHub PR [1], as a means of reconciling
> the long-standing cross-implementation incompatibilities with Union
> types, it's been proposed to remove the top-level validity bitmap from
> the Union data layout
Hi all,
Last minute reminder that our biweekly call is starting now at
https://meet.google.com/vtm-teks-phx. All are welcome to join. Notes will
be sent out to the mailing list afterward.
Neal
+1 (binding)
On Wed, Jun 24, 2020 at 2:03 AM Micah Kornfield wrote:
>
> +1 (binding)
>
> On Tue, Jun 23, 2020 at 11:35 AM Wes McKinney wrote:
>
> > Hi,
> >
> > As discussed on the mailing list [1] I would like to add a "bit width"
> > field to our Decimal metadata to allow for supporting differe
Hi Suvayu, thanks for sharing your experiences. Clearly we have work to do.
With regard to specific name changes, I agree with Wes. If something is
negative to a non-trivial portion of the population, why not use something
that avoids that issue where possible?
On Fri, Jun 19, 2020, 7:44 PM Suvayu Ali
Per my comments on the PR, I also think this is preferred. I believe we
will avoid the potential for validity inconsistency and simplify
construction of union data in most cases.
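As a rough illustration of the proposal, here is a toy model in plain Python. It is not the Arrow implementation: the class names are made up, and it assumes type ids map directly to child indices. The union holds no validity bitmap of its own, so a slot is null exactly when the selected child slot is null.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChildArray:
    # None marks a null slot; a real array would use a validity bitmap.
    values: List[Optional[object]]

    def is_valid(self, i: int) -> bool:
        return self.values[i] is not None


@dataclass
class DenseUnion:
    type_ids: List[int]          # which child each union slot selects
    offsets: List[int]           # position inside the selected child
    children: List[ChildArray]   # no top-level validity bitmap here

    def is_valid(self, i: int) -> bool:
        # Validity is delegated entirely to the selected child.
        child = self.children[self.type_ids[i]]
        return child.is_valid(self.offsets[i])

    def get(self, i: int) -> Optional[object]:
        child = self.children[self.type_ids[i]]
        return child.values[self.offsets[i]]


ints = ChildArray([1, None])
strs = ChildArray(["a"])
union = DenseUnion(type_ids=[0, 1, 0], offsets=[0, 0, 1], children=[ints, strs])
validity = [union.is_valid(i) for i in range(3)]
```

With only one source of truth for nulls, there is no way for a union-level bitmap and a child-level bitmap to disagree.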
On Wed, Jun 24, 2020, 7:58 AM Wes McKinney wrote:
> hi folks,
>
> As discussed on the recent GitHub PR [1], as a mean
See also this JIRA regarding adding random access read APIs for IPC
files (and thus Feather)
https://issues.apache.org/jira/browse/ARROW-8250
I hope to see this implemented someday.
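For readers unfamiliar with why random access is possible at all, here is a toy sketch of the idea. It uses a made-up JSON-based layout, not Arrow's actual byte format, and the function names are hypothetical. The point is only that a footer recording each batch's offset lets a reader seek straight to one batch instead of loading everything.

```python
import io
import json
import struct


def write_batches(stream, schema, batches):
    """Write the schema, then each batch, then a footer of batch offsets."""
    header = json.dumps(schema).encode()
    stream.write(struct.pack("<I", len(header)))
    stream.write(header)
    offsets = []
    for batch in batches:
        payload = json.dumps(batch).encode()
        offsets.append((stream.tell(), len(payload)))
        stream.write(payload)
    footer = json.dumps(offsets).encode()
    stream.write(footer)
    # The footer length goes last so a reader can find the footer from the end.
    stream.write(struct.pack("<I", len(footer)))


def read_batch(stream, index):
    """Read a single batch by index without touching the other batches."""
    stream.seek(-4, io.SEEK_END)
    (footer_len,) = struct.unpack("<I", stream.read(4))
    stream.seek(-(4 + footer_len), io.SEEK_END)
    offsets = json.loads(stream.read(footer_len))
    offset, length = offsets[index]
    stream.seek(offset)
    return json.loads(stream.read(length))


buf = io.BytesIO()
write_batches(buf, ["x"], [{"x": [1, 2]}, {"x": [3]}])
second = read_batch(buf, 1)
```

A reader built this way pays for the footer plus the one batch it asks for, which is the behaviour a random access API would expose.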
On Wed, Jun 24, 2020 at 10:03 AM Francois Saint-Jacques
wrote:
>
> I forgot to mention that you can see how this
I forgot to mention that you can see how this is glued in
`feather::reader::Read` [1]. This makes it obvious that nothing is
cached and everything is loaded in memory.
François
[1]
https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/feather.cc#L715-L723
On Wed, Jun 24, 2020 at 10:53 A
hi folks,
As discussed on the recent GitHub PR [1], as a means of reconciling
the long-standing cross-implementation incompatibilities with Union
types, it's been proposed to remove the top-level validity bitmap from
the Union data layout and let validity be determined exclusively by
the child arr
Hello Yue,
FeatherV2 is just a facade for the Arrow IPC file format. You can find
the implementation here [1]. I will try to answer your question with
inline comments. On a high level, the file format writes a schema and
then multiple "chunks" called RecordBatch. Your lowest level of
granularity
Arrow Build Report for Job nightly-2020-06-24-0
All tasks:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-24-0
Failed Tasks:
- centos-7-aarch64:
URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-24-0-travis-centos-7-aarch64
- centos-8-am
+1 (binding)
On Tue, Jun 23, 2020 at 11:35 AM Wes McKinney wrote:
> Hi,
>
> As discussed on the mailing list [1] I would like to add a "bit width"
> field to our Decimal metadata to allow for supporting different
> Decimal physical sizes other than 128-bit (where 32- and 64-bit
> representations