Hi,
So, something like a human- and computer-readable standard for Arrow
schemas, e.g. via YAML or a JSON schema.
We kind of do this in our integration tests / golden tests, where we have
a non-official JSON representation of an Arrow schema.
The ask here is to standardize such a format in some
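As a strawman, here is roughly what such a document could look like when rendered from a pyarrow schema (a sketch only; the field names and JSON layout below are illustrative, not a proposed standard):

import json
import pyarrow as pa

schema = pa.schema([pa.field("id", pa.int64(), nullable=False),
                    pa.field("name", pa.utf8())])

# Hypothetical, non-official JSON rendering of an Arrow schema.
doc = {"fields": [{"name": f.name, "type": str(f.type), "nullable": f.nullable}
                  for f in schema]}
print(json.dumps(doc, indent=2))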
Hi
This is C++-specific, but IMO the question applies more broadly.
I understood that the rationale for stats in compressed+encoded formats
like Parquet is that computing those stats has a high cost (IO + decompress
+ decode + aggregate). This motivates the materialization of aggregates.
In arro
+1 - great work!!!
On Fri, Mar 1, 2024 at 5:49 PM Micah Kornfield
wrote:
> +1 (binding)
>
> On Friday, March 1, 2024, Uwe L. Korn wrote:
>
> > +1 (binding)
> >
> > On Fri, Mar 1, 2024, at 2:37 PM, Andy Grove wrote:
> > > +1 (binding)
> > >
> > > On Fri, Mar 1, 2024 at 6:20 AM Weston Pace
> > w
+1
On Sun, 28 Jan 2024, 00:00 Wes McKinney, wrote:
> +1 (binding)
>
> On Sat, Jan 27, 2024 at 12:26 PM Micah Kornfield
> wrote:
>
> > +1 Binding
> >
> > On Sat, Jan 27, 2024 at 10:21 AM David Li wrote:
> >
> > > +1 (binding)
> > >
> > > On Sat, Jan 27, 2024, at 13:03, L. C. Hsieh wrote:
> > >
+1
Thanks a lot for all this. Really exciting!!
On Mon, 19 Dec 2022, 17:56 Matt Topol, wrote:
> That leaves us with a total vote of +1.5 so the vote carries with the
> caveat of changing the name to be Run End Encoded rather than Run Length
> Encoded (unless this means I need to do a new vote w
Hi,
AFAIK compressed IPC Arrow files do not support random access (unlike their
uncompressed counterparts) - you need to decompress the whole batch (or at
least the columns you need). A "RecordBatch" is the compression unit of the
file. Think of it like a Parquet file whose every row group has a single
da
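A small pyarrow illustration of the point above (a sketch; assumes zstd support is available): each RecordBatch is compressed as a unit, and random access happens per batch - get_batch(i) decompresses that whole batch.

import pyarrow as pa

table = pa.table({"x": list(range(1_000_000))})
opts = pa.ipc.IpcWriteOptions(compression="zstd")
with pa.OSFile("data.arrow", "wb") as sink:
    with pa.ipc.new_file(sink, table.schema, options=opts) as writer:
        for batch in table.to_batches(max_chunksize=100_000):
            writer.write_batch(batch)

reader = pa.ipc.open_file("data.arrow")
# Decompresses batch 3 in full, even if only one column is needed afterwards.
batch3 = reader.get_batch(3)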
I agree.
I suspect that the most widely used API with "feather" is Pandas'
read_feather.
On Mon, 29 Aug 2022, 19:55 Weston Pace, wrote:
> I agree as well. I think most lingering uses of the term "feather"
> are in pyarrow and R however, so it might be good to hear from some of
> those mainta
+1
Really well written, thanks for driving this!
On Mon, 29 Aug 2022, 11:16 Antoine Pitrou, wrote:
>
> Hello,
>
> Just a heads up that more PMC votes are needed here.
>
>
>
> Le 24/08/2022 à 17:24, Antoine Pitrou a écrit :
> >
> > Hello,
> >
> > I would like to propose we vote for the following
made UB-safe by using the memcpy trick, which is correctly
> optimized by production compilers:
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/ubsan.h#L55-L69
>
> Regards
>
> Antoine.
>
>
> Le 01/08/2022 à 18:55, Jorge Cardoso Leitão a écrit :
> &
I am +1 on either - IMO:
* it is important to have either available
* both provide a non-trivial improvement over what we have
* the trade-off is difficult to decide upon - I trust whoever is
implementing it to experiment and decide which better fits Arrow and the
ecosystem.
Thank you so much fo
Hi,
I am trying to follow the C++ implementation with respect to mmapped IPC
files and reading them zero-copy, in the context of reproducing it in Rust.
My understanding from reading the source code is that we essentially:
* identify the memory regions (offset and length) of each of the buffers,
via
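For reference, the same zero-copy path is visible from Python (a sketch; the Rust port would mirror these steps):

import pyarrow as pa

# Memory-map the IPC file; the buffers of the resulting arrays are views
# into the mapping, so nothing is copied into process-private memory up front.
with pa.memory_map("data.arrow", "r") as source:
    table = pa.ipc.open_file(source).read_all()
    print(table.num_rows)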
Hi Laurent,
I agree that there is a common pattern in converting row-based formats to
Arrow.
IMHO the difficult part is not mapping the storage format to Arrow
specifically - it is mapping the storage format to any in-memory (row- or
columnar-based) format, since it requires in-depth knowledge abo
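For example, once the rows have been mapped into Python objects, pyarrow already provides a generic row-to-columnar entry point (a sketch, assuming a recent pyarrow; the format-specific cost is producing those rows in the first place):

import pyarrow as pa

rows = [{"id": 1, "name": "a"}, {"id": 2, "name": None}]
table = pa.Table.from_pylist(rows)  # columnar result: one array per field
print(table.schema)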
Sorry, I got a bit confused on what we were voting on. Thank you for the
clarification.
+1
Best,
Jorge
On Wed, Jun 8, 2022 at 9:53 PM Antoine Pitrou wrote:
>
> Le 08/06/2022 à 20:55, Jorge Cardoso Leitão a écrit :
> > 0 (binding) - imo there is some unclarity over what is ex
0 (binding) - IMO it is somewhat unclear what is expected to be
passed over the C streaming interface - an Array or a StructArray.
I think the spec claims the former, but the C++ implementation (which I
assume is the reference here) expects the latter [1].
Would it be possible to clarify th
Congratulations, great work!
On Sat, Apr 30, 2022 at 3:30 AM L. C. Hsieh wrote:
> Thanks all!
>
> On Fri, Apr 29, 2022 at 7:19 PM Yijie Shen
> wrote:
> >
> > Congrats Liang-Chi!
> >
> >
> > On Thu, Apr 28, 2022 at 8:36 PM Vibhatha Abeykoon
> > wrote:
> >
> > > Congratulations!
> > >
> > > On T
ore
>
> This doesn't really answer my question, does it?
>
>
> >
> > On Tue, Apr 26, 2022 at 10:22 AM Antoine Pitrou
> wrote:
> >
> >>
> >> Le 26/04/2022 à 16:18, Jorge Cardoso Leitão a écrit :
> >>>> Would WASM be able to inte
> Would WASM be able to interact in-process with non-WASM buffers safely?
AFAIK yes. My understanding from playing with it in JS is that a
WASM-backed UDF execution would be something like:
1. compile the C++/Rust/etc. UDF to WASM (a binary format)
2. provide a small WASM-compiled middleware of th
ed to worry about possibly corrupting
> the data. The challenging part is determining the exact locations that
> need to be overwritten.
>
> -MIcah
>
> On Mon, Apr 4, 2022 at 7:40 AM Jorge Cardoso Leitão <
> jorgecarlei...@gmail.com> wrote:
>
> > Hi,
> >
&g
Hi,
Motivated by [1], I wonder if it is possible to write to IPC without
writing the data to an intermediary buffer.
The challenge is that the header of an IPC message [header][data] requires:
* the positions of the buffers
* the total length of the body
For uncompressed data, we could compute
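For uncompressed data, that computation only needs the buffer lengths, not the buffers themselves (a sketch; the names are illustrative, with padding to 8 bytes as in the IPC spec):

def ipc_body_layout(buffer_lengths, alignment=8):
    """Return each buffer's offset within the body and the total body length."""
    offsets, position = [], 0
    for length in buffer_lengths:
        offsets.append(position)
        position += (length + alignment - 1) // alignment * alignment
    return offsets, position

# e.g. validity (1 byte) + offsets (20 bytes) + values (7 bytes)
print(ipc_body_layout([1, 20, 7]))  # ([0, 8, 32], 40)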
Congrats to all of you - well deserved!
On Mon, Mar 14, 2022, 20:47 Bryan Cutler wrote:
> Congrats to all!
>
> On Thu, Mar 10, 2022 at 12:11 AM Alenka Frim
> wrote:
>
> > Congratulations all!
> >
> > On Thu, Mar 10, 2022 at 1:55 AM Yang hao <1371656737...@gmail.com>
> wrote:
> >
> > > Congratul
n the two reference Arrow implementations (C++ and
> > >> Java). However, our implementation landscape is now much richer than
> it
> > >> used to be (for example, there is a tremendous activity on the Rust
> > >> side). Do we want to keep the historical &q
+1 adding 32 and 64 bit decimals.
+0 to release it without integration tests - both IPC and the C data
interface use a variable bit width to declare the appropriate size for
decimal types. Relaxing from {128,256} to {32,64,128,256} seems low risk
from an integration perspective, as implementatio
A change in the length of an array is equivalent to a change in at least
one of its buffers (i.e. length is always physical).
* Primitive arrays (i32, i64, etc.): the array's length is equal to the
length of the buffer divided by the size of the type (e.g. buffer.len() = 8
and i32 <=> length = 2).
*
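A quick check of the primitive case with pyarrow (a sketch):

import pyarrow as pa

arr = pa.array([1, 2], type=pa.int32())
values = arr.buffers()[1]            # buffers()[0] is the (absent) validity bitmap
assert len(arr) == values.size // 4  # 8 bytes / 4 bytes per i32 == 2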
Isn't field-0 representing ["joe", None, None, "mark"]? Validity is
"1001" and offsets are [0, 3, 3, 3, 7]. My reading is that the values buffer is
"joemark" because we do not represent values in null slots.
Best,
Jorge
On Fri, Feb 18, 2022 at 7:07 PM Phillip Cloud wrote:
> My read of the spec for s
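The layout described above is easy to verify with pyarrow (a sketch):

import pyarrow as pa

arr = pa.array(["joe", None, None, "mark"])
validity, offsets, values = arr.buffers()
print(values.to_pybytes())  # b'joemark' - null slots occupy no space
# offsets holds one more entry than the array length (5 int32s): 0, 3, 3, 3, 7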
Hi Dominik,
That is my understanding - if it exists, the length of the validity must
equal the length of each field. Otherwise, it would be difficult to iterate
over the fields and validity together, since we would not have enough rows
in the fields for the validity.
I think that this is broader
Hi,
Great questions and write-up. Thanks!
IMO dragging in a JSON reader and writer to read official extension types'
metadata seems overkill. The C data interface is expected to be quite low
level. IMO we should aim for a (non-human-readable) binary format. For
non-official ones, IMO you are spot on - us
Note that we do not have tests on tensor arrays, so testing the extension
type on these may be hindered by divergences between implementations. I do
not think we even have json integration files for them.
If the focus is extension types, maybe it would be best to cover types
whose physical represe
Thank you so much for all your contributions to open source and to Apache
Arrow in particular, and for accepting this role.
On Tue, Jan 25, 2022 at 7:10 PM QP Hou wrote:
> Congrats Kou, very well deserved.
>
> On Tue, Jan 25, 2022 at 9:53 AM Benson Muite
> wrote:
> >
> > Congratulations
de whether a pointer
> > is correct and doesn't reveal unrelated data?).
> >
> > I think we should discuss this with the DuckDB folks (and possibly the
> > Velox folks, but I guess that it might be socio-politically more difficult)
> > so as to measure how much of an i
Hi,
Thank you for raising this here and for your comments. I am very humbled by
the feedback and adoption that arrow2 got so far.
My current hypothesis is that arrow2 will be donated to Apache Arrow; I
just don't feel comfortable with, nor have the energy for, doing so right now.
Thank you for your underst
+1
On Tue, Jan 11, 2022 at 9:17 PM QP Hou wrote:
> +1 (non-binding)
>
> On Mon, Jan 10, 2022 at 3:14 PM Andy Grove wrote:
> >
> > +1 (binding)
> >
> > Thanks,
> >
> > Andy.
> >
> > On Sat, Jan 8, 2022 at 3:43 AM Andrew Lamb wrote:
> >
> > > Hi,
> > >
> > > I would like to propose a release of
Fair enough (wrt deprecation). I think that the sequence view is a
replacement for our existing one (that allows O(N) selections), but I agree
with the sentiment that preserving compatibility is more important than a
single way of doing it. Thanks for that angle!
Imo the Arrow format is already compo
Hi,
The accumulator API is designed to accept multiple columns (e.g. the
Pearson correlation takes 2 columns, not one). &values[0] corresponds to
the first column passed to the accumulator. All concrete implementations of
accumulators in DataFusion atm only accept one column (Sum, Avg, Count,
Min,
+1
Thanks,
Jorge
On Fri, Dec 24, 2021 at 3:21 AM Wang Xudong wrote:
> +1 (non-binding)
>
> Happy holidays
>
> ---
> xudong
>
> Andy Grove 于2021年12月24日周五 09:19写道:
>
> > +1 (binding)
> >
> > Thanks,
> >
> > Andy.
> >
> > On Thu, Dec 23, 2021 at 2:26 PM Andrew Lamb
> wrote:
> >
> > > Hi,
> > >
Congratulations!!
On Tue, Dec 21, 2021 at 5:24 PM Andrew Lamb wrote:
> Congratulations Daniël ! Well deserved
>
> On Tue, Dec 21, 2021 at 12:18 PM Wes McKinney wrote:
>
> > The Project Management Committee (PMC) for Apache Arrow has invited
> > Daniël Heres to become a PMC member and we are ple
Hi,
Thanks a lot for this initiative and the write-up.
I did a small benchmark for the sequence view and added a graph to the document
as evidence of what Wes is writing wrt the performance of "selection / take
/ filter".
Big +1 on replacing our current representation of variable-sized arrays by
the
+1
Thank you!
On Fri, Dec 10, 2021 at 9:50 PM Andy Grove wrote:
> +1 (binding)
>
> Thank you, Andrew, and everyone else involved in the input validation work.
> This definitely helps address one of the biggest criticisms of the crate.
>
> Andy.
>
> On Fri, Dec 10, 2021 at 12:30 PM Andrew Lamb
Congrats!
On Wed, Dec 8, 2021 at 8:14 AM Daniël Heres wrote:
> Congrats Rémi!
>
> On Wed, Dec 8, 2021, 04:27 Ian Joiner wrote:
>
> > Congrats!
> >
> > On Tuesday, December 7, 2021, Wes McKinney wrote:
> >
> > > On behalf of the Arrow PMC, I'm happy to announce that Rémi Dattai has
> > > accept
Congratulations!
On Thu, Nov 18, 2021 at 3:34 AM Ian Joiner wrote:
> Congrats Joris and really thanks for your effort in integrating ORC and
> dataset!
>
> Ian
>
> > On Nov 17, 2021, at 5:55 PM, Wes McKinney wrote:
> >
> > The Project Management Committee (PMC) for Apache Arrow has invited
> >
What are the tradeoffs between a small and a large row group size?
Is it that a low value allows for quicker random access (as we can seek row
groups based on the number of rows they have), while a larger value allows
for higher dict-encoding and compression ratios?
Best,
Jorge
On Wed, Nov 17
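For experimenting with this, the relevant knob in pyarrow is row_group_size (a sketch):

import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"x": list(range(1_000_000))})
# Small row groups: finer-grained seeking/skipping, but more per-group
# overhead and less data for dictionary encoding and compression to exploit.
pq.write_table(table, "small_groups.parquet", row_group_size=10_000)
pq.write_table(table, "large_groups.parquet", row_group_size=500_000)
print(pq.ParquetFile("small_groups.parquet").num_row_groups)  # 100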
to eat into any
> gains. There is also non-zero engineering effort to implement the necessary
> filter/selection push down APIs that most of them provide. That being
> said, I'd love to see real world ETL pipeline benchmarks :)
>
>
> On Tue, Nov 2, 2021 at 4:39 AM Jorge Cardoso
I think the C data interface requires the arrays to be immutable, or two
implementations will race when mutating/reading the shared regions, since
we have no mechanism to synchronize read/write access across the boundary.
Best,
Jorge
On Wed, Nov 3, 2021 at 1:50 PM Alessandro Molina <
alessan...@u
> Just an idea: Do the Avro libs support different allocators? Maybe
> using
> > a
> > > different one (e.g. mimalloc) would yield more similar results by
> working
> > > around the fragmentation you described.
> > >
> > > This wouldn't change
Hi,
I am reporting back a conclusion that I recently arrived at when adding
support for reading Avro to Arrow.
Avro is a storage format that does not have an associated in-memory format.
In Rust, the official implementation deserializes to an enum, in Python to a
vector of Object, and I suspect in J
ARROW-14453
[2] https://github.com/pola-rs/polars
[3] https://h2oai.github.io/db-benchmark/
On Wed, Oct 27, 2021 at 7:57 PM Antoine Pitrou wrote:
>
> Le 26/10/2021 à 21:30, Jorge Cardoso Leitão a écrit :
> > Hi,
> >
> > One aspect of the design of "arrow2" is that it d
Hi,
One aspect of the design of "arrow2" is that it deals with array slices
differently from the rest of the implementations. Essentially, the offset
is not stored in ArrayData, but on each individual Buffer. Some important
consequences are:
* people can work with buffers and bitmaps without havin
Congratulations!!! :)
On Thu, Oct 7, 2021, 21:58 Weston Pace wrote:
> Congratulations Jiayu Liu!
>
> On Thu, Oct 7, 2021 at 8:02 AM Yijie Shen
> wrote:
> >
> > Congratulations Jianyu
> >
> >
> > Micah Kornfield 于2021年10月8日 周五上午12:29写道:
> >
> > > A little late, but welcome and thank you for your
ench.com/q/NxyDu89azmKJmiVxF29Ei8FybWk
>
>
> On 9/7/21 7:16 PM, Jorge Cardoso Leitão wrote:
> > Thanks,
> >
> > I think that the alignment requirement in IPC is different from this one:
> > we enforce 8/64 byte alignment when serializing for IPC, but we (only)
&
Congrats!! =)
On Thu, Sep 9, 2021, 20:12 Micah Kornfield wrote:
> Congrats!
>
> On Thursday, September 9, 2021, Weston Pace wrote:
>
> > Congratulations Nic!
> >
> > On Thu, Sep 9, 2021 at 7:43 AM Antoine Pitrou
> wrote:
> > >
> > >
> > > Welcome on board Nic!
> > >
> > >
> > > On Thu, 9 Sep 2
ld be
> > for operations on wider types (Decimal128 and Decimal256). Another
> place
> > where I think alignment could help is when adding two primitive arrays
> (it
> > sounds like this was summing a single array?).
> >
> > [1]
> >
> https://lists.apache.o
Thanks a lot Antoine for the pointers. Much appreciated!
> Generally, it should not hurt to align allocations to 64 bytes anyway,
> since you are generally dealing with large enough data that the
> (small) memory overhead doesn't matter.
>
Not for performance. However, 64 byte alignment in Rust req
Hi,
We have a whole section related to byte alignment (
https://arrow.apache.org/docs/format/Columnar.html#buffer-alignment-and-padding)
recommending 64-byte alignment and referring to Intel's manual.
Do we have evidence that this alignment helps (besides Intel's claims)?
I am asking because going
:
> > >
> > > Hi Jorge,
> > > Are there places in the docs that you think this would simplify?
> > > There is an old JIRA [1] about introducing a c-struct type that I
> > > think aligns with this observation [1]
> > >
> > >
Note that that would be an upper bound because buffers can be shared
between arrays.
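For example (a sketch with pyarrow): a zero-copy slice references the same underlying buffer, so summing buffer sizes per array over-counts the shared memory.

import pyarrow as pa

arr = pa.array(list(range(1_000_000)), type=pa.int64())
head = arr.slice(0, 500_000)  # zero-copy: shares arr's values buffer
assert arr.buffers()[1].address == head.buffers()[1].address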
On Wed, Sep 1, 2021 at 2:15 PM Antoine Pitrou wrote:
> On Tue, 31 Aug 2021 21:46:23 -0700
> Rares Vernica wrote:
> >
> > I'm storing RecordBatch objects in a local cache to improve performance.
> I
> > want to
Hi,
Just came across this curiosity that IMO may help us to design physical
types in the future.
Not sure if this was mentioned before, but it seems to me that
`DaysMilliseconds` and `MonthDayNano` belong to a broader class of physical
types "typed tuples" in that they are constructed by defining
For the avoidance of doubt, +1 on the vote: release :)
On Sat, Aug 28, 2021 at 12:12 PM Jorge Cardoso Leitão <
jorgecarlei...@gmail.com> wrote:
> +1
>
> Thanks, Andrew!
>
> On Sat, Aug 28, 2021 at 12:10 PM Andrew Lamb wrote:
>
>> Update here: the issue is tha
+1
Thanks, Andrew!
On Sat, Aug 28, 2021 at 12:10 PM Andrew Lamb wrote:
> Update here: the issue is that we made a change that is not compatible with
> older Rust toolchains.
>
> The consensus on the PR seems to be that since arrow-rs doesn't offer any
> explicit compatibility for older rust too
+1
On Fri, Aug 20, 2021 at 2:43 PM David Li wrote:
> +1
>
> On Thu, Aug 19, 2021, at 18:33, Weston Pace wrote:
> > +1
> >
> > On Thu, Aug 19, 2021 at 9:18 AM Wes McKinney
> wrote:
> > >
> > > +1
> > >
> > > On Thu, Aug 19, 2021 at 6:20 PM Antoine Pitrou
> wrote:
> > > >
> > > >
> > > > Hello,
+1
On Tue, Aug 17, 2021 at 8:50 PM Micah Kornfield
wrote:
> Hello,
> As discussed previously [1], I'd like to call a vote to add a new interval
> type which is a triple of Month, Days, and nanoseconds. The formal
> definition is defined in a PR [2] along with Java and C++ implementations
> that
+1
Great work everyone!
On Fri, Aug 13, 2021, 22:19 Daniël Heres wrote:
> +1 (non binding). Looking good.
>
>
> On Fri, Aug 13, 2021, 07:49 QP Hou wrote:
>
> > Good call Ruihang. I remember we used to have this toolchain file when
> > we were still in the main arrow repo. I will take a look i
%3E
>
> -Micah
>
> On Fri, Aug 13, 2021 at 10:57 AM Keith Kraus
> wrote:
>
> > How would using the typeid directly work with arbitrary Extension types?
> >
> > -Keith
> >
> > On Fri, Aug 13, 2021 at 12:49 PM Jorge Cardoso Leitão <
> > jorgeca
Hi,
In the UnionArray, there is a level of indirection between types (buffer of
i8s) -> typeId (i8) -> field. For example, the generated_union part of our
integration tests has the data:
types: [5, 5, 5, 5, 7, 7, 7, 7, 5, 5, 7] (len = 11)
typeids: [5, 7]
fields: [int32, utf8]
My understanding is
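A sketch of that indirection with pyarrow (using a sparse union for brevity; type codes 5 and 7 map to children 0 and 1):

import pyarrow as pa

types = pa.array([5, 7, 5], type=pa.int8())  # one type code per slot
children = [pa.array([1, 2, 3], type=pa.int32()),
            pa.array(["a", "b", "c"], type=pa.utf8())]
union = pa.UnionArray.from_sparse(types, children,
                                  field_names=["int32", "utf8"],
                                  type_codes=[5, 7])
# Slots 0 and 2 resolve to the int32 child, slot 1 to the utf8 child.
print(union)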
ions" in spawn_blocking or something equivalent.
Best,
Jorge
On Thu, Aug 12, 2021 at 10:03 PM Phillip Cloud wrote:
> On Thu, Aug 12, 2021 at 1:03 PM Jorge Cardoso Leitão <
> jorgecarlei...@gmail.com> wrote:
>
> > I agree with Antoine that we should weigh the pros and cons
Hi,
The checkout of arrow-rs on the failed build is at fa5acd971c97, which up
until ~3 hrs ago was master, so I think it is picking up the right code.
Did a quick investigation:
* The integration tests on arrow-rs have not been running since June
30th; they stopped running after the merge of [1
I agree with Antoine that we should weigh the pros and cons of Flatbuffers
(or Protobuf or Thrift, for that matter) against a more human-friendly,
simpler format like JSON or MessagePack. I also struggle a bit to reason
about the complexity of using Flatbuffers for this.
E.g. there is no async support for
A couple of questions:
1. Is the goal that IRs have equal semantics, i.e. given (IR, data), the
operation "(IR, data) - engine -> result" MUST be the same for every "engine"?
2. If yes, IMO we may need to worry about:
* a definition of equality that implementations agree on.
* agreement over what the sem
; Thanks,
> > QP
> >
> >
> >
> > On Tue, Aug 3, 2021 at 5:31 PM paddy horan
> wrote:
> > >
> > > Hi Jorge,
> > >
> > > I see value in consolidating development in a single repo and releasing
> > under the existing arrow crate. Re
ting Arrow
> community that Arrow2 is the future but that it is <1.0
> - existing users will be well supported in this transition
> - In general, I think the longer that development proceeds in separate
> repos the harder it will be to eventually merge the two in a way that
>
Hi,
Sorry for the delay.
If there is a path towards an official release under a <1.0.0 versioning
scheme aligned with the rest of the Rust ecosystem and in line with the
stability of the API, then IMO we should move all development to within
Apache experimental ASAP (I can handle this and the lik
Congratulations, Neville :)
On Fri, Jul 30, 2021 at 8:18 AM QP Hou wrote:
> Well deserved, congratulations Neville!
>
> On Thu, Jul 29, 2021 at 3:20 PM Wes McKinney wrote:
> >
> > The Project Management Committee (PMC) for Apache Arrow has invited
> > Neville Dipale to become a PMC member and w
Congratulations and thank you for all the great work! It is a pleasure to
work with you.
Best,
Jorge
On Mon, Jul 26, 2021 at 7:38 PM Niranda Perera
wrote:
> Congrats QP! :-)
>
> On Mon, Jul 26, 2021 at 1:24 PM Micah Kornfield
> wrote:
>
> > Congrats QP!
> >
> > On Mon, Jul 26, 2021 at 10:02 A
Hi,
I would like to gauge your interest in a release of the Python bindings for
DataFusion.
There have been a tremendous number of updates to it, including support for
Python 3.9.
This release is backward compatible and there are no blockers.
This would be the first time a release of this is cut
> and arrow 6.0.0 (or other future versions) is perfectly compatible with
> semantic versioning and other software projects.
>
> Andrew
>
> On Mon, Jul 19, 2021 at 2:08 AM Jorge Cardoso Leitão <
> jorgecarlei...@gmail.com> wrote:
>
> > Hi,
> >
> > W
; > I think this approach wouldn't result in extra work (backporting the
> > important changes to 5.1,5.2 release). It only shows the magnitude of
> this
> > change, the work would be done by you anyways, this would just make it
> > clear this is a huge effort.
>
Hi,
Arrow2 and parquet2 have passed the IP clearance vote and are ready to be
merged into the apache/* repos.
My plan is to merge them and then PR the latest updates from my own repo to
both of them, so that I can temporarily (and hopefully permanently) archive
the versions on my account and move developme
o gene...@incubator.apache.org like
>
>
> https://lists.apache.org/thread.html/r319b85f0f24f9b0529865387ccfe1b2a00a16f394a48144ba25c3225%40%3Cgeneral.incubator.apache.org%3E
>
> On Sat, Jul 10, 2021 at 7:48 AM Jorge Cardoso Leitão
>
> wrote:
> >
> > Thanks a lot Wes,
, Jul 5, 2021 at 10:38 AM Wes McKinney wrote:
> Great, thanks for the update and pushing this forward. Let us know if
> you need help with anything.
>
> On Sun, Jul 4, 2021 at 8:26 PM Jorge Cardoso Leitão
> wrote:
> >
> > Hi,
> >
> > Wes and Neils,
> >
Hi,
AFAIK timezone is part of the spec. In Python, that would be [1]
import pyarrow as pa
dt1 = pa.timestamp("ms", "+00:10")
dt2 = pa.timestamp("ms")
arrow-rs is not very consistent in how it handles it. IMO that is an
artifact of it currently being difficult (API-wise) to create an array with a
al-rs-parquet2/pull/1
On Mon, Jun 7, 2021 at 11:55 PM Wes McKinney wrote:
> On Sun, Jun 6, 2021 at 1:47 AM Jorge Cardoso Leitão
> wrote:
> >
> > Hi,
> >
> > Thanks a lot for your feedback. I agree with all the arguments put
> forward,
> > including Andrew
With 10 +1s, 3 non-binding +1s, and no 0s or -1s, the vote passed.
Thank you all for your participation and for this clarification. I will
start working with the Incubator on the IP clearance.
Best,
Jorge
PR to the benchmark repo that clarifies that
> it's executing the query using the arrow R/C++ library, when in fact
> the query is actually primarily handled by dplyr and not Arrow at all.
> The benchmark is very misleading in its current form.
>
> On Fri, Jun 25, 2021 at 11:55
I just had a quick chat over the ASF's Slack with Daniel Gruno from the
infra team and they are rolling out the "triage role" [1] for
non-committers, which AFAIK offers useful tools in this context:
* add/remove labels
* assign reviewers
* mark duplicates
* open, close, and assign issues and PRs
Hi,
I would like to bring to this mailing list a proposal to donate the source
code of arrow2 [1] and parquet2 [2] as experimental repositories [3] within
Apache Arrow, conditional on IP clearance.
The specific PRs are:
* https://github.com/apache/arrow-experimental-rs-arrow2/pull/1
* https://gi
+1
Ran the verification script on Apple Intel.
On Fri, Jun 25, 2021 at 12:16 AM Andrew Lamb wrote:
> Hi,
>
> I would like to propose a release of Apache Arrow Rust Implementation,
> version 4.4.0.
>
> This release candidate is based on commit:
> 32b835e5bee228d8a52015190596f4c33765849a [1]
>
> The
+1
On Fri, Jun 25, 2021 at 7:47 PM Julian Hyde wrote:
> +1
>
> > On Jun 25, 2021, at 10:36 AM, Antoine Pitrou wrote:
> >
> >
> > Le 24/06/2021 à 21:16, Weston Pace a écrit :
> >> The discussion in [1] led to the following proposal which I would like
> >> to submit for a vote.
> >> ---
> >> Arro
Hi,
H2O.ai has a set of benchmarks comparing different query engines [1].
There is currently an implementation named "Arrow", backed by the Arrow R
implementation [2].
This is one of the least performant implementations evaluated. I sense that
this may negatively affect the Arrow format, as people
Thank you for sharing, Wes, an interesting paper indeed.
In Rust we currently use a different strategy.
We build an iterator over ranges [a_i, b_i[ to be selected from the
filter bitmap, and filter the array based on those ranges. For a
single filter, the ranges are iterated as they are being bui
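In Python-ish pseudocode, the idea is roughly (a sketch, not the actual Rust code):

def filter_ranges(mask):
    """Yield half-open ranges [start, end) of consecutive set bits in the mask."""
    start = None
    for i, keep in enumerate(mask):
        if keep and start is None:
            start = i
        elif not keep and start is not None:
            yield (start, i)
            start = None
    if start is not None:
        yield (start, len(mask))

print(list(filter_ranges([True, True, False, True])))  # [(0, 2), (3, 4)]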
The Apache Arrow team is pleased to announce the 4.0.1 release. This
release covers general bug fixes on the different implementations, notably
C++, R, Python and JavaScript.
The list is available [1], with the list of contributors [2] and changelog [3].
As usual, see the install page [4] for inst
Hi,
I have created a new experimental repo [1] to lay foundations for the
parquet2 work within Arrow.
The way I proceeded so far:
1. pushed arrow-rs master to it (from commit 9f56a)
2. removed all arrow-related code and committed
3. removed all (rm -rf *) and committed
4. PRed parquet2, rebased
5:57 AM Jorge Cardoso Leitão
wrote:
>
> Thanks a lot, Krisztian.
>
> The JS packages are still missing. I already have access to npm (thanks
> Sutou). As part of the npm-release.sh in 4.0.1, we require all tests to pass
> [1]. However, there are tests failing on my computer [2]
tion and quick questions but
> not a place to make decisions.
>
> Thanks,
> Wes
>
> On Fri, Jun 18, 2021 at 12:14 AM Jorge Cardoso Leitão
> wrote:
> >
> > Hi,
> >
> > I agree that the communication improved a lot with moving the issues
> > to Gi
Hi,
(this has no direction; I am just genuinely curious)
I am wondering, what is the rationale for using "offsets" instead of
"lengths" to represent variable-sized arrays?
I.e. ["a", "", None, "ab"] is represented as
offsets: [0, 1, 1, 1, 3]
values: "aab"
What is the reasoning for using this over lengths
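For context, one property of offsets is O(1) random access to any slot, without summing the preceding lengths (a sketch):

offsets = [0, 1, 1, 1, 3]
values = "aab"

def value_at(i):
    # O(1): no prefix sum over the lengths of slots 0..i is needed
    return values[offsets[i]:offsets[i + 1]]

print(value_at(0), value_at(3))  # 'a' 'ab'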
Hi,
I agree that the communication improved a lot with moving the issues
to GitHub and Slack, which made the sync call less relevant.
Best,
Jorge
On Thu, Jun 17, 2021 at 11:55 PM Andrew Lamb wrote:
>
> I think dropping back from the Rust sync call and using the regular Arrow
> Sync call should
Thank you, everyone, for participating so far; a really important and
useful discussion.
I think of this discussion as a set of test cases over behavior.
Parameterization:
* Timestamp(ms, None)
* Timestamp(ms, "00:00")
* Timestamp(ms, "01:00")
Cases:
* its string representation equals
* add a dur
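For reference, the three parameterizations above can be constructed with pyarrow like this (a sketch):

import pyarrow as pa

naive = pa.array([0], type=pa.timestamp("ms"))            # Timestamp(ms, None)
utc   = pa.array([0], type=pa.timestamp("ms", "+00:00"))  # Timestamp(ms, "00:00")
plus1 = pa.array([0], type=pa.timestamp("ms", "+01:00"))  # Timestamp(ms, "01:00")
print(naive.type, utc.type, plus1.type)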
ase.sh#L23
[2] https://issues.apache.org/jira/browse/ARROW-13046
Best,
Jorge
On Thu, Jun 10, 2021 at 1:15 PM Krisztián Szűcs
wrote:
> On Thu, Jun 10, 2021 at 6:57 AM Jorge Cardoso Leitão
> wrote:
> >
> > I have been unable to generate the docs from any of my two machines (my
> &
Isn't an array of complex numbers representable by what Arrow already supports? In
particular, I see at least two valid in-memory representations to use,
depending on what we are going to do with it:
* Struct[re, im]
* FixedSizeList[2]
In the first case, we have two buffers, [x0, x1, ...] and [y0, y1, ...]
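Both representations can be built today (a sketch with pyarrow):

import pyarrow as pa

# Option 1: Struct<re: float64, im: float64> - two separate value buffers
as_struct = pa.StructArray.from_arrays(
    [pa.array([1.0, 3.0]), pa.array([2.0, 4.0])], names=["re", "im"])

# Option 2: FixedSizeList<float64>[2] - one interleaved buffer [re0, im0, re1, im1]
flat = pa.array([1.0, 2.0, 3.0, 4.0], type=pa.float64())
as_list = pa.FixedSizeListArray.from_arrays(flat, 2)
print(as_struct.type, as_list.type)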
Hi,
I agree with all of you. ^_^
I created https://github.com/apache/arrow-datafusion/issues/533 to track
this. I tried to encapsulate the three main use-cases for the SQL
extension. Feel free to edit at will.
Best,
Jorge
On Thu, Jun 10, 2021 at 8:37 AM QP Hou wrote:
> Thanks Daniël for st
4.0.1".
Best,
Jorge
On Sun, Jun 6, 2021 at 6:39 AM Jorge Cardoso Leitão <
jorgecarlei...@gmail.com> wrote:
> Hi,
>
> Sorry for the delay on this, but it is not being easy to build the docs
> [1-5], which is why this is taking some time. It seems that our CI is
> cac
Hi,
Some questions that come to mind:
1. If we add vendor X to DataFusion, will we be open to another vendor Y? How
do we compare vendors? How do we draw the line at "not sufficiently
relevant"?
2. How do we ensure that we do not distort the level playing field
that some people expect from Dat
Semantically, a NaN is defined according to IEEE 754 for floating
point, while a null represents any value that is undefined,
unknown, etc.
An important set of problems that Arrow solves is that it has a native
representation for null values (independent of NaNs): Arrow's in-memory
mod
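A small illustration of the distinction with pyarrow (a sketch):

import pyarrow as pa

arr = pa.array([1.0, float("nan"), None])
print(arr.null_count)  # 1 - only the None slot is null; NaN is a regular value
print(arr.is_null())   # [false, false, true]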