Please note this message and the previous one from the author violate our Code
of Conduct [1]. Specifically "Do not insult or put down other
participants." Please try to be professional in communications and focus
on the technical issues at hand.
[1] https://www.apache.org/foundation/policies/co
There has been a substantial amount of effort put into the arrow-rs Rust
Parquet implementation to handle the corner cases of nested structs and
lists, and all the fun of the various levels of nullability.
Do let us know if you happen to try writing nested structures directly to
Parquet and run into issues.
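The nullability corner cases mentioned above are what Parquet's Dremel-style
definition levels encode. As a rough illustration (plain Python, simplified
to a single optional list-of-optional-int column; this is not the arrow-rs
API, just the concept):

```python
def definition_levels(column):
    """Compute Dremel-style definition levels for an
    optional list<optional int> column (max level = 3).

    0 = the list itself is null
    1 = the list is present but empty
    2 = an element slot is present but the element is null
    3 = the element is present
    """
    levels = []
    for lst in column:
        if lst is None:
            levels.append(0)
        elif len(lst) == 0:
            levels.append(1)
        else:
            for value in lst:
                levels.append(2 if value is None else 3)
    return levels

print(definition_levels([None, [], [1, None, 2]]))  # [0, 1, 3, 2, 3]
```

Each extra layer of optional/repeated nesting adds another level, which is
exactly where the corner cases multiply.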
Far be it from me to think that I know more than Jorge or Wes on this
subject. Sorry if my post gives that perception; that is clearly not my
intention. I'm just trying to defend the idea that when designing this kind
of transformation, it might be interesting to have a library to test
several mapp
He was trying to nicely say that he knows way more than you, and that your
ideas will result in a low-performance scheme no one will use in production
AI/machine learning.
I think Jorge’s opinion is that of an expert, and his being humble is just
tact. Probably listen to Jorge on performance and architecture, even over
Wes, as he’s contributed more than anyone else and knows the bleeding edge
of low-level performance work better than anyone.
Hi Jorge
I don't think that the level of in-depth knowledge needed is the same
between using a row-oriented internal representation and Arrow, which not
only changes the organization of the data but also introduces a set of
additional mapping choices and concepts.
For example, assuming that the
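One such mapping choice can be sketched in plain Python: a nullable
struct column is not stored as a list of records, but as child arrays plus
a validity mask. (The lists below are hypothetical stand-ins for Arrow
buffers, used only to illustrate the layout decision.)

```python
# Row representation: one Python dict (or None) per record.
rows = [{"x": 1, "y": 2}, None, {"x": 3, "y": None}]

# Columnar representation: a validity mask for the struct itself,
# plus one child array per field. Child arrays keep a slot even
# where the parent struct is null.
validity = [row is not None for row in rows]
x_child = [(row or {}).get("x") for row in rows]
y_child = [(row or {}).get("y") for row in rows]

print(validity)  # [True, False, True]
print(x_child)   # [1, None, 3]
print(y_child)   # [2, None, None]
```

Note the extra decision this forces: whether a null parent struct implies
null children, what value fills their slots, and so on — decisions that a
row-oriented representation never has to make explicit.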
Hi Laurent,
I agree that there is a common pattern in converting row-based formats to
Arrow.
IMHO the difficult part is not mapping the storage format to Arrow
specifically - it is mapping the storage format to any in-memory (row- or
columnar-based) format, since it requires in-depth knowledge abo
Let me clarify the proposal a bit before replying to the earlier feedback.
It seems to me that the process of converting a row-oriented data source
(row = set of fields or something more hierarchical) into an Arrow record
repeatedly raises the same challenges. A developer who must perf
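The recurring conversion pattern described here boils down to a transpose
step from records to per-field columns. A minimal sketch, assuming flat
dict-shaped rows (a real converter would also build validity bitmaps and
typed buffers rather than Python lists):

```python
def rows_to_columns(rows, field_names):
    """Transpose row-oriented records (dicts) into per-field
    column lists, preserving missing fields as None.

    This is the step every row-source-to-Arrow converter repeats.
    """
    return {name: [row.get(name) for row in rows] for name in field_names}

rows = [{"id": 1, "name": "a"}, {"id": 2}]
print(rows_to_columns(rows, ["id", "name"]))
# {'id': [1, 2], 'name': ['a', None]}
```

The hard part, as the thread notes, is everything around this core:
nested fields, differing nullability semantics, and type mapping.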
We had an e-mail thread about this in 2018
https://lists.apache.org/thread/35pn7s8yzxozqmgx53ympxg63vjvggvm
I still think having a canonical in-memory row format (and libraries
to transform to and from Arrow columnar format) is a good idea — but
there is the risk of ending up in the tar pit of re
Are there more details on what exactly an "Arrow Intermediate
Representation (AIR)" is? We've talked in the past about maybe having a
memory layout specification for row-based data as well as column-based
data. There was also a recent attempt, at least in C++, to try to build
utilities to do these
I think this has been addressed for both Parquet and Python to handle records
including nested structures. Not sure about Rust and Go.
[C++][Parquet] Read and write nested Parquet data with a mix of struct and list
nesting levels
https://issues.apache.org/jira/browse/ARROW-1644
[Python] Add