RE: RunEndEncodedArray Null Counts

2023-01-22 Thread Tobias Zagorni
Hi Raphael, I think this is indeed a documentation mistake; it should say 0! For exactly the reasons you mentioned, I determined that it is best to leave the null count field always 0 for RLE arrays. This way it is consistent with union types, at least. RunLengthEncoded data should not contain

Re: [VOTE] Release Apache Arrow 11.0.0 - RC0

2023-01-22 Thread Dewey Dunnington
Just a note that I wasn't able to produce an error building Arrow C++ using clang-dev [1]. That isn't to say one doesn't still exist (I will test more thoroughly in the coming days), but it does suggest that it's something we can/should handle at the packaging stage if it does pop up (rather than b

RunEndEncodedArray Null Counts

2023-01-22 Thread Raphael Taylor-Davies
Hi, Apologies if I am rehashing something that has already been discussed or is documented elsewhere, but reading the documentation of the Run-Length encoding [1] I noticed that the parent null count can be non-zero [2]. This is somewhat surprising to me for a couple of reasons: - This is in

Re: Proposal: renaming the 'master' branch to 'main'

2023-01-22 Thread Andy Grove
The default branch in https://github.com/apache/arrow-datafusion-python is now main. The process was simple for this repo - file an issue with INFRA, create a PR to replace master with main where appropriate (docs and workflows). I plan on doing the same for Ballista and DataFusion over the next

Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 16.1.0 RC1

2023-01-22 Thread Andrew Lamb
+1 (binding). Verified on x86 Mac. I have also filed https://github.com/apache/arrow-datafusion/issues/5023 to discuss nightly releases. Thanks Andy. Andrew On Sat, Jan 21, 2023 at 1:56 PM L. C. Hsieh wrote: > +1 (binding) > > Verified on M1 Mac. > > Thanks Andy. > > On Sat, Jan 21, 2023 at

Re: New Pandas-Apache repo

2023-01-22 Thread Adesola Adedewe
Yes, I will. I haven't taken enough time to clean up the README; it was generated from my source code with ChatGPT. I will do that later in the week. On Sun, Jan 22, 2023 at 2:36 AM Benson Muite wrote: > On 1/22/23 13:15, Adesola Adedewe wrote: > > i'm working on a project where big financia

Re: New Pandas-Apache repo

2023-01-22 Thread Benson Muite
On 1/22/23 13:15, Adesola Adedewe wrote: > i'm working on a project where big financial data needs to be loaded stored > and manipulated. the data is stored as parquet. my initial version had > arrow just load the parquet data and i used the basic unorderedmap but this > limited me to only one data

Re: New Pandas-Apache repo

2023-01-22 Thread Adesola Adedewe
I'm working on a project where big financial data needs to be loaded, stored, and manipulated. The data is stored as Parquet. My initial version had Arrow just load the Parquet data, and I used the basic unorderedmap, but this limited me to only one data type. I found I could make my database more gene

Re: New Pandas-Apache repo

2023-01-22 Thread Benson Muite
On 1/22/23 11:41, Adesola Adedewe wrote: > The project was initially meant to provide a simpler interface over arrow > apache so pretty much what was done with the python api, but it has > evolved to be more than that ,with indexing and other panda operations > implemented like reindex, resample, c

Re: New Pandas-Apache repo

2023-01-22 Thread Adesola Adedewe
The project was initially meant to provide a simpler interface over Apache Arrow, much like what was done with the Python API, but it has evolved to be more than that, with indexing and other pandas operations implemented, such as reindex, resample, concat, etc. I currently have it good enough for my

Re: New Pandas-Apache repo

2023-01-22 Thread Benson Muite
On 1/22/23 06:23, Adesola Adedewe wrote: > okay thanks for your consideration. > > On Sat, Jan 21, 2023 at 4:49 PM Sutou Kouhei wrote: > >> Hi, >> >> I'm not sure pandas like API is suitable for our official >> data frame API. >> >> FYI: >> >> * GitHub issue of this: >> https://github.com/ap