Re: [Celebrate] Arrow has reached 2000 stargazers

2018-05-28 Thread simba nyatsanga
Congratulations everyone! On Mon, 28 May 2018 at 21:42 Li Jin wrote: > Congrats everyone! > On Mon, May 28, 2018 at 3:21 PM Jacques Nadeau wrote: > > > Woo! > > > > On Mon, May 28, 2018 at 4:50 PM, Wes McKinney > wrote: > > > > > Congrats all! The journey continues > > > > > > On Mon, May 28,

Memory mapping error on pq.read_table

2018-02-08 Thread simba nyatsanga
Hi Everyone, I've encountered a memory mapping error when attempting to read a parquet file to a Pandas DataFrame. It seems to be happening intermittently though, I've so far encountered it once. In my case the pq.read_table code is being invoked in a Linux docker container. I had a look at the

Re: [Python] Disk size performance of Snappy vs Brotli vs Blosc

2018-01-30 Thread simba nyatsanga
2018 at 15:37 simba nyatsanga <simnyatsa...@gmail.com> wrote: > Thanks all for the great feedback! > > Thanks Daniel for the sample data sets. I loaded them up and they're quite > comparable in size to some of the data I'm dealing with. In my case the > shapes range from 150

Re: [Python] Disk size performance of Snappy vs Brotli vs Blosc

2018-01-25 Thread simba nyatsanga
rows which can penalize system that > > expect > > > to amortize column meta data over more data. > > > > > > This test might match your situation, but I would be leery of drawing > > > overly broad conclusions from this single data point. > > >

Re: [Python] Disk size performance of Snappy vs Brotli vs Blosc

2018-01-24 Thread simba nyatsanga
me through. Try uploading them somewhere and link > to them in the mails. Attachments are always stripped on Apache > mailing lists. > Uwe > > > On Wed, Jan 24, 2018, at 1:48 PM, simba nyatsanga wrote: > > Hi Everyone, > > > > I did some benchmarking to compare the disk

[Python] Disk size performance of Snappy vs Brotli vs Blosc

2018-01-24 Thread simba nyatsanga
Hi Everyone, I did some benchmarking to compare the disk size performance when writing Pandas DataFrames to parquet files using Snappy and Brotli compression. I then compared these numbers with those of my current file storage solution. In my current (non Arrow+Parquet solution), every column in

Re: Uniform types in Arrow table columns (pyarrow.array) and the case of python dictionaries

2018-01-22 Thread simba nyatsanga
ht Arrow memory layout. > > - Wes > > On Mon, Jan 22, 2018 at 4:50 PM, simba nyatsanga <simnyatsa...@gmail.com> > wrote: > > Hi Uwe, > > > > Thank you very much for the detailed explanation. I have a much better > > understanding now. > > > > C

Re: Uniform types in Arrow table columns (pyarrow.array) and the case of python dictionaries

2018-01-22 Thread simba nyatsanga
Hi Uwe, Thank you very much for the detailed explanation. I have a much better understanding now. Cheers On Mon, 22 Jan 2018 at 19:37 Uwe L. Korn <uw...@xhochy.com> wrote: > Hello Simba, > > find the answers inline. > > On Mon, Jan 22, 2018, at 7:29 AM, simba nyatsanga w

Uniform types in Arrow table columns (pyarrow.array) and the case of python dictionaries

2018-01-21 Thread simba nyatsanga
Hi Everyone, I've got two questions that I'd like help with: 1. Pandas and numpy arrays can handle multiple types in a sequence, e.g. a float and a string, by using dtype=object. From what I gather, Arrow arrays enforce a uniform type depending on the type of the first encountered element in a

Re: PyArrow python list to numpy nd.array inference in pd.read_table

2018-01-18 Thread simba nyatsanga
- Wes > > On Thu, Jan 18, 2018 at 2:10 PM, simba nyatsanga <simnyatsa...@gmail.com> > wrote: > > Hi Wes, > > > > Great! Thanks for the pointer. From what I gather this is a fundamental > and > > deliberate design decision. Would I be correct in saying the

Re: PyArrow python list to numpy nd.array inference in pd.read_table

2018-01-18 Thread simba nyatsanga
ter/cpp/src/arrow/python/arrow_to_pandas.cc#L541 > > - Wes > > On Thu, Jan 18, 2018 at 1:26 PM, simba nyatsanga <simnyatsa...@gmail.com> > wrote: > > > Good day everyone, > > > > I noticed what looks like type inference happening after persisting a > >

PyArrow python list to numpy nd.array inference in pd.read_table

2018-01-18 Thread simba nyatsanga
Good day everyone, I noticed what looks like type inference happening after persisting a pandas DataFrame where one of the column values is a list. When I load up the DataFrame again and do df.to_dict(), the value is no longer a list but a numpy array. I dug through functions in the

Re: Trying to build to build pyarrow for python 2.7

2018-01-17 Thread simba nyatsanga
epending on how development is progressing. > > - Wes > > On Sun, Jan 14, 2018 at 9:19 AM, simba nyatsanga <simnyatsa...@gmail.com> > wrote: > > Thanks a lot. I see that there's a PR that's been opened to resolve the > > encoding issue - https://github.com/apache/arrow/p

Re: Trying to build to build pyarrow for python 2.7

2018-01-14 Thread simba nyatsanga
Sun, Jan 14, 2018, at 2:42 PM, simba nyatsanga wrote: > > Amazing, thanks Uwe! > > > > I was able to build pyarrow successfully for python 2.7 using your > > workaround. I appreciate that you've got a possible solution for the too. > > > > Besides the PR getting

Re: Trying to build to build pyarrow for python 2.7

2018-01-14 Thread simba nyatsanga
"/Users/simba/Projects/personal/oss/arrow/python/build/ > > temp.macosx-10.9-x86_64-2.7/CMakeFiles/CMakeOutput.log". > > See also "/Users/simba/Projects/personal/oss/arrow/python/build/ > > temp.macosx-10.9-x86_64-2.7/CMakeFiles/CMakeError.log".error: > > c

Re: Trying to build to build pyarrow for python 2.7

2018-01-11 Thread simba nyatsanga
-r--r-- 1 simba staff 3.0M Jan 11 18:45 libparquet.a lrwxr-xr-x 1 simba staff 18B Jan 11 18:45 libparquet.dylib -> libparquet.1.dylib Just to clarify also, I'm attempting to build the wheel from within *arrow/python* folder where the *setup.py* file is. Thanks again for the

Re: Trying to build to build pyarrow for python 2.7

2018-01-10 Thread simba nyatsanga
> > Are you following development instructions in > > http://arrow.apache.org/docs/python/development.html#developing-on-linux-and-macos > or something else? > > - Wes > > On Wed, Jan 10, 2018 at 11:20 AM, simba nyatsanga > <simnyatsa...@gmail.com> wrote: > > Hi,

Trying to build to build pyarrow for python 2.7

2018-01-10 Thread simba nyatsanga
Hi, I've created a python 2.7 virtualenv in my attempt to build the pyarrow project. But I'm having trouble running one of the commands specified in the development docs on Github, specifically this command: cd arrow/python python setup.py build_ext --build-type=$ARROW_BUILD_TYPE \