[ https://issues.apache.org/jira/browse/ARROW-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van den Bossche updated ARROW-5287:
-----------------------------------------
    Description: 
Arrays of tuples can be converted to either a ListArray or a StructArray if
you specify the type explicitly:

{code}
In [6]: pa.array([(1, 2), (3, 4, 5)], type=pa.list_(pa.int64())) 
Out[6]: 
<pyarrow.lib.ListArray object at 0x7f1b01a4d408>
[
  [
    1,
    2
  ],
  [
    3,
    4,
    5
  ]
]

In [7]: pa.array([(1, 2), (3, 4)], type=pa.struct([('a', pa.int64()), ('b', pa.int64())]))
Out[7]: 
<pyarrow.lib.StructArray object at 0x7f1b01a51b88>
-- is_valid: all not null
-- child 0 type: int64
  [
    1,
    3
  ]
-- child 1 type: int64
  [
    2,
    4
  ]
{code}

But not when no type is specified:

{code}
In [8]: pa.array([(1, 2), (3, 4)])
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-8-ab2d80c7486d> in <module>
----> 1 pa.array([(1, 2), (3, 4)])

~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib.array()

~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib._sequence_to_array()

~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Could not convert (1, 2) with type tuple: did not recognize Python value type when inferring an Arrow data type
{code}
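In the meantime, one possible workaround is to normalize the data before calling {{pa.array}}. The sketch below uses two hypothetical helpers (they are not part of pyarrow): {{tuples_to_lists}} recursively turns tuples into lists so that pyarrow's existing inference for nested Python lists applies, and {{tuples_to_dicts}} pairs each tuple positionally with field names, for the StructArray interpretation.

```python
def tuples_to_lists(value):
    """Hypothetical helper: recursively convert tuples (and lists) to lists.

    Nested Python lists already go through pyarrow's list type inference,
    so this sidesteps the missing tuple support.
    """
    if isinstance(value, (list, tuple)):
        return [tuples_to_lists(v) for v in value]
    return value


def tuples_to_dicts(rows, names):
    """Hypothetical helper: map each tuple to a dict keyed by field name.

    pyarrow infers a struct type from a sequence of dicts, so this gives
    the StructArray interpretation without spelling out the type.
    """
    out = []
    for row in rows:
        if len(row) != len(names):
            raise ValueError(f"expected {len(names)} fields, got {len(row)}")
        out.append(dict(zip(names, row)))
    return out
```

Passing {{tuples_to_lists([(1, 2), (3, 4, 5)])}} to {{pa.array}} takes the same path as a list of Python lists and is expected to give a ListArray like Out[6]; {{pa.array(tuples_to_dicts([(1, 2), (3, 4)], ["a", "b"]))}} should give a StructArray with int64 fields {{a}} and {{b}}, comparable to Out[7].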

Should we also do automatic type inference for tuples (defaulting to the
ListArray case, just as for arrays of Python lists)?
Or was there a specific reason not to support this by default?
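To make the proposed default concrete, here is a toy, pure-Python sketch of the inference rule in which a tuple is treated exactly like a list. It is a hypothetical illustration, not pyarrow's actual (C++) inference code: it only handles None, bool, int, float, str, list and tuple, and the returned type-name strings only loosely mimic how pyarrow prints types.

```python
def infer_type(values):
    """Toy type inference over a sequence of Python values (hypothetical).

    Tuples are inferred the same way as lists, i.e. as a variable-length
    list type, which is the default proposed in this issue.
    """
    kinds = set()
    for v in values:
        if v is None:
            continue  # nulls do not constrain the inferred type
        if isinstance(v, bool):  # check bool first: bool subclasses int
            kinds.add("bool")
        elif isinstance(v, int):
            kinds.add("int64")
        elif isinstance(v, float):
            kinds.add("double")
        elif isinstance(v, str):
            kinds.add("string")
        elif isinstance(v, (list, tuple)):  # proposed: tuple handled like list
            kinds.add("list")
        else:
            raise TypeError(f"cannot infer a type for {type(v).__name__}")

    if not kinds:
        return "null"
    if kinds == {"int64", "double"}:
        return "double"  # mixed ints and floats widen to double
    if len(kinds) > 1:
        raise TypeError(f"cannot unify types {sorted(kinds)}")
    (kind,) = kinds
    if kind == "list":
        # Infer the child type from the elements of all non-null sublists.
        children = [x for v in values if v is not None for x in v]
        return f"list<{infer_type(children)}>"
    return kind
```

With this rule, {{infer_type([(1, 2), (3, 4, 5)])}} returns {{"list<int64>"}}, the same result as for {{[[1, 2], [3, 4, 5]]}}; the StructArray interpretation would still require an explicit type.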



> [Python] automatic type inference for arrays of tuples
> ------------------------------------------------------
>
>                 Key: ARROW-5287
>                 URL: https://issues.apache.org/jira/browse/ARROW-5287
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
