At the moment I think the nullable flag is mostly metadata, but there is a PR 
that validates that non-nullable fields do not in fact contain nulls. [1]

There are places in compute kernels that optimize based on the presence/absence 
of nulls but they do so mostly by looking at the physical data and not the type 
(so the optimization will still apply if there just happen to not be nulls).

[1]: https://github.com/apache/arrow/pull/12706

On Mon, Jul 11, 2022, at 17:20, Arthur Andres wrote:
> Hi all,
> 
> Is the behaviour of pa.Field.nullable documented somewhere? 
> 
> I had some expectations of what it does. For example, I expected it to ensure 
> that you can't have null/missing values in a column that is declared with 
> nullable=False. But that doesn't seem to be the case.
> 
> ```
> import pyarrow as pa
> 
> schema = pa.schema(
>     [
>         pa.field("nullable_true", pa.string(), nullable=True),
>         pa.field("nullable_false", pa.string(), nullable=False),
>     ]
> )
> 
> table = pa.Table.from_arrays(
>     [
>         pa.array(["", "foo", None], pa.string()),
>         pa.array(["", "foo", None], pa.string()),
>     ],
>     schema=schema,
> )
> 
> assert table.schema == schema
> assert table['nullable_true'].null_count == 1
> assert table['nullable_false'].null_count == 1
> assert table.validate() is None
> assert table.validate(full=True) is None
> ```
> 
> The only place where I've seen the nullable flag being used is when casting a 
> nested column from nullable to non-nullable:
> 
> ```
> import pyarrow as pa
> 
> struct_array = pa.StructArray.from_arrays(
>     [
>         pa.array(["", "foo", None], pa.string()),
>     ],
>     names=["nested_col_level_1"],
> )
> nested_table = pa.Table.from_arrays(
>     [struct_array], names=["nested_col_level_0"]
> )
> assert nested_table.validate(full=True) is None
> assert nested_table.validate() is None
> 
> nested_table.cast(
>     pa.schema(
>         [
>             pa.field(
>                 "nested_col_level_0",
>                 pa.struct(
>                     [
>                         pa.field(
>                             "nested_col_level_1", pa.string(), nullable=False
>                         )
>                     ]
>                 ),
>             )
>         ]
>     )
> )
> ```
> 
> Thanks for your help!
> 
> 
> 
