Hi David,

Thanks for your reply. I'll keep an eye on that PR.

On Wed, 13 Jul 2022 at 17:43, David Li <[email protected]> wrote:

> At the moment I think it's mostly metadata, but there is a PR that
> validates non-nullable fields indeed do not contain nulls. [1]
>
> There are places in compute kernels that optimize based on the
> presence/absence of nulls but they do so mostly by looking at the physical
> data and not the type (so the optimization will still apply if there just
> happen to not be nulls).
>
> [1]: https://github.com/apache/arrow/pull/12706
>
> On Mon, Jul 11, 2022, at 17:20, Arthur Andres wrote:
>
> Hi all,
>
> Is the behaviour of pa.Field.nullable documented somewhere?
>
> I had some expectations of what it does. For example, I expected it to
> ensure that a column declared with nullable=False cannot contain
> null/missing values. But that doesn't seem to be the case:
>
> ```
> import pyarrow as pa
>
> schema = pa.schema(
>     [
>         pa.field("nullable_true", pa.string(), nullable=True),
>         pa.field("nullable_false", pa.string(), nullable=False),
>     ]
> )
>
> table = pa.Table.from_arrays(
>     [
>         pa.array(["", "foo", None], pa.string()),
>         pa.array(["", "foo", None], pa.string()),
>     ],
>     schema=schema,
> )
>
> assert table.schema == schema
> assert table['nullable_true'].null_count == 1
> assert table['nullable_false'].null_count == 1
> assert table.validate() is None
> assert table.validate(full=True) is None
> ```
>
> The only place where I've seen the nullable flag being used is when
> casting a nested column from nullable to non-nullable:
>
> ```
> import pyarrow as pa
>
> struct_array = pa.StructArray.from_arrays(
>     [
>         pa.array(["", "foo", None], pa.string()),
>     ],
>     names=["nested_col_level_1"],
> )
> nested_table = pa.Table.from_arrays([struct_array], names=["nested_col_level_0"])
> assert nested_table.validate(full=True) is None
> assert nested_table.validate() is None
>
> nested_table.cast(
>     pa.schema(
>         [
>             pa.field(
>                 "nested_col_level_0",
>                 pa.struct(
>                     [pa.field("nested_col_level_1", pa.string(), nullable=False)]
>                 ),
>             )
>         ]
>     )
> )
> ```
>
> Thanks for your help!
