[ 
https://issues.apache.org/jira/browse/ARROW-7350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Li resolved ARROW-7350.
-----------------------------
    Resolution: Fixed

Issue resolved by pull request 12902
[https://github.com/apache/arrow/pull/12902]

> [Python] Parquet file metadata min and max statistics not decoded from bytes 
> for Decimal data types
> ---------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-7350
>                 URL: https://issues.apache.org/jira/browse/ARROW-7350
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.15.1
>            Reporter: Max Firman
>            Assignee: Will Jones
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 8.0.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Parquet file metadata for Decimal type columns contains min and max 
> statistics that are returned as raw bytes instead of being decoded into 
> Decimals. This causes issues in dependent libraries like Dask (see 
> [https://github.com/dask/dask/issues/5647]).
>  
> {code:python|title=Reproducible example|borderStyle=solid}
> from decimal import Decimal
> import random
> import pandas as pd
> import pyarrow.parquet as pq
> import pyarrow as pa
> NUM_DATA_POINTS_PER_PARTITION = 25
> random.seed(0)
> data1 = [
>     {"col1": Decimal(f"{random.randint(0, 999)}.{random.randint(0, 99)}")}
>     for i in range(NUM_DATA_POINTS_PER_PARTITION)
> ]
> df = pd.DataFrame(data1)
> table = pa.Table.from_pandas(df)
> pq.write_table(table, 'my_data.parquet')
> parquet_file = pq.ParquetFile('my_data.parquet')
> statistics = parquet_file.metadata.row_group(0).column(0).statistics
> # AssertionError on the next line because min has type bytes rather than Decimal
> assert isinstance(statistics.min, Decimal)
> assert isinstance(statistics.max, Decimal)
> {code}

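For readers on affected versions, the raw statistics bytes can in principle be decoded by hand. The following is a minimal sketch, not the change made in pull request 12902. It assumes the bytes hold the unscaled value as a big-endian two's-complement integer, which is how the Parquet format specification describes DECIMAL values stored in byte arrays. The helper name decode_parquet_decimal and its scale parameter are hypothetical; the scale would have to come from the column's decimal type.

```python
from decimal import Decimal


def decode_parquet_decimal(raw: bytes, scale: int) -> Decimal:
    """Hypothetical helper: turn raw decimal statistics bytes into a Decimal.

    Assumes `raw` is the unscaled value as a big-endian two's-complement
    integer and `scale` is the number of digits after the decimal point.
    """
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(ch) for ch in str(abs(unscaled)))
    # Building from a (sign, digits, exponent) tuple is exact and does not
    # depend on the precision of the current decimal context.
    return Decimal((sign, digits, -scale))


# b"\x00\x00\x30\x39" is the integer 12345; with scale 2 that is 123.45.
print(decode_parquet_decimal(b"\x00\x00\x30\x39", 2))  # 123.45
```

The tuple constructor is used instead of division or scaleb() because those are subject to context rounding, which could silently alter values with more digits than the default context precision.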


--
This message was sent by Atlassian Jira
(v8.20.1#820001)
