Re: Thank you

2020-08-27 Thread Lucas Pickup
This is an awesome sentiment, thank you Release orchestrators and contributors! Cheers, Lucas On Thu, Aug 27, 2020 at 1:26 PM Jorge Cardoso Leitão < jorgecarlei...@gmail.com> wrote: > Hi, > > > > I am writing to just thank all those involved in the release process. > > Sometimes the work of

Table.cast throws ArrowNotImplementedError (pyarrow==0.15.0)

2019-10-08 Thread Lucas Pickup
So it seems that in `pyarrow==0.15.0`, `Table.columns` now returns ChunkedArray instead of Column. This has broken `Table.cast()`, as it just calls `Table.itercolumns` and expects the yielded values to have a `.cast()` method, which ChunkedArray doesn't. Was `Table.cast()` missed in cleaning up after
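A minimal sketch of the kind of call reported to break on 0.15.0 (the single-column table and target schema are illustrative; the exact traceback may differ):

```python
import pyarrow as pa

# Sketch of the reported breakage: Table.cast() on pyarrow 0.15.0 iterated the
# table's columns and assumed each yielded value exposed a .cast() method.
table = pa.Table.from_arrays([pa.array([1, 2, 3])], names=["x"])
target = pa.schema([pa.field("x", pa.float64())])
casted = table.cast(target)  # reported to fail on 0.15.0
print(casted.schema)
```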

[jira] [Created] (ARROW-1436) PyArrow Timestamps written to Parquet as INT96 appear in Spark as 'bigint'

2017-08-30 Thread Lucas Pickup (JIRA)
Lucas Pickup created ARROW-1436: --- Summary: PyArrow Timestamps written to Parquet as INT96 appear in Spark as 'bigint' Key: ARROW-1436 URL: https://issues.apache.org/jira/browse/ARROW-1436 Project
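The ticket concerns Parquet's legacy INT96 timestamp encoding. A hedged sketch of the pyarrow write side (uses the `use_deprecated_int96_timestamps` option of `pyarrow.parquet.write_table`; the file name and data are illustrative):

```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Write a timestamp column using the legacy INT96 physical type; the ticket
# reports that such columns showed up in Spark as 'bigint' rather than timestamps.
df = pd.DataFrame({"ts": pd.to_datetime(["2017-08-30 12:00:00"])})
pq.write_table(pa.Table.from_pandas(df), "int96_timestamps.parquet",
               use_deprecated_int96_timestamps=True)
```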

[jira] [Created] (ARROW-1435) PyArrow not propagating timezone information from Parquet to Python

2017-08-30 Thread Lucas Pickup (JIRA)
Lucas Pickup created ARROW-1435: --- Summary: PyArrow not propagating timezone information from Parquet to Python Key: ARROW-1435 URL: https://issues.apache.org/jira/browse/ARROW-1435 Project: Apache Arrow

RE: PyArrow not retaining Parquet metadata

2017-08-30 Thread Lucas Pickup
Please reply to: lucas.pic...@microsoft.com Outlook isn't playing nice. Apologies, Lucas Pickup -Original Message- From: Lucas Pickup [mailto:lucas.pic...@microsoft.com.INVALID] Sent: Wednesday, August 30, 2017 10:47 AM To: dev@arrow.apache.org Subject: PyArrow not retaining Parquet

PyArrow not retaining Parquet metadata

2017-08-30 Thread Lucas Pickup
_version": "0.20.3", "columns": [{"name": "DateNaive", "pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}, {"name": "DateAware", "pandas_type": "datetimetz", "numpy_type": "datetime64[ns, UTC]", "metadata": {"timezone": "UTC"}}], "index_columns": ["__index_level_0__"]} >>> >>> pyarrowDF = pyarrowTable.to_pandas() >>> pyarrowDF DateNaive DateAware 0 2015-07-05 23:50:00 2015-07-05 23:50:00 >>> This was on PyArrow 0.6.0. Cheers, Lucas Pickup

Re: Reading Parquet datetime column gives different answer in Spark vs PyArrow

2017-08-28 Thread Lucas Pickup
Here is the pyspark script I used to see this difference. On Mon, 28 Aug 2017 at 09:20 Lucas Pickup <lucas.tot0.pic...@gmail.com> wrote: > Hi all, > > Very sorry if people already responded to this at: > lucas.pic...@microsoft.com There was an INVALID identifier att

Reading Parquet datetime column gives different answer in Spark vs PyArrow

2017-08-28 Thread Lucas Pickup
, newArray) table = table.remove_column(i) table = table.add_column(i, newColumn) return table Cheers, Lucas Pickup

RE: Reading Parquet datetime column gives different answer in Spark vs PyArrow

2017-08-25 Thread Lucas Pickup
('ns', tz='GMT')) newColumn = pa.Column.from_array(newField, newArray) table = table.remove_column(i) table = table.add_column(i, newColumn) return table Cheers, Lucas Pickup From: Lucas Pickup [mailto:lucas.pic...@microsoft.com.INVALID] Sent: Friday, August 25
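The snippet quoted above uses the old pa.Column API, which was removed in later pyarrow releases. A rough equivalent on the current API, rewriting every timestamp column as timestamp('ns', tz='GMT') with the same remove/add pattern (the function name is illustrative):

```python
import pyarrow as pa

def force_gmt_timestamps(table: pa.Table) -> pa.Table:
    # Replace each timestamp column with one cast to timestamp('ns', tz='GMT'),
    # mirroring the remove_column/add_column pattern in the quoted snippet.
    for i, field in enumerate(table.schema):
        if pa.types.is_timestamp(field.type):
            new_field = pa.field(field.name, pa.timestamp("ns", tz="GMT"))
            new_chunked = table.column(i).cast(new_field.type)
            table = table.remove_column(i)
            table = table.add_column(i, new_field, new_chunked)
    return table
```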

Reading Parquet datetime column gives different answer in Spark vs PyArrow

2017-08-25 Thread Lucas Pickup
2015-07-06 06:50:00 1 2015-07-06 06:30:00 I would've expected to end up with the same datetime from both readers since there was no timezone attached at any point. It's just a date and time value. Am I missing anything here? Or is this a bug? Cheers, Lucas Pickup

Major difference between Spark and Arrow Parquet Implementations

2017-08-16 Thread Lucas Pickup
ps://github.com/apache/spark/blob/cba826d00173a945b0c9a7629c66e36fa73b723e/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L565>. I was wondering if there was a reason why the implementations have such a major difference when it comes to schema generation? Cheers, Lucas Pickup
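For reference, a minimal way to dump the Parquet schema that pyarrow generates for a file, so it can be compared against what Spark's ParquetSchemaConverter produces for the same data (the file name is illustrative):

```python
import pyarrow.parquet as pq

# Print the low-level Parquet schema of a pyarrow-written file for
# side-by-side comparison with a Spark-written file.
print(pq.ParquetFile("example.parquet").schema)
```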