RE: Major difference between Spark and Arrow Parquet Implementations

2017-08-16 Thread Erin Sobkow
[...]7 11:30 AM
To: dev@arrow.apache.org
Subject: Re: Major difference between Spark and Arrow Parquet Implementations

hi Erin -- please send a separate e-mail to dev-unsubscr...@arrow.apache.org

Thanks

On Wed, Aug 16, 2017 at 1:06 PM, Erin Sobkow wrote:
> Hi Wes:
>
> Somehow I have been inadv[...]

Re: Major difference between Spark and Arrow Parquet Implementations

2017-08-16 Thread Wes McKinney
[...]com]
> Sent: August 16, 2017 10:04 AM
> To: dev@arrow.apache.org
> Subject: Re: Major difference between Spark and Arrow Parquet Implementations
>
> hi Lucas,
>
> My understanding is that the Parquet format by itself does not place any such
> restrictions on the names of fields,[...]

RE: Major difference between Spark and Arrow Parquet Implementations

2017-08-16 Thread Erin Sobkow
[...]McKinney [mailto:wesmck...@gmail.com]
Sent: August 16, 2017 10:04 AM
To: dev@arrow.apache.org
Subject: Re: Major difference between Spark and Arrow Parquet Implementations

hi Lucas,

My understanding is that the Parquet format by itself does not place any such restrictions on the names of fields,[...]

Re: Major difference between Spark and Arrow Parquet Implementations

2017-08-16 Thread Wes McKinney
hi Lucas,

My understanding is that the Parquet format by itself does not place any such restrictions on the names of fields, and so this is a Spark SQL-specific issue (anyone please correct me if I'm mistaken about this). I would be happy to help add a schema cleaning option to normalize field nam[...]
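Wes offers to help add a schema cleaning option that normalizes field names. The following is a minimal Python sketch of what such an option might do. `normalize_field_names` is a hypothetical helper, not a pyarrow API, and the set of rejected characters is an assumption based on Spark 2.x's Parquet schema converter.

```python
import re

# Assumption: characters that Spark 2.x's Parquet writer rejects in
# field names (space , ; { } ( ) newline tab =).
INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def normalize_field_names(names, replacement="_"):
    """Hypothetical schema-cleaning helper.

    Replaces every rejected character with `replacement`, then appends
    _1, _2, ... to any cleaned name that collides with an earlier one,
    so the resulting schema never contains duplicate field names.
    """
    cleaned = []
    seen = set()
    for name in names:
        base = INVALID_CHARS.sub(replacement, name)
        candidate = base
        suffix = 1
        # Keep bumping the suffix until the name is unique in this schema.
        while candidate in seen:
            candidate = f"{base}_{suffix}"
            suffix += 1
        seen.add(candidate)
        cleaned.append(candidate)
    return cleaned


if __name__ == "__main__":
    print(normalize_field_names(["X Coordinate", "Y Coordinate", "X_Coordinate"]))
    # ['X_Coordinate', 'Y_Coordinate', 'X_Coordinate_1']
```

Replacing characters with underscores can make two distinct names collide (here 'X Coordinate' and 'X_Coordinate'), which is why the sketch adds a numeric suffix instead of silently producing duplicate columns. In current pyarrow releases, the cleaned list could be applied to a table with `Table.rename_columns` before the data is handed to Spark.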

Major difference between Spark and Arrow Parquet Implementations

2017-08-16 Thread Lucas Pickup
Hello,

I have been using pyarrow and PySpark to write Parquet files. I have used pyarrow to successfully write out a Parquet file with spaces in column names, e.g. 'X Coordinate'. When I try to write out the same dataset using Spark's Parquet writer, it fails claiming: "Attribute name "X Coordina[...]
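To see up front which columns would hit this error, a small Python check can scan the names before writing. This is a sketch under stated assumptions: the helper name `spark_incompatible_columns` is hypothetical, and the rejected character set (space , ; { } ( ) newline tab =) is taken from our reading of Spark 2.x's `ParquetSchemaConverter`, not from this thread. pyarrow itself accepted 'X Coordinate' in the test described above.

```python
import re

# Assumption: Spark 2.x's Parquet writer rejects a field name containing
# any of: space , ; { } ( ) newline tab =
SPARK_PARQUET_INVALID = re.compile(r"[ ,;{}()\n\t=]")


def spark_incompatible_columns(columns):
    """Return the column names that would fail Spark's field-name check
    (assuming the character set above), preserving their original order."""
    return [name for name in columns if SPARK_PARQUET_INVALID.search(name)]


if __name__ == "__main__":
    cols = ["X Coordinate", "Y Coordinate", "Id"]
    print(spark_incompatible_columns(cols))  # ['X Coordinate', 'Y Coordinate']
```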