I believe the Arrow CSV parser expects every row to have the same number
of columns, so the last line would need to be:
"2015,Chevy,Volt,,"
(i.e. have commas for the missing data).
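
If editing the file by hand is not practical, one option is to pad the short rows before handing the data to Arrow. Below is a minimal sketch using only the Python standard library. The helper name `pad_short_rows` is hypothetical (it is not part of pyarrow), and the sample text simply mirrors the file shown in the quoted message.

```python
import csv
import io


def pad_short_rows(text, n_columns):
    """Return CSV text in which every row has at least n_columns fields.

    Hypothetical helper (not part of pyarrow): short rows are padded with
    empty fields; rows that already have enough fields are left unchanged.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip completely blank lines
        # A negative difference yields an empty list, so long rows are untouched.
        writer.writerow(row + [""] * (n_columns - len(row)))
    return out.getvalue()


# Sample data mirroring the file from the quoted message.
sample = (
    "year,make,model,comment,blank\n"
    '"2012","Tesla","S","No comment",\n'
    '1997,Ford,E350,"Go get one now they are going fast",\n'
    "2015,Chevy,Volt\n"
)
padded = pad_short_rows(sample, 5)
print(padded.splitlines()[-1])  # 2015,Chevy,Volt,,
```

The padded text could then be encoded and passed to pyarrow's `csv.read_csv` through `io.BytesIO(padded.encode())`, since `read_csv` also accepts file-like objects. I have not tried this against the actual file.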

On Thu, Mar 17, 2022 at 3:23 PM Sricheta Ruj <[email protected]>
wrote:

> Hello.
>
>
>
> I am using pyarrow csv module.
>
>
>
> from pyarrow import csv
>
> fn = '/home/srruj/cars.csv'
>
> read_options = csv.ReadOptions(column_names=('year', 'make', 'model',
> 'comment', 'blank'))
>
> convert_options = csv.ConvertOptions(include_columns=('year',
> 'make', 'model', 'comment', 'blank'),
>
>                                      include_missing_columns=True,
>
>                                      strings_can_be_null=True)
>
>
>
> table = csv.read_csv(fn, read_options=read_options,
> convert_options=convert_options)
>
> table
>
>
>
> I am getting the following error :
>
> Csv parse error: Expected 5 columns, got 3
>
>
>
> This is how file looks:
>
>
>
> year,make,model,comment,blank
>
> "2012","Tesla","S","No comment",
>
> 1997,Ford,E350,"Go get one now they are going fast",
>
> 2015,Chevy,Volt
>
>
>
> I am able to read this file from spark using spark.read.csv(..) but not
> using pyarrow.
>
>
>
> Can you please help?
>
>
>
> Thanks
>
> Sricheta.
>
>
>
>
>
