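The error comes from the last row of the file: the Arrow CSV parser expects every row to have as many fields as the header (5 here), but "2015,Chevy,Volt" has only 3. I believe the parser expects the last line to be "2015,Chevy,Volt,," (i.e. with trailing commas for the missing values). Note that include_missing_columns=True does not help here: it only applies to columns listed in include_columns that are absent from the file altogether, not to individual rows that are short.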
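If you can't change how the file is produced, one workaround is to pad the short rows yourself before handing the data to pyarrow. Below is a minimal sketch using only the standard library; the helper name pad_csv_rows is hypothetical (not part of pyarrow), and it assumes the file is small enough to hold in memory.

```python
import csv
import io


def pad_csv_rows(text: str, n_columns: int) -> str:
    """Hypothetical helper: return CSV text where short rows are padded
    with empty fields (the same as adding trailing commas).

    Rows that are too long are left unchanged so the parse error still
    surfaces downstream. Blank lines are dropped, matching pyarrow's
    default of ignoring empty lines.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        if not row:  # blank line: skip it rather than emit an all-empty row
            continue
        if len(row) < n_columns:
            row = row + [""] * (n_columns - len(row))
        writer.writerow(row)
    return out.getvalue()
```

The padded text can then be passed to pyarrow as a file-like object, e.g. csv.read_csv(io.BytesIO(padded.encode("utf-8")), ...), instead of the original path.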
On Thu, Mar 17, 2022 at 3:23 PM Sricheta Ruj <[email protected]> wrote:
> Hello.
>
> I am using pyarrow csv module.
>
> from pyarrow import csv
> fn = '/home/srruj/cars.csv'
> read_options=csv.ReadOptions(column_names=('year', 'make', 'model', 'comment', 'blank'))
> convert_options = csv.ConvertOptions(include_columns=column_names=('year', 'make', 'model', 'comment', 'blank'),
>     include_missing_columns=True,
>     strings_can_be_null=True)
>
> table = csv.read_csv(fn, read_options=read_options, convert_options=convert_options)
> table
>
> I am getting the following error:
>
> Csv parse error: Expected 5 columns, got 3
>
> This is how the file looks:
>
> year,make,model,comment,blank
> "2012","Tesla","S","No comment",
> 1997,Ford,E350,"Go get one now they are going fast",
> 2015,Chevy,Volt
>
> I am able to read this file from spark using spark.read.csv(..) but not
> using pyarrow.
>
> Can you please help?
>
> Thanks
> Sricheta.
