If you have found a parser that works, simply read the data as text files, apply the parser manually, and convert to DataFrame (if needed at all).

________________________________
From: Saurabh Gulati <[email protected]>
Sent: Wednesday, January 4, 2023 3:45 PM
To: Sean Owen <[email protected]>
Cc: Mich Talebzadeh <[email protected]>; User <[email protected]>
Subject: [EXTERNAL] Re: Re: Incorrect csv parsing when delimiter used within the data

Hi @Sean Owen
Probably the data is incorrect, and the source needs to fix it.
But using python's csv parser returns the correct results.

    import csv

    with open("/tmp/test.csv") as c_file:
        csv_reader = csv.reader(c_file, delimiter=",")
        for row in csv_reader:
            print(row)

    ['a', 'b', 'c']
    ['1', '', ',see what "I did",\ni am still writing']
    ['2', '', 'abc']

And also, I don't understand why there is a distinction in outputs from df.show() and df.select("c").show()

Mvg/Regards
Saurabh Gulati
Data Platform
________________________________
From: Sean Owen <[email protected]>
Sent: 04 January 2023 14:25
To: Saurabh Gulati <[email protected]>
Cc: Mich Talebzadeh <[email protected]>; User <[email protected]>
Subject: Re: [EXTERNAL] Re: Incorrect csv parsing when delimiter used within the data

That input is just invalid as CSV for any parser. You end a quoted col without following with a col separator. What would the intended parsing be and how would it work?

On Wed, Jan 4, 2023 at 4:30 AM Saurabh Gulati <[email protected]> wrote:
@Sean Owen
Also see the example below with quotes feedback:

"a","b","c"
"1","",",see what ""I did"","
"2","","abc"
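
For reference, the suggestion at the top of this thread (read the data as text, apply the working parser manually, then convert to a DataFrame) could be sketched roughly as below. This is only a sketch under assumptions: SAMPLE is a made-up, well-formed variant of the data discussed here, and the names parse_csv_text and read_csv_with_python_parser are hypothetical helpers, not part of Spark. The Spark part relies on SparkContext.wholeTextFiles and SparkSession.createDataFrame, loads each file whole into memory, and has not been run against a cluster.

```python
import csv
import io

# Assumed sample: a well-formed variant of the thread's data, with an embedded
# delimiter, escaped quotes, and a line break inside one quoted field.
SAMPLE = (
    '"a","b","c"\n'
    '"1","",",see what ""I did"",\ni am still writing"\n'
    '"2","","abc"\n'
)


def parse_csv_text(text):
    """Parse the full text of one file with Python's csv module.

    Returns a list of rows, each row a list of strings.
    """
    # newline="" leaves line breaks inside quoted fields untouched.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=","))


def read_csv_with_python_parser(spark, path):
    """Hypothetical helper: build a DataFrame using the parser above.

    `spark` is an existing SparkSession and `path` points at the CSV file(s).
    All columns come back as strings.
    """
    # wholeTextFiles yields (file_path, file_content) pairs, one per file,
    # so a quoted field spanning several lines stays within one record.
    files = spark.sparkContext.wholeTextFiles(path)
    # Take the column names from the first file's header row.
    header = parse_csv_text(files.first()[1])[0]
    # Drop each file's header row and emit the remaining rows as tuples.
    rows = files.flatMap(
        lambda kv: [tuple(r) for r in parse_csv_text(kv[1])[1:]]
    )
    return spark.createDataFrame(rows, schema=header)
```

Calling parse_csv_text(SAMPLE) needs no Spark at all, which makes it easy to check the parser's behaviour on a small file before wiring it into a job.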
