Hi everyone,

I am using Spark 2.1.1 to read CSV files and convert them to Avro files. One problem I am facing: if a row of the CSV file has more columns than maxColumns (default is 20480), the parsing process stops with an error like this:
    com.univocity.parsers.common.TextParsingException: java.lang.ArrayIndexOutOfBoundsException - 2
    Internal state when error was thrown: line=1, column=3, record=0, charIndex=12
    Hint: Number of columns processed may have exceeded limit of 2 columns.
    Use settings.setMaxColumns(int) to define the maximum number of columns your input can have.
    Ensure your configuration is correct, with delimiters, quotes and escape sequences that match the input format you are trying to parse.
    Parser Configuration: CsvParserSettings: ...

I did some investigation in the univocity library (https://github.com/uniVocity/univocity-parsers), but the way it handles this case is to throw an error, which is why Spark stops the process.

How can I skip the invalid row and just continue parsing the next valid one? Is there any library that can replace univocity for this job?

Thanks & regards,
Chanh
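For context, here is roughly what my job does (paths are placeholders, and the option values are just illustrative). I know spark.read supports "maxColumns" and a "mode" option such as DROPMALFORMED, but raising maxColumns only moves the limit rather than skipping the bad row, and DROPMALFORMED does not seem to help here, presumably because the parser throws before Spark's malformed-row handling ever sees the record:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("csv-to-avro")
  .getOrCreate()

// "maxColumns" is forwarded to univocity's settings.setMaxColumns(int);
// raising it only postpones the failure for even wider rows.
val df = spark.read
  .option("header", "true")
  .option("mode", "DROPMALFORMED") // skip rows Spark recognizes as malformed
  .option("maxColumns", "20480")   // default value shown here
  .csv("/path/to/input")           // placeholder path

// Requires the spark-avro package (com.databricks:spark-avro for Spark 2.1)
df.write.format("com.databricks.spark.avro").save("/path/to/output")
```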