Hi everyone,

I am using Spark 2.1.1 to read CSV files and convert them to Avro files. One problem I am facing: if a row of the CSV file has more columns than maxColumns (default is 20480), the parsing process stops with an error like this:
    com.univocity.parsers.common.TextParsingException: java.lang.ArrayIndexOutOfBoundsException - 2
    Internal state when error was thrown: line=1, column=3, record=0, charIndex=12
    Hint: Number of columns processed may have exceeded limit of 2 columns.
    Use settings.setMaxColumns(int) to define the maximum number of columns your input can have.
    Ensure your configuration is correct, with delimiters, quotes and escape sequences that match the input format you are trying to parse.
    Parser Configuration: CsvParserSettings: ...

I did some investigation in the univocity library (https://github.com/uniVocity/univocity-parsers), but the way it handles this case is to throw an error, which is why Spark stops the process.

How can I skip the invalid row and just continue parsing the next valid one? Is there any library that can replace univocity for this job?

Thanks & regards,
Chanh
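For context, here is roughly what my job does (paths are placeholders, and the option values are just illustrative). I know spark.read supports "maxColumns" and a "mode" option such as DROPMALFORMED, but raising maxColumns only moves the limit rather than skipping the bad row, and DROPMALFORMED does not seem to help here, presumably because the parser throws before Spark's malformed-row handling ever sees the record:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("csv-to-avro")
  .getOrCreate()

// "maxColumns" is forwarded to univocity's settings.setMaxColumns(int);
// raising it only postpones the failure for even wider rows.
val df = spark.read
  .option("header", "true")
  .option("mode", "DROPMALFORMED") // skip rows Spark recognizes as malformed
  .option("maxColumns", "20480")   // default value shown here
  .csv("/path/to/input")           // placeholder path

// Requires the spark-avro package (com.databricks:spark-avro for Spark 2.1)
df.write.format("com.databricks.spark.avro").save("/path/to/output")
```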