Hi Takeshi, Jörn Franke,

The problem is that even if I increase maxColumns, some lines still have more columns than the limit I set, and a high limit costs a lot of memory. So I just want to skip any line that has more columns than the maxColumns I set.
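For reference, here is a minimal sketch of how I read and convert the files (the paths and the column limit are placeholders, and the Avro sink assumes the external com.databricks:spark-avro package, since Spark 2.1 has no built-in Avro writer):

import org.apache.spark.sql.SparkSession

// Minimal sketch; paths and limits below are placeholders.
val spark = SparkSession.builder()
  .appName("csv-to-avro")
  .getOrCreate()

val df = spark.read
  .option("header", "true")
  .option("maxColumns", "20480")   // univocity's per-record column limit
  .option("mode", "DROPMALFORMED") // drop rows that fail schema/token checks
  .csv("/path/to/input/*.csv")

// Avro sink from the external spark-avro package (Spark 2.1 era)
df.write
  .format("com.databricks.spark.avro")
  .save("/path/to/output")

As described below, the exception is thrown by univocity itself while tokenizing the record, so the mode option never gets a chance to drop the row once it exceeds maxColumns.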
Regards,
Chanh

On Thu, Jun 8, 2017 at 12:48 AM Takeshi Yamamuro <linguin....@gmail.com> wrote:
> Is it not enough to set `maxColumns` in CSV options?
>
> https://github.com/apache/spark/blob/branch-2.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala#L116
>
> // maropu
>
> On Wed, Jun 7, 2017 at 9:45 AM, Jörn Franke <jornfra...@gmail.com> wrote:
>> The Spark CSV data source should be able to do this.
>>
>> On 7. Jun 2017, at 17:50, Chanh Le <giaosu...@gmail.com> wrote:
>>
>> Hi everyone,
>> I am using Spark 2.1.1 to read CSV files and convert them to Avro files.
>> One problem I am facing is that if one row of a CSV file has more columns
>> than maxColumns (default is 20480), the parsing process stops:
>>
>> Internal state when error was thrown: line=1, column=3, record=0, charIndex=12
>> com.univocity.parsers.common.TextParsingException: java.lang.ArrayIndexOutOfBoundsException - 2
>> Hint: Number of columns processed may have exceeded limit of 2 columns.
>> Use settings.setMaxColumns(int) to define the maximum number of columns your input can have
>> Ensure your configuration is correct, with delimiters, quotes and escape sequences that match the input format you are trying to parse
>> Parser Configuration: CsvParserSettings:
>>
>> I did some investigation in the univocity library
>> (https://github.com/uniVocity/univocity-parsers), but the way it handles
>> this case is to throw an error, which is why Spark stops the process.
>>
>> How can I skip the invalid row and just continue parsing the next valid one?
>> Are there any libs that can replace univocity for this job?
>>
>> Thanks & regards,
>> Chanh
>>
>> --
>> Regards,
>> Chanh
>
>
> --
> ---
> Takeshi Yamamuro

--
Regards,
Chanh