Hi Takeshi, Jörn Franke,

The problem is that even if I increase maxColumns, some lines still have
more columns than the limit I set, and parsing them costs a lot of memory.
So I just want to skip any line that has more columns than the maxColumns
I set.
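For reference, this is the kind of workaround I have in mind: pre-filter
the raw text so oversized rows never reach the univocity parser, then
parse the surviving lines as CSV. Just an untested sketch for spark-shell;
the paths are placeholders, and the naive comma split is an assumption
that would miscount quoted delimiters:

  // Drop rows with too many columns before CSV parsing.
  val maxCols = 20480
  val raw = spark.read.textFile("/data/input.csv")           // Dataset[String]
  val sane = raw.filter(_.split(",", -1).length <= maxCols)  // keep rows within the limit
  sane.write.text("/tmp/filtered")                           // write the valid lines back out
  val df = spark.read.option("header", "true").csv("/tmp/filtered")

Splitting on the delimiter is cheap enough here, but it obviously breaks
if fields contain quoted commas.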

Regards,
Chanh


On Thu, Jun 8, 2017 at 12:48 AM Takeshi Yamamuro <linguin....@gmail.com>
wrote:

> Is it not enough to set `maxColumns` in CSV options?
>
>
> https://github.com/apache/spark/blob/branch-2.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala#L116
>
> // maropu
>
> On Wed, Jun 7, 2017 at 9:45 AM, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> Spark CSV data source should be able to handle this.
>>
>> On 7. Jun 2017, at 17:50, Chanh Le <giaosu...@gmail.com> wrote:
>>
>> Hi everyone,
>> I am using Spark 2.1.1 to read CSV files and convert them to Avro files.
>> One problem I am facing: if one row of a CSV file has more columns than
>> maxColumns (default is 20480), the parsing process stops.
>>
>> Internal state when error was thrown: line=1, column=3, record=0,
>> charIndex=12
>> com.univocity.parsers.common.TextParsingException:
>> java.lang.ArrayIndexOutOfBoundsException - 2
>> Hint: Number of columns processed may have exceeded limit of 2 columns.
>> Use settings.setMaxColumns(int) to define the maximum number of columns
>> your input can have
>> Ensure your configuration is correct, with delimiters, quotes and escape
>> sequences that match the input format you are trying to parse
>> Parser Configuration: CsvParserSettings:
>>
>>
>> I did some investigation into the univocity
>> <https://github.com/uniVocity/univocity-parsers> library, but the way it
>> handles this is to throw an error, which is why Spark stops the process.
>>
>> How can I skip the invalid rows and just continue parsing the next
>> valid ones? Are there any libraries that could replace univocity for
>> this job?
>>
>> Thanks & regards,
>> Chanh
>> --
>> Regards,
>> Chanh
>>
>>
>
>
> --
> ---
> Takeshi Yamamuro
>
-- 
Regards,
Chanh
