Re: [CSV] If number of columns of one row bigger than maxcolumns it stop the whole parsing process.

2017-06-09 Thread Chanh Le
Hi Takeshi,
Thank you very much.
Regards,
Chanh

On Thu, Jun 8, 2017 at 11:05 PM Takeshi Yamamuro wrote:
> I filed a jira about this issue:
> https://issues.apache.org/jira/browse/SPARK-21024

Re: [CSV] If number of columns of one row bigger than maxcolumns it stop the whole parsing process.

2017-06-08 Thread Takeshi Yamamuro
I filed a jira about this issue: https://issues.apache.org/jira/browse/SPARK-21024

On Thu, Jun 8, 2017 at 1:27 AM, Chanh Le wrote:
> Can you recommend one?
> Thanks.

Re: [CSV] If number of columns of one row bigger than maxcolumns it stop the whole parsing process.

2017-06-08 Thread Chanh Le
Can you recommend one?
Thanks.

On Thu, Jun 8, 2017 at 2:47 PM Jörn Franke wrote:
> You can change the CSV parser library

Re: [CSV] If number of columns of one row bigger than maxcolumns it stop the whole parsing process.

2017-06-08 Thread Jörn Franke
You can change the CSV parser library.

On 8. Jun 2017, at 08:35, Chanh Le wrote:
> I did add mode -> DROPMALFORMED, but it still couldn't ignore the row because the error is raised from the CSV library that Spark is using.

Re: [CSV] If number of columns of one row bigger than maxcolumns it stop the whole parsing process.

2017-06-08 Thread Chanh Le
I did add mode -> DROPMALFORMED, but it still couldn't ignore the row because the error is raised from the CSV library that Spark is using.

On Thu, Jun 8, 2017 at 12:11 PM Jörn Franke wrote:
> The CSV data source allows you to skip invalid lines - this should also include lines that have more than maxColumns.

Re: [CSV] If number of columns of one row bigger than maxcolumns it stop the whole parsing process.

2017-06-07 Thread Jörn Franke
The CSV data source allows you to skip invalid lines - this should also include lines that have more than maxColumns. Choose mode "DROPMALFORMED".

On 8. Jun 2017, at 03:04, Chanh Le wrote:
> Hi Takeshi, Jörn Franke,
> The problem is that even if I increase maxColumns, some lines still have more columns than the limit I set, and a large limit costs a lot of memory.
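The drop-versus-fail semantics described above can be sketched in plain Python. This is a stdlib illustration of the mode behavior, not Spark's implementation; the function name is made up, and only the mode strings mirror Spark's options:

```python
import csv
import io

def parse_csv(text, expected_cols, mode="DROPMALFORMED"):
    """Parse CSV text. Rows whose column count differs from expected_cols
    are silently skipped (DROPMALFORMED) or abort parsing (FAILFAST)."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != expected_cols:
            if mode == "FAILFAST":
                raise ValueError(f"malformed row: {row!r}")
            continue  # DROPMALFORMED: skip the bad row, keep going
        rows.append(row)
    return rows

data = "a,b,c\n1,2,3\n1,2,3,4,5\n"   # third line has too many columns
print(parse_csv(data, expected_cols=3))  # the oversized row is dropped
```

The catch reported in this thread is that Spark's mode check runs on rows the underlying parser hands back; if the parser itself throws before returning the row, DROPMALFORMED never gets a chance to apply.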

Re: [CSV] If number of columns of one row bigger than maxcolumns it stop the whole parsing process.

2017-06-07 Thread Chanh Le
Hi Takeshi, Jörn Franke,
The problem is that even if I increase maxColumns, some lines still have more columns than the limit I set, and a large limit costs a lot of memory. So I just want to skip any line that has more columns than the maxColumns I set.
Regards,
Chanh

Re: [CSV] If number of columns of one row bigger than maxcolumns it stop the whole parsing process.

2017-06-07 Thread Takeshi Yamamuro
Is it not enough to set `maxColumns` in CSV options?
https://github.com/apache/spark/blob/branch-2.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala#L116
// maropu
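For illustration, the hard-cap behavior the thread runs into - a parser that aborts the whole run as soon as one row exceeds the cap, rather than skipping that row - can be sketched in stdlib Python. This mimics the failure mode reported above; the function name and the naive comma split are assumptions, not Spark's or univocity's API:

```python
def parse_with_max_columns(lines, max_columns):
    """Naive split-based parser with a hard column cap: it raises as soon
    as one row exceeds the cap, aborting the entire run. Raising the cap
    only postpones the failure and grows the per-row buffer."""
    parsed = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(",")
        if len(fields) > max_columns:
            raise RuntimeError(f"line {lineno}: {len(fields)} columns "
                               f"exceeds maxColumns={max_columns}")
        parsed.append(fields)
    return parsed
```

This is why raising `maxColumns` alone does not solve Chanh's problem: there is always some row wider than the new limit, and the exception kills the whole job instead of dropping that row.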

Re: [CSV] If number of columns of one row bigger than maxcolumns it stop the whole parsing process.

2017-06-07 Thread Jörn Franke
The Spark CSV data source should be able to handle this.

On 7. Jun 2017, at 17:50, Chanh Le wrote:
> Hi everyone,
> I am using Spark 2.1.1 to read CSV files and convert them to Avro files. One problem I am facing: if one row of a CSV file has more columns than maxColumns (default is 20480), the whole parsing process stops.

[CSV] If number of columns of one row bigger than maxcolumns it stop the whole parsing process.

2017-06-07 Thread Chanh Le
Hi everyone,
I am using Spark 2.1.1 to read CSV files and convert them to Avro files. One problem I am facing: if one row of a CSV file has more columns than maxColumns (default is 20480), the whole parsing process stops. Internal state when the error was thrown: line=1, column=3, record=0,
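One workaround for the situation described above is to pre-filter over-wide lines before they ever reach the CSV parser (in Spark terms, this would correspond to reading the file as plain text, filtering, and then parsing the survivors). The sketch below is plain stdlib Python, not Spark; the helper name is made up, and the naive delimiter count ignores quoting, so it is only a heuristic:

```python
import csv
import io

def read_csv_skipping_wide_rows(text, max_columns, delimiter=","):
    """Drop raw lines whose naive field count exceeds max_columns,
    then CSV-parse only the surviving lines. Because the parser never
    sees an over-wide row, it cannot abort on one."""
    kept = [line for line in text.splitlines()
            if line.count(delimiter) + 1 <= max_columns]
    return list(csv.reader(io.StringIO("\n".join(kept))))
```

The design point is ordering: the width check happens on raw text before parsing, so a hard-capped parser (like the one that aborts the Spark job here) is never exposed to the offending row.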