Reading a CSV file with null values

2015-10-22 Philip Lee
Hi, I am trying to load a dataset in which some fields are null (empty) using readCsvFile().
    // e.g. _date|_click|_sales|_item|_web_page|_user
    case class WebClick(_click_date: Long, _click_time: Long, _sales: Int, _item: Int, _page: Int, _user: Int)
    private def getWebClickDataSet(env: ExecutionEnviro…

2015-10-23 Maximilian Michels
Hi Philip, how about declaring the field that may be empty as type String? Then you can read the CSV into a DataSet and treat the empty string as a null value. It's not very nice, but it is a workaround. As of now, Flink deliberately doesn't support null values. Regards, Max On Thu, Oct 22, 2015 at 4:30 PM, Philip Lee wrote: …
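A minimal sketch of Max's workaround in plain Scala, assuming the rows have already been read with every possibly empty column typed as String. The type names RawWebClick and TypedWebClick, the helper toIntOpt, the choice of which columns may be empty, and the sample values are all hypothetical. Empty strings are mapped to None rather than null, since Flink deliberately doesn't support null values.

```scala
// Hypothetical raw row: every column that may be empty was read as String.
case class RawWebClick(clickDate: String, clickTime: String, user: String, item: String)

// Hypothetical typed row: empty columns become None instead of null.
case class TypedWebClick(clickDate: Long, clickTime: Long, user: Option[Int], item: Option[Int])

object EmptyAsNull {
  // Treat an empty (or whitespace-only) string as a missing value.
  def toIntOpt(s: String): Option[Int] =
    if (s == null || s.trim.isEmpty) None else Some(s.trim.toInt)

  // Convert one raw row; assumes date and time are never empty.
  def toTyped(r: RawWebClick): TypedWebClick =
    TypedWebClick(r.clickDate.trim.toLong, r.clickTime.trim.toLong,
      toIntOpt(r.user), toIntOpt(r.item))

  def main(args: Array[String]): Unit = {
    val rows = Seq(RawWebClick("1", "2", "", "7"), RawWebClick("1", "3", "12", ""))
    // Prints TypedWebClick(1,2,None,Some(7)) and TypedWebClick(1,3,Some(12),None)
    rows.map(toTyped).foreach(println)
  }
}
```

In a DataSet job this conversion would be a plain map over the String-typed DataSet. As the follow-up in this thread reports, readCsvFile() may still reject rows with empty fields, so typing the columns as String does not by itself guarantee that the rows load.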

2015-10-23 Shiti Saxena
For a similar problem, where we wanted to preserve and track null entries, we load the CSV as a DataSet[Array[Object]] and then transform it into a DataSet[Row] using a custom RowSerializer (https://gist.github.com/Shiti/d0572c089cc08654019c), which handles nulls. The Table API (which supports null) can…

2015-10-24 Philip Lee
Maximilian said that handling the null values as Strings would be acceptable. But in fact, readCsvFile() still cannot read the empty fields; the error message says "Row too short".
    case class WebClick(click_date: String, click_time: String, user: String, item: String)
    private def getWebClickDataSet(env…

2015-10-24 Philip Lee
Also, following Shiti's suggestion, we could use a RowSerializer to handle these null values, right? I tried it in several ways, but it still did not work. Could you give an example based on the previous email? On Sat, Oct 24, 2015 at 11:19 PM, Philip Lee wrote: > Maximilian said if we handle null value…

2015-10-26 Maximilian Michels
As far as I know, null support was removed from the Table API because it was not consistently supported across all operations. See https://issues.apache.org/jira/browse/FLINK-2236 On Fri, Oct 23, 2015 at 7:18 PM, Shiti Saxena wrote: > For a similar problem where we wanted to preserve and track…

2015-10-26 Philip Lee
Thanks for your reply. What if I do not use the Table API? The error already happens with a plain env.readCsvFile(). I heard that a RowSerializer would handle these null values, but a TypeInformation error occurs when the data is converted. On Mon, Oct 26, 2015 at 10:26 AM, Maximilian Michels wrote: …

2015-10-26 Fabian Hueske
Hi Philip, the CsvInputFormat does not support reading empty fields. I see two ways to achieve this functionality:
- Use a TextInputFormat that returns each line as a String, and do the parsing in a subsequent MapFunction.
- Extend the CsvInputFormat to support empty fields.
Cheers, Fabian 2015-10…
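A minimal sketch of the parsing step for Fabian's first option, written as plain Scala so it runs on its own. It assumes the pipe-delimited, six-column layout from the first message in this thread; the names ParsedClick, LineParser and intOpt are hypothetical, and treating every column after date and time as possibly empty is an assumption.

```scala
// Hypothetical parsed type following the column order of the WebClick example:
// date|time|sales|item|page|user; columns that may be empty are Option[Int].
case class ParsedClick(clickDate: Long, clickTime: Long, sales: Option[Int],
                       item: Option[Int], page: Option[Int], user: Option[Int])

object LineParser {
  // An empty column becomes None instead of null.
  private def intOpt(s: String): Option[Int] =
    if (s.isEmpty) None else Some(s.toInt)

  // What the MapFunction would do with each line from the text input.
  def parse(line: String): ParsedClick = {
    // A limit of -1 keeps trailing empty fields: without it,
    // "1|2||4|5|".split("\\|") drops the last (empty) column.
    val f = line.split("\\|", -1)
    require(f.length == 6, s"expected 6 fields but got ${f.length}: $line")
    ParsedClick(f(0).toLong, f(1).toLong, intOpt(f(2)), intOpt(f(3)), intOpt(f(4)), intOpt(f(5)))
  }

  def main(args: Array[String]): Unit =
    Seq("1|2|3|4|5|6", "1|2||4|5|", "1|2||||").map(parse).foreach(println)
}
```

In Flink's Scala DataSet API this would be wired up roughly as env.readTextFile(path).map(LineParser.parse _) with org.apache.flink.api.scala._ imported, where path stands in for the real input file. The require call fails on malformed lines; filtering or counting them instead may suit a real job better.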