That sounds like a good idea. Just like setDelimeter("|"), one should be
able to do a setParseDoubleQuotes(false) to disable the special handling of
double quotes.

You're right, Fabian, the current implementation treats all String fields alike. Maybe we can expect the user to provide a consistently formatted input file (i.e. with or without the use of double quotes as identifiers)?

On Tue, Dec 9, 2014 at 2:32 PM, Fabian Hueske <[email protected]> wrote:

> With the current implementation, quoted string parsing kicks in if the
> first non-whitespace character of a field is a double quote (just as in
> Malte's case). I think this behaviour can be quite unexpected for users.
> Wouldn't it be better to make the behaviour of the String parsing more
> explicit, i.e., add a switch to dis/enable quoted string parsing? With the
> current implementation, the configuration would affect all String fields in
> a file, though...
>
> Cheers, Fabian
>
> 2014-12-09 12:17 GMT+01:00 Max Michels <[email protected]>:
>
>> Hi Malte,
>>
>> Typically, double quotes are used to identify strings and thus are not
>> interpreted literally. Any data in a field after a double quoted string is
>> regarded as invalid trailing data.
>>
>> You could replace double quotes with single quotes:
>>
>> A|ggg
>> B|'hhh' xx
>> C|xxx
>>
>> This results in the expected >'hhh' xx< for the second line.
>>
>> Best regards,
>> Max
>>
>> On Fri, Dec 5, 2014 at 4:44 PM, Malte Schwarzer <[email protected]> wrote:
>>
>>> Hi Stephan,
>>>
>>> The result should be >"hhh" xx< as field value. Enclosures should be
>>> disabled but there seems to be no method to do that.
>>>
>>> Malte
>>>
>>> From: Stephan Ewen <[email protected]>
>>> Reply-To: <[email protected]>
>>> Date: Friday, 5 December 2014 16:28
>>> To: <[email protected]>
>>> Subject: Re: Quotes in fields of CsvInputFormat
>>>
>>> Hi!
>>>
>>> The parser interprets the quotes as quotes for the field. That means the
>>> second field (the string) stops after the "hhh" and the xx is considered
>>> invalid trailing data.
>>>
>>> What do you expect as the result of parsing that line?
>>>
>>> Stephan
>>>
>>> On Fri, Dec 5, 2014 at 4:16 PM, Malte Schwarzer <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm trying to import a CSV file but the parser seems to have problems with
>>>> quotes at the beginning of a field. Is there a way to set or disable
>>>> enclosures for the CSV input?
>>>>
>>>> This is my code:
>>>>
>>>> DataSet<Tuple2<String, String>> res = env.readCsvFile(inputCsvFilename)
>>>>     .fieldDelimiter('|')
>>>>     .types(String.class, String.class);
>>>>
>>>> CSV:
>>>>
>>>> A|ggg
>>>> B|"hhh" xx
>>>> C|xxx
>>>>
>>>> As a result I'm receiving a ParseException for line B:
>>>>
>>>> org.apache.flink.api.common.io.ParseException: Line could not be
>>>> parsed: 'B|"hhh" xx'
>>>>
>>>> Thanks,
>>>> Malte
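
For illustration, the behaviour discussed in this thread can be sketched in plain Java. This is not Flink's parser: the class QuotedFieldSketch, the method parseField, and the parseQuotes flag are hypothetical names, the flag stands in for the switch proposed above, and IllegalArgumentException stands in for the ParseException from Malte's report. The sketch assumes the field has already been split at the delimiter and rejects anything after the closing quote.

```java
class QuotedFieldSketch {

    /**
     * Parses one field that has already been split at the delimiter.
     * parseQuotes is a hypothetical switch, not an existing Flink option.
     */
    static String parseField(String field, boolean parseQuotes) {
        // Locate the first non-whitespace character of the field.
        int start = 0;
        while (start < field.length() && Character.isWhitespace(field.charAt(start))) {
            start++;
        }

        // Quoted parsing only kicks in when the switch is on and the
        // first non-whitespace character is a double quote.
        boolean quoted = parseQuotes
                && start < field.length()
                && field.charAt(start) == '"';
        if (!quoted) {
            return field; // taken literally, quote characters included
        }

        int end = field.indexOf('"', start + 1);
        if (end < 0) {
            throw new IllegalArgumentException("Unterminated quoted field: '" + field + "'");
        }
        // Assumption of this sketch: any character after the closing
        // quote counts as invalid trailing data.
        if (end != field.length() - 1) {
            throw new IllegalArgumentException("Invalid trailing data in field: '" + field + "'");
        }
        return field.substring(start + 1, end);
    }

    public static void main(String[] args) {
        // Switch off: the second field of line B is kept as-is.
        System.out.println(parseField("\"hhh\" xx", false));

        // Switch on: the same field is rejected, as in Malte's report.
        try {
            parseField("\"hhh\" xx", true);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the switch off, line B yields the field value Malte expected; with it on, the xx after the closing quote is what triggers the error.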
