On 2015-08-01 09:28 PM, Igor Tandetnik wrote:
> On 8/1/2015 12:38 PM, R.Smith wrote:
>> if I have this csv line, what values must the parser end up with?:
>>
>> 1, "2", "3" 4, 5 "6", 7
>
> This is not a valid line of CSV, at least not as specified in RFC 
> 4180. Therefore, RFC 4180-conforming parsers may differ in their 
> interpretation of this line. There is no particular set of values that 
> the parser "must" end up with, assuming you use the word "must" with 
> the meaning specified in RFC 2119.

Indeed so. The RFC calls for a field to be either enclosed in double 
quotes or not quoted at all. It does not define this kind of dual value, 
so the line isn't valid csv (not if you expect the same result from 
another csv parser, that is).
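
To make the "parsers may differ" point concrete, here is a small sketch 
using Python's standard csv module on the line in question. The choice of 
Python and of these three reader settings is mine, purely for 
illustration; other parsers are free to do something else again.

```python
import csv

line = '1, "2", "3" 4, 5 "6", 7'

# Default dialect: a quote that is not the first character of a field
# is kept as literal text, so the leading spaces and quotes survive.
default_fields = next(csv.reader([line]))
# ['1', ' "2"', ' "3" 4', ' 5 "6"', ' 7']

# skipinitialspace=True: the space after each comma is dropped, so "2"
# and "3" are now seen as quoted, and the text after the closing quote
# is appended to the value.
trimmed_fields = next(csv.reader([line], skipinitialspace=True))
# ['1', '2', '3 4', '5 "6"', '7']


def parse_strict(text):
    """Parse with strict=True, which rejects text after a closing quote."""
    try:
        return next(csv.reader([text], skipinitialspace=True, strict=True))
    except csv.Error as exc:
        return f"error: {exc}"


strict_result = parse_strict(line)  # an "error: ..." string for this line
```

Three settings of one library give three different answers for the same 
line, which is exactly what "not specified" means in practice.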

There is, however, no reason to dismiss it, and it made for a great 
meta-data opportunity for me in the past. This kind of CSV was accepted 
and would get parsed, with the parts after the quotes simply ignored:

V1, V2,          V3,           V4
1 , "John" STR,  "" NULL,      4
2 , "" NULL,     "42" INT,     4
3 , "James" STR, "Smith" STR,  7

You get the idea... I could parse it with one tool and retrieve the 
"metadata", and it would still work in Excel and other csv parsers that 
simply ignored anything after the closing quote and before the comma.
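
For illustration, a minimal sketch of what the metadata-aware side could 
look like. This is not the tool I used; the function name 
parse_tagged_line and the (value, tag) return shape are made up for this 
example, and it only handles a single line with no embedded newlines.

```python
def parse_tagged_line(line):
    """Split one line into (value, tag) pairs; tag is None when absent."""
    fields = []
    i, n = 0, len(line)
    while True:
        # Skip spaces before the field.
        while i < n and line[i] == " ":
            i += 1
        if i < n and line[i] == '"':
            # Quoted value: read up to the closing quote, treating ""
            # inside the value as one escaped quote.
            i += 1
            chars = []
            while i < n:
                if line[i] == '"':
                    if i + 1 < n and line[i + 1] == '"':
                        chars.append('"')
                        i += 2
                        continue
                    i += 1
                    break
                chars.append(line[i])
                i += 1
            value = "".join(chars)
            # Whatever sits between the closing quote and the next
            # comma is the metadata tag.
            end = line.find(",", i)
            if end == -1:
                end = n
            tag = line[i:end].strip() or None
        else:
            # Unquoted value: runs to the next comma, no tag.
            end = line.find(",", i)
            if end == -1:
                end = n
            value = line[i:end].strip()
            tag = None
        fields.append((value, tag))
        if end >= n:
            break
        i = end + 1
    return fields


row = parse_tagged_line('2 , "" NULL,     "42" INT,     4')
# [('2', None), ('', 'NULL'), ('42', 'INT'), ('4', None)]
```

A parser that drops the tag would see the same row as 2, "", "42", 4, 
which is the behaviour the trick relied on.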

Lately though, Excel has changed its ways and now imports the entire 
bit between commas as a string. So this trick of mine has become 
useless; it is just an interesting digression since csv's flaws have 
been brought up.


