The TLDR version: OWASP's recommendation is specifically to render code
intended to be executed as unexecutable. I'd suggest any fix be made in
the OWASP-Java-Encoder project and not here. I believe providing this
feature, even at OWASP, has near-zero value in the long run because the
purpose of formulas in Excel IS to be executed--and Microsoft already
offers the best speed bump. Here be dragons!
cc'ing my partner in crime.
============================
I apologize. This is going to be a long response because I don't know
any of you professionally, so I'm erring on the side of completeness.
Sincere apologies if I'm stating things you believe to be obvious, or am
myself ignorant of something obvious.
So I think there's a misunderstanding regarding the threat described
by the OWASP article. The threat is explicitly *formula execution* in
Excel--and LibreOffice. It sounds similar to a browser problem, but it's
not; it's far worse. The reason this particular threat tends to be
out of bounds in bug bounty programs and CTF contests is that the
attack that exploits it is a social engineering attack, which always
works in the real world. Hence bug bounties won't pay out for it.
The recommendation from OWASP is as follows:
Escape any cell that begins with one of these characters:
* Equals (|=|)
* Plus (|+|)
* Minus (|-|)
* At (|@|)
* Tab (|0x09|)
* Carriage return (|0x0D|)
* Separators and quotes, the set [;',"], must be similarly escaped
While this would be a mitigation, it would also *purposefully break any
formulas* placed into a csv cell. This is a critical point, and I'll
come back to it later. It's all or nothing.
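To make the all-or-nothing point concrete, here is a minimal Java sketch
of the text-forcing approach the OWASP page describes: prefix a single
quote, double any embedded double quotes, and wrap the field in double
quotes. The class and method names are made up for illustration; this is
not an API of commons-csv or the OWASP Java Encoder.

```java
/** Hypothetical helper for illustration only; not part of any real library. */
final class FormulaNeutralizer {
    // Leading characters from the list above that make a spreadsheet
    // treat a cell as a formula.
    private static final String TRIGGERS = "=+-@\t\r";

    /** True when the first character would trigger formula evaluation. */
    static boolean startsLikeFormula(String cell) {
        return !cell.isEmpty() && TRIGGERS.indexOf(cell.charAt(0)) >= 0;
    }

    /** Forces the cell to be read as text, whatever the author intended. */
    static String neutralize(String cell) {
        // A leading single quote tells the spreadsheet "this is text".
        String text = startsLikeFormula(cell) ? "'" + cell : cell;
        // Standard CSV quoting: double embedded quotes, wrap in quotes.
        return "\"" + text.replace("\"", "\"\"") + "\"";
    }
}
```

Calling neutralize("=SUM(A1:A2)") yields "'=SUM(A1:A2)" (quotes
included), so the accountant's legitimate sum is defused in exactly the
same way as an attacker's payload.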
This is where Phil's comment comes in:
"Maybe I'm misinterpreting something but I thought that it could be made
possible to configure CSVFormat-object when writing the CSV data in a
way that any data with possibly corrupting values (as shown on the OWASP
page) will mask the whole contents of the cell."
First, let me stress again the risk: The threat isn't masking cell
contents, it's *execution* of normal logic in a malicious way. This is
the €1M question: "How do we differentiate corrupting values from valid
values?"
Asking this csv library to do it means it has to take on quite a bit of
intelligence. It doesn't just have to understand what a CSV format is
anymore. It has to answer questions like "*What does a corrupt equal
sign look like?*" And it looks like a valid equal sign. So to do this
right, you have to do lexical analysis and parsing the same way that
Excel is going to do it, and THEN you have to infer behavior.
Therefore, to determine what corrupt characters look like given data
designed to be executed, you are now in the business of trying to
interpret what the Excel formula is doing in order to determine whether
or not it's safe. This is the core problem: formulas are bits of
/user-supplied code/ *designed to be executed*. If you escape it, you
break it. At best, you annoy the hell out of the accountant who was
expecting your web app to offer a usable spreadsheet, while adding one
layer of manual intervention on top of the standard warning that MS
Office provides whenever you open an Excel file not created on your
machine.
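As a rough illustration of why a simple "dangerous token" check cannot
tell corrupting values from valid ones, consider the Java sketch below.
The class name and sample values are my own assumptions, not anything
from commons-csv. A check on the first character is all a CSV library
can do without parsing formulas the way Excel does, and it flags benign
data right along with the hostile kind.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical demo class; names and samples are illustrative only. */
final class PrefixCheckDemo {
    /** The only test available without interpreting the formula itself. */
    static boolean flagged(String cell) {
        return !cell.isEmpty() && "=+-@\t\r".indexOf(cell.charAt(0)) >= 0;
    }

    /** Maps each sample cell to whether the prefix check would flag it. */
    static Map<String, Boolean> classify(String... cells) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String cell : cells) {
            result.put(cell, flagged(cell));
        }
        return result;
    }
}
```

Passing "=SUM(A1:A2)", "-5", "+1 555 0100", "@handle" and
"=HYPERLINK(...)" to classify() marks every one of them as flagged. The
check has no way to know that the first four are what the user wanted.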
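So... what can we do about it? Microsoft already did it, with the
standard warning that Office shows when you open a spreadsheet that was
not created on your machine.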
IMHO there's nothing that any intermediary library can do that's any
better than this. Web applications designed to take spreadsheets as
input are special beasts. The proper security rule of thumb is to
always ensure DATA is treated as DATA. But that rule gets *really
funky* when that DATA is actually supposed to be executable code. But
that's your choice: if you don't want it to execute you have to force
it to be data, which will break execution by programmer intent.
However, I suspect a few of you will be unhappy with my "do nothing"
suggestion and insist that something ought to be done.
I would recommend writing a CSV encoder for the owasp-java-encoder
project: https://github.com/OWASP/owasp-java-encoder The framework is
already in place and it's where I push people if they only need encoding
functions.
Why I wouldn't do it here: libraries like this have to be written to
the lowest common denominator, meaning csv format projects that don't
have Excel as a target. You want security functions to run as close
to the business logic as possible, and this is the wrong target for
that. Doing it here means not breaking legacy code, which means the
option will be off by default. (Or you follow a deprecation
strategy.) Further--this gets to my original hint about threat
models--executing formulas in cells is a /desired function/ of Excel and
its copies. When developers start breaking spreadsheets, they're going
to revert to legacy behavior, meaning you're really talking about
improving the defensive capability for the security-minded developers
who can stand up to the finance department. When OWASP tells you "This
attack is difficult to mitigate," it isn't just the technical issues
involved--which I just outlined--it's social. This is why I'm hesitant
to offer up "We'll do it in ESAPI," because I don't see the value-add in
the bigger picture. Plus, *this is Microsoft's fault* and I'm not
thrilled with writing code to speedbump *their* problem. Which, I feel,
they've addressed as well as they ever will.
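For what "as close to the business logic as possible" could look like,
here is a minimal Java sketch under assumed names (ExportRowSketch,
customerNote and totalFormula are all hypothetical). The application,
not the CSV library, knows which column holds untrusted free text and
which holds a formula it generated itself, so only the application can
decide which one gets forced to text.

```java
/** Hypothetical export code living in the application, not in a CSV library. */
final class ExportRowSketch {
    /** Plain CSV quoting: wrap in double quotes, double embedded quotes. */
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** For untrusted input: force text with a leading single quote if needed. */
    static String asText(String untrusted) {
        boolean risky = !untrusted.isEmpty()
                && "=+-@\t\r".indexOf(untrusted.charAt(0)) >= 0;
        return quote(risky ? "'" + untrusted : untrusted);
    }

    /**
     * customerNote is user-supplied and gets neutralized; totalFormula is
     * produced by the application and is left executable on purpose.
     */
    static String row(String customerNote, String totalFormula) {
        return asText(customerNote) + "," + quote(totalFormula);
    }
}
```

With this split, row("=1+1", "=SUM(B2:B9)") neutralizes the note and
keeps the total working. A generic CSV writer sees two strings starting
with "=" and has no basis for treating them differently.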
On 11/11/2021 4:36 AM, P. Ottlinger wrote:
Hi guys,
thanks for your reply.
Maybe I'm misinterpreting something but I thought that it could be made
possible to configure CSVFormat-object when writing the CSV data in a
way that any data with possibly corrupting values (as shown on the OWASP
page) will mask the whole contents of the cell.
Thus a library such as commons-csv would be able to lower the risk for
CSV injection and not every client/customer would have to manually
create this protecting logic.
To my mind it's a simple parser for "dangerous" tokens that quotes the
given data with additional " .... as we do not need to write
functioning Excel formulas into CSV.
WDYT?
Cheers,
Phil
On 10.11.21 at 20:53, Gary Gregory wrote:
I agree with Matt. CSV is just a container, it doesn't know or care what
the concept of a "formula" is.
Gary