Hi David,

We now have a global and a per-server parameter called "formfallbackcharset";
the flag for "ns_getform" and "ns_parsequery" is now called "-fallbackcharset".

In many cases, setting just the per-server parameter should be sufficient to handle
incorrectly encoded queries.
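
A minimal configuration sketch. It assumes the global parameter sits in the ns/parameters section and uses a hypothetical server name "default". Since the names were still in flux at the time of writing, check them against the release in use.

```tcl
# Sketch only: section placement is an assumption, "default" is a hypothetical server name.

# Global default for all servers (assumed to live in ns/parameters):
ns_section ns/parameters
ns_param   formfallbackcharset iso8859-1

# Per-server setting, which should cover most incorrectly encoded queries:
ns_section ns/server/default
ns_param   formfallbackcharset iso8859-1
```

Per call, the flag would be passed as e.g. "ns_getform -fallbackcharset iso8859-1" or "ns_parsequery -fallbackcharset iso8859-1 $query" (the exact argument order is an assumption).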

Still missing: "multipart/form-data" handling, documentation updates, and an error code.

all the best

-gn

On 18.05.22 22:00, Gustaf Neumann wrote:

Dear David,

I've committed the option "-fallbackencodings" for the commands "ns_getform" and "ns_parsequery". The implementation covers "ns_getform" when the data is provided as "application/x-www-form-urlencoded", whether it is parsed from memory or from the spool file. The "multipart/form-data" case (also separate for memory and spool file) is not yet covered.

We could also consider a global parameter for the configuration file (e.g. FormFallbackEncodings). We should probably use the term "charset" instead of "encoding", since "charset" is the MIME term (also used in e.g. "URLCharset"), while "encoding" is the Tcl term.

Although the names might still change, you could check whether this works for your test cases.

-gn

On 16.05.22 16:16, David Osborne wrote:
Hi Gustaf,

Looking at the source code, I spotted that ns_getform takes a charset argument.
The options for overriding charsets at the moment seem to be:

    ns_getform iso8859-1

    ns_urlcharset iso8859-1
    ns_getform

    ns_conn urlencoding iso8859-1
    ns_getform

We experimented with some code that traps errors from ns_getform and, when the error is due to "invalid UTF-8", retries with a fallback charset. All three of the above techniques worked OK when the Content-Type header leaves the charset unspecified.

The main issues we had were:

1. When charset=utf-8 is present in the Content-Type header, it overrides [1] any encoding we pass using the three techniques above. In those cases, we have to manipulate the headers' ns_set to remove or change the charset, e.g.

       Content-Type: application/x-www-form-urlencoded; charset=utf-8

   transformed to

       Content-Type: application/x-www-form-urlencoded

   or

       Content-Type: application/x-www-form-urlencoded; charset=windows-1252

2. Trapping the specific "invalid UTF-8" error: this method seems fragile. It would be nice if there was an errorCode we could trap.

       ::try {
           ns_getform
       } on error {msg options} {
           if {[string match "*contains invalid UTF-8*" $msg]} {
               # change Content-Type charset (if present)
               # try fallback charset
           } else {
               # rethrow the original error unchanged
               return -options $options $msg
           }
       }
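
The workaround from points 1 and 2 can be sketched in plain Tcl as two small helpers. Both names, replace_charset and call_with_fallback, are hypothetical and not NaviServer commands; the retry still relies on matching the error message text, since there is no dedicated errorCode for this condition.

```tcl
# Hypothetical helper (not a NaviServer command): return a Content-Type value
# with any charset parameter removed, optionally replaced by a new charset.
proc replace_charset {contentType {charset ""}} {
    set parts [split $contentType ";"]
    # Keep the media type itself, e.g. application/x-www-form-urlencoded.
    set result [list [string trim [lindex $parts 0]]]
    foreach param [lrange $parts 1 end] {
        set param [string trim $param]
        # Drop empty parameters and any existing charset=... parameter.
        if {$param eq "" || [string match -nocase "charset=*" $param]} {
            continue
        }
        lappend result $param
    }
    if {$charset ne ""} {
        lappend result "charset=$charset"
    }
    return [join $result "; "]
}

# Hypothetical helper: run a form-parsing command; if it fails with an
# "invalid UTF-8" message, retry once with the fallback charset appended as
# the last argument (ns_getform accepts an optional charset argument).
proc call_with_fallback {cmd fallbackCharset} {
    try {
        set result [{*}$cmd]
    } on error {msg opts} {
        # Fragile: matches on the message text, since no errorCode is available.
        if {![string match "*invalid UTF-8*" $msg]} {
            # Not an encoding problem: rethrow the original error unchanged.
            return -options $opts $msg
        }
        set result [{*}$cmd $fallbackCharset]
    }
    return $result
}
```

Inside a request, one could then rewrite the header set, assuming "ns_set idelkey" and "ns_set put" on the set returned by "ns_conn headers", and call "call_with_fallback ns_getform windows-1252". This has not been tested against NaviServer.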

But I think this presents us with a way forward in cases where client apps are not getting the encoding correct.

[1] https://bitbucket.org/naviserver/naviserver/annotate/master/nsd/form.c?at=master#form.c-170


_______________________________________________
naviserver-devel mailing list
naviserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/naviserver-devel
