On the subject of r->uri [was: Post 2.4.25]

William A Rowe Jr Wed, 28 Dec 2016 22:09:07 -0800

On Wed, Dec 28, 2016 at 6:42 PM, Yann Ylavic <ylavic....@gmail.com> wrote:

> [Bill, you definitely should do something with your email client, e.g.
> using plain text only, replying to your messages breaks indentation
> level (like the number of '>' preceding/according to the initial
> message)].
>

(Again, it's gmail, /shrug. I can attempt to undecorate but doubt I'm
moving to a local client/mail store again. If anyone has good gmail
formatting tips for their default settings, I'd love a pointer.)

> On Thu, Dec 29, 2016 at 12:28 AM, William A Rowe Jr <wr...@rowe-clan.net>
> wrote:
> >
> > On Dec 24, 2016 07:57, "Jim Jagielski" <j...@jagunet.com> wrote:
> >
> > Well as breaking changes go, changing URI to remain an encoded value
> > and to mandate module authors accept that req_rec is free threaded are
> > breaking changes.
>
> Not sure what the second point means, but preserving r->uri while also
> allowing to (and internally work with) an escaped form of the URI does
> not necessarily require an API change.
> (We could possibly have an opaque struct to manipulate the URI in its
> different forms and some helper(s) be compatible with the API changes'
> requirements, e.g. 2.4.x).
>

To be clear, this isn't possible.

Every path segment character in the reserved set has more than one
meaning (for example, a literal '/' separates segments, while '%2F' is
data inside a segment). There is no way to preserve these multiple
meanings in a decoded context. The parallel entities may exist in any
undecoded string. So r->uri, if it still exists, will be subsumed by some
variable like r->uri_path_unencoded and be retrievable into a decoded form.
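
To illustrate the loss of information, here is a minimal standalone
sketch. It is not httpd code: demo_percent_decode and demo_count_segments
are hypothetical helpers invented for this example, and the decoder is
deliberately naive. It shows two distinct encoded paths collapsing into
the same decoded string, and the segment count changing once the path has
been decoded.

```c
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical helper: decode %XX escapes from src into dst.
 * dst must hold at least strlen(src) + 1 bytes.
 * Malformed escapes are copied through unchanged. A decoded %00 will
 * truncate the result, which is acceptable for this demonstration only.
 */
static void demo_percent_decode(char *dst, const char *src)
{
    while (*src) {
        int hi, lo;
        if (src[0] == '%'
            && (hi = hex_value(src[1])) >= 0
            && (lo = hex_value(src[2])) >= 0) {
            *dst++ = (char)(hi * 16 + lo);
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}

/* Hypothetical helper: count non-empty segments split on a literal '/'. */
static size_t demo_count_segments(const char *path)
{
    size_t n = 0;
    int in_segment = 0;
    for (; *path; ++path) {
        if (*path == '/') {
            in_segment = 0;
        } else if (!in_segment) {
            in_segment = 1;
            ++n;
        }
    }
    return n;
}
```

Decoding "/a%2Fb/c" and "/a/b/c" with demo_percent_decode yields the same
string, yet demo_count_segments reports two segments for the first encoded
path and three for its decoded form. Once only the decoded string is kept,
nothing records which '/' was originally data.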

Functions such as ap_hook_map_to_storage, in the filesystem backend,
will only be interested in the decoded form. Functions such as the http
proxy module will only be interested in passing a never-mangled version
of the encoded uri.

Even if r->uri remains available as a read-only input, there is no simple
way for httpd to resolve r->uri manipulations made in place (it isn't
const). When it no longer matches r->uri_path_unencoded, httpd cannot tell
which of the two is canonical, nor what mishmash the legacy abuser of
r->uri made of these parallel reserved characters in their encoded and
unencoded forms. We are stuck with the current mess of various %-escape
workarounds until we replace the core assumption.

This deserves a long discussion, one which already exists on the security@
list but needs to be pushed outward to dev@, preferably by the original
authors of these thoughts. That includes the r->uri preserving flavor
that you mention above, as well as the various discussions about the
% entity encoding, and my concerns about canonicalization. With some
first-level triage already complete, there is no reason for uri discussion
to remain 'behind the curtain.'
