On Wed, Dec 28, 2016 at 6:42 PM, Yann Ylavic <ylavic....@gmail.com> wrote:
> [Bill, you definitely should do something with your email client, e.g.
> using plain text only, replying to your messages breaks indentation
> level (like the number of '>' preceding/according to the initial
> message)].
>

(Again, it's gmail, /shrug. I can attempt to undecorate but doubt I'm moving to a local client/mail store again. If anyone has good gmail formatting tips for their default settings, I'd love a pointer.)

> On Thu, Dec 29, 2016 at 12:28 AM, William A Rowe Jr <wr...@rowe-clan.net>
> wrote:
> >
> > On Dec 24, 2016 07:57, "Jim Jagielski" <j...@jagunet.com> wrote:
> >
> > Well as breaking changes go, changing URI to remain an encoded value and
> > to mandate module authors accept that req_rec is free threaded are
> > breaking changes.
>
> Not sure what the second point means, but preserving r->uri while also
> allowing to (and internally work with) an escaped form of the URI does
> not necessarily require an API change.
> (We could possibly have an opaque struct to manipulate the URI in its
> different forms and some helper(s) be compatible with the API changes'
> requirements, e.g. 2.4.x).
>

To be clear, this isn't possible. There are multiple meanings of every path segment character which is in the reserved set. There is no way to preserve these multiple meanings in a decoded context. The parallel entities may exist in any undecoded string.

So r->uri, if it still exists, will be subsumed by some variable like r->uri_path_unencoded and be retrievable into a decoded form. Functions such as ap_hook_map_to_storage, in the filesystem backend, will only be interested in the decoded form. Functions such as the http proxy module will only be interested in passing a never-mangled version of the encoded uri.
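A minimal sketch of why a decoded form cannot preserve the meaning of reserved characters. It uses Python's standard `urllib.parse.unquote` as a stand-in for httpd's own C unescaping code; the helper names `segments` and `decoded` and the sample paths are hypothetical, chosen only for illustration.

```python
from urllib.parse import unquote

# Two distinct request paths as they arrive on the wire (still %-encoded).
# In the second one, "%2F" is a literal slash *inside* a single segment.
encoded_paths = ["/files/a/b", "/files/a%2Fb"]

def segments(encoded_path):
    """Split on the real '/' delimiters first, then decode each segment.
    This keeps the segment boundaries the client actually sent."""
    return [unquote(seg) for seg in encoded_path.split("/")[1:]]

def decoded(encoded_path):
    """Decode the whole path at once, like a single decoded r->uri-style
    string; the segment boundaries can no longer be recovered from it."""
    return unquote(encoded_path)

for p in encoded_paths:
    print(f"{p!r:16} decoded={decoded(p)!r:14} segments={segments(p)}")
```

Both paths decode to the same string, while splitting before decoding still tells them apart. That lines up with the division of labor above: a filesystem mapper can work from decoded segments, whereas a proxy should forward the encoded string untouched.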
Even if r->uri remains available as a read-only input, there is no simple way for httpd to resolve manipulations of r->uri made in place (it isn't const): when r->uri and r->uri_path_unencoded disagree, httpd cannot tell which one is canonical, or what mishmash the legacy abuser of r->uri made of these parallel reserved characters in their encoded and unencoded forms. We are stuck with the current mess of various %-escape workarounds until we replace the core assumption.

This deserves a long discussion, which already exists on the security@ list but needs to be pushed outward on dev@, preferably by the original authors of these thoughts. That includes the r->uri-preserving flavor that you mention above, as well as the various discussions about the % entity encoding, and my concerns about canonicalization. With some first-level triage already complete, there is no reason for the URI discussion to remain 'behind the curtain.'
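As a hedged illustration of the mismatch problem, the sketch below assumes a server that keeps the client's encoded path alongside a mutable, decoded uri, and tries to detect in-place edits by re-encoding the decoded value and comparing. `reencode` and `looks_modified` are hypothetical helpers, not httpd APIs, and Python's `urllib.parse` stands in for httpd's C code.

```python
from urllib.parse import quote, unquote

def reencode(decoded_path):
    # quote() leaves '/' unescaped by default (safe="/"), so it has to
    # assume every '/' in the decoded string was a real segment delimiter.
    return quote(decoded_path)

def looks_modified(encoded_path, uri):
    # Hypothetical check: did some module rewrite the decoded uri in place?
    # Compares the re-encoded uri against the encoded path the client sent.
    return reencode(uri) != encoded_path

# Hypothetical request: the client sent a literal '/' inside a segment.
encoded_path = "/files/a%2Fb"
uri = unquote(encoded_path)  # legacy decoded view of the same path

# Nobody touched uri, yet the check reports a change: the %2F vs '/'
# distinction is gone, so the round trip cannot reproduce encoded_path.
print(looks_modified(encoded_path, uri))

# A path with no escaped reserved characters survives the round trip.
print(looks_modified("/files/a/b", unquote("/files/a/b")))
```

Because the unmodified request is already flagged, a mismatch by itself says nothing about which form is canonical or what, if anything, a module changed.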