On 2/01/2014 10:08 a.m., Alex Rousskov wrote: > On 01/01/2014 05:15 AM, Amos Jeffries wrote: >> This patch renames the HttpParser class as Http1Parser (to differentiate >> from the future planned Http2Parser) and moves it into the Http namespace. > ... >> Also, the HttpRequestMethod class is moved into the Http namespace and >> library to reduce dependencies on the parser class outside the library. > > >> Overall this does not seem to affect performance much. > > Any performance testing performed on this patch to support the above > estimation?
Semi-casual runs of apachebench on trunk rev.13197 compared with this branch as of the patch submitted for audit. Using a HIT server object with warm cache from http://example.com/. Difference was consistently within +/- 1 req/sec of trunk, with variation decreasing as concurrency increased. >> Http::Http1ParserPointer > > The proposed naming approach results in awkward names with too many > "HTTP"s in them. I suggest a different arrangement: > > * namespace Http // stuff common to HTTP/1 and HTTP/2 handling > * namespace Http::One // stuff specific to HTTP/1 handling > * namespace Http::Two // stuff specific to future HTTP/2 handling > * No Http prefixes for names inside the above three namespaces. > > This alternative approach would result in > > Http::One::Parser > Http::Two::ParserPointer > > instead of > > Http::Http1Parser > Http::Http2ParserPointer > I was wondering about Parser1 / Parser1Pointer , Parser2 / Parser2Pointer as an alternative? Only the parser class seems to be needing the version difference. Everything else seems to be generic or relevant only to one of the versions. > Same length, but better looking names without repetition of information. > Moreover, after adding these global namespace aliases (to http/forward.h): > > namespace Http1 = Http::One; > namespace Http2 = Http::Two; > > we can use > > Http1::Parser > Http2::ParserPointer > > which are even shorter that the proposed names. > > Note that declaring Http1 and Http2 namespaces outside of the Http > namespace (rather than namespace aliases) is probably a bad idea because > > * it will force us to use Http:: prefixes inside Http1 and Http2 > namespaces, which is kind of wrong, OR > > * force us to add blank "use Http" statements, which opens another can > of warms, especially if used in header files. > Um.. > > The final sketch, putting all the naming-related pieces together: > >> namespace Http { >> namespace One { >> class Parser ... >> }; >> namespace Two { >> class Parser ... >> }; >> }; >> namespace Http1 = Http::One; >> namespace Http2 = Http::Two; > You just said that was a bad idea? > You can even place all HTTP/N-specific protocol code into src/http/N/ > directories, but that may be an overkill today. > > >> === modified file 'src/client_side.cc' > >> +prepareAcceleratedURL(ConnStateData * conn, ClientHttpRequest *http, const >> Http::Http1ParserPointer &hp) > ... >> +prepareTransparentURL(ConnStateData * conn, ClientHttpRequest *http, const >> Http::Http1ParserPointer &hp) > > If possible, pass references to the parser instead of pointers. All > other factors being equal, references are cheaper and safer than > pointers, so code that does not store an object (or pass an object for > storage by others) should accept a reference, especially when the object > is guaranteed to exist in the callers. > Done. De-referenced the parser pointer in ConnStateData before passing to the entire call chain as regular non-const object reference. In order to make the parser incremental and not be reset on every input byte (worst case slow-loris attack) the Parser object is allocated dynamically by ConnStateData when a new request is expected. > >> + const char *url = hp->requestUri().c_str(); >> if (*url != '/') { > > Declare the "url" pointer itself as constant, to expose "url is no > longer the same as requestUri()" problems discussed further below. > > >> + const char *url = hp->requestUri().c_str(); >> if (*url != '/') { > > Please add an "XXX: performance regression" comment for the above if you > do not want to convert the rest of url-using code to use the new SBuf API. > Yes that was the intention. I am proposing this patch as an initial merge before we start implementing anything on top of the incomplete Tokeniser. Comment added for now. > >> - int url_sz = strlen(url) + 32 + Config.appendDomainLen + >> + int url_sz = hp->requestUri().length() + 32 + >> Config.appendDomainLen + >> ... >> - int url_sz = strlen(url) + 32 + Config.appendDomainLen; >> + int url_sz = hp->requestUri().length() + 32 + >> Config.appendDomainLen; > > AFAICT, there are cases where url is no longer the same as requestUri() > by the time you get to this code in prepareAcceleratedURL(). Yes there are cases when it is shortened. I do not see any cases where it is extended, so this is adding a small over-allocation of memory by url_sz until we fully convert to SBuf. NP: The magic +32 is already over-allocating for the scheme name and delimiters. > > Please make url_sz constant if possible since you are modifying its > declaration. Same for url_sz in prepareTransparentURL(). > Done. > >> - if (*url != '/') >> + if (hp->requestUri()[0] != '/') >> return; /* already in good shape */ > > The old code "worked" for empty URLs (""). The new does not. I do not > know whether empty URLs are possible at this point. If they are not > possible, perhaps add a comment or even a Must()? > The parseFirstLine logics ensure there is always a URL. Either by treating the version tag as the URL, or aborting with a parse error. Do you really think we need to add an unused check for length() ? > >> + bool parsedOk = hp->parseRequest(); > > Please declare as constant if possible. Done. > >> === renamed file 'src/HttpParser.h' => 'src/http/Http1Parser.h' > >> + /// the request-line URI if this is a request, or an empty string. >> + SBuf requestUri() const {return uri_;} > > Consider returning "const SBuf &" to avoid refcounting overheads. Why is const SBuf &uri() const {retun uri_;} const SBuf uri() const {retun uri_;} SBuf &uri() {retun uri_;} ==> error: passing âconst SBufâ as âthisâ argument of âconst char* SBuf::c_str()â discards qualifiers SBuf &uri() const {retun uri_;} ==> invalid initialization of reference of type 'SBuf&' from expression of type 'const SBuf' Whereas return via copy-construct provides a non-const SBuf using the non-allocating SBuf copy against the shared MemBlob. > this method in the general message parser? Should not it (and the > corresponding data member) be placed in HTTP *request* parser? > The request parsing is in the generic HttpParser. The response parsing is in the generic HttpMsg class but the intention is to move both into here and: 1) avoid the current design of 2-pass parsing: a) generic-parse just to determine it is a request-line, then b) re-parse with HttpRequest to get the fields already found by HttpParser. 2) avoid adding parser overheads to the HttpRequest and HttpResponse classes. 3) avoid having to allocate a HttpRequest/HttpResponse object just to test parsing for invalid request/responses. 4) make HttpRequest and HttpResponse generic enough for use in both protocols and be spawned by the parser(s). This accessor is the one to retrieve the (1a) request-line data field for use in (1b) creating the HttpRequest object without re-parsing. It can also be used anywhere the raw-URI is needed before HttpRequest exists - which is what you see in the current patch. (Same goes for the method accessor.) We also need to work around two other traffic flow properties: * in HTTP/2 the response parse needs to handle PUSH_PROMISE "requests", so the API separation becomes more con than pro. * request parsing may result in a response error object being instantiated. And thus a status code results from the request-line. I figure its better at this stage to have a set of accessors for each of the possible fields (request-line, status-line and mime-header areas) and two parse methods to be called as appropriate from the client-side or server-side as to which message type is to be interpreted. The final intended result after everything is finished is for HttpParser to spawn Http::Request and/or Http::Reply filled with the basic details the Parser detects. So most of the uses you see in this patch will eventually become things using the HttpRequest object and its accessors. But that is some ways down the track still. > >> + /// the HTTP method if this is a request method >> + const HttpRequestMethodPointer & method() const {return method_;} > > Similar concerns here. This should be in a request parser IMO. > Same results as uri() accessor. Although this one works with const return value because assignment is a data copy of the image String. > Also, s/a request method/a request message/ if method() stays in the > general message parser. Done. > > Finally, either this method should return a method object (rather than a > pointer to it) OR we should get rid of METHOD_NONE. Having two ways to > signal a missing method is too much IMO. The former is probably easier > and requires fewer code changes. Tryimg. I think the Pointer was there to avoid data copies. The method class is relatively heavy to copy/assign with its String internals. NOTE: New patch to follow once the remaining questions above are agreed and branch re-tested. Amos
