> On 2021-07-10, at 22:02:43, Andy Seaborne wrote:
>
> Hi John,
>
>
>
> On 10/07/2021 17:03, John Walker wrote:
>> We're using a 120s timeout for all the requests, which should give plenty of
>> time for the query requests to complete in regular circumstances.
>
> That's 120s at the server?
i understood that to describe a timeout which they have introduced into some
stage in the client process(es).
the logs in the server indicate that the two requests for which they have
provided identifiers completed.
were the proxy to time a request out the client would receive a 504. that does
not appear in its log.
were the sparql processor to time them out, the response would be a generic
500. that also does not appear.
> What happens if that goes off? The response is closed? (a Q for james)
were the request to time out, the upstream connection would be closed.
the proxy should close its client connection as a consequence.
>
>> As we use N-Triples, I was wondering if the N-Triples parser uses the
>> readLine method to read from the stream.
>> If there were some line that is not terminated with an EOL character, might
>> that cause this issue?
>
> From the information so far, not likely. The parse has not changed and this
> parsing path is well trodden.
>
> What matters is that it's three terms then a DOT until end-of-stream is seen.
>
> End-of-stream happens when the chunking transfer layer says so - that's in
> Apache HttpClient.
one can expect the response to have been chunked.
the proxy is not configured to cache responses.
>
> If you see a single CPU thread at 100%, the parser is looping but there isn't
> a loop except delivering triples to the graph. And the NT parse is well-used
> and quite simple.
>
> So it seems to me it is one of two cases:
>
> 1 - bytes are flowing but the parser can't send output triples into the
> destination - java heap pressure (you'll see multiple CPU threads at 100%)
>
> How long do you leave it? Eventually - many minutes (20 is possible) - this
> case will out-of-memory.
>
> 55Mbytes isn't a very large number of triples. 500K maybe (without knowing
> the data, rule of thumb 100 bytes per N-Triples triple) and it's a freshly
> created graph. Has the heap been used up by the rest of the application?
the log entries for the two researched requests indicated 398940 and 398914
statements in the respective responses.
>
> 2 - Bytes are not flowing into the application, the parser is waiting. CPU
> usage 0%.
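case 2 can be reproduced locally without dydra or jena. the sketch below is an
illustration only, using plain java.net sockets rather than Apache HttpClient:
a hypothetical peer sends one statement and then stalls with the connection
open, so the reader blocks at 0% cpu until a socket read timeout fires. the
class and method names are made up for the example.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

/** Illustration only: a reader waiting on a peer that has stopped sending. */
final class StalledReadSketch {

    /** Returns true if the read stalled and was ended by the socket read timeout. */
    static boolean readStalls(int soTimeoutMillis) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            // Hypothetical peer: sends one N-Triples statement, then goes quiet
            // without closing the connection.
            Thread peer = new Thread(() -> {
                try (Socket s = server.accept()) {
                    OutputStream out = s.getOutputStream();
                    out.write("<urn:s> <urn:p> <urn:o> .\n".getBytes(StandardCharsets.US_ASCII));
                    out.flush();
                    Thread.sleep(soTimeoutMillis * 4L);
                } catch (Exception ignored) {
                    // Nothing to do in this sketch.
                }
            });
            peer.setDaemon(true);
            peer.start();

            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                // Without this, read() below would block for as long as the peer stalls.
                client.setSoTimeout(soTimeoutMillis);
                InputStream in = client.getInputStream();
                try {
                    while (in.read() != -1) {
                        // Consume whatever arrives.
                    }
                    return false; // peer closed the stream normally
                } catch (SocketTimeoutException e) {
                    return true;  // no bytes arrived within the timeout
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("read stalled: " + readStalls(500));
    }
}
```

in HttpClient 4 the analogous setting is, as far as i know, the socket timeout
on RequestConfig. whether the 120s mentioned above is that timeout is not clear
from the thread.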
>
> The next question is whether the same operation will fault again or if the
> same requestURL sometimes works.
>
> The Jena code for all this is deterministic. There's no hidden parallelism in
> this case.
>
> ----
>
> HTTP is layered:
>
> Transfer-Encoding [lowest level]
> Content-Encoding
> Actual stuff (Content-type).
>
> Transfer is point-to-point, Content-Encoding is end-to-end.
> "Transfer-Encoding: chunked" is used for a stream of response bytes without
> Content-Length.
>
> What intermediaries are there between the app and Dydra? There is an nginx
> but is that acting as a reverse proxy (and what connection method does it
> use) or is Dydra providing an nginx module?
>
> Presumably there is a load balancer.
>
> Does the app talk to a gateway?
>
> Do any intermediaries cache?
>
> Each hop between systems is a point-to-point "transfer".
>
>> Model sub = ModelFactory.createDefaultModel();
>> try (TypedInputStream stream = HttpOp.execHttpGet(requestURL,
>> WebContent.contentTypeNTriples, createHttpClient(auth), null)) {
>> // The following part sometimes hangs:
>
> Are you positive it returns from HttpOp.execHttpGet and enters the parser?
>
>> RDFParser.create()
>> .source(stream)
>> .lang(Lang.NTRIPLES)
>> .errorHandler(ErrorHandlerFactory.errorHandlerStrict)
>> .parse(sub.getGraph());
>> } catch (Exception ex) {
>> // Handle the exception
>> }
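one way to answer that without guessing is to take the stack of the thread once
it has been stuck for a while (jstack on the process does the same). a minimal
sketch, assuming the fetch-and-parse code is wrapped in a Runnable.
HangProbeSketch and probe are hypothetical names, not part of jena:

```java
/** Illustration only: capture where a worker thread is when it overruns a time limit. */
final class HangProbeSketch {

    /**
     * Runs the task on its own thread. If it has not finished within limitMillis
     * (must be greater than 0), returns the worker's current stack trace;
     * otherwise returns an empty array.
     */
    static StackTraceElement[] probe(Runnable task, long limitMillis) throws InterruptedException {
        Thread worker = new Thread(task, "fetch-and-parse");
        worker.setDaemon(true);      // do not keep the JVM alive for a hung task
        worker.start();
        worker.join(limitMillis);    // wait at most limitMillis for completion
        return worker.isAlive() ? worker.getStackTrace() : new StackTraceElement[0];
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the real request: a task that just sleeps.
        Runnable slow = () -> {
            try {
                Thread.sleep(5_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (StackTraceElement frame : probe(slow, 200)) {
            System.out.println("  at " + frame);
        }
    }
}
```

if the top frames are in a socket read, the bytes have stopped arriving. if
they are in the tokenizer or the graph, the parser is still working. the exact
frame names depend on the jdk and library versions.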
>
>
> The parser does not use readLine. NT, TTL, NQ, TriG share a tokenizer and
> tokens get read from the character stream. Whitespace is discarded. The NT
> parsing is slightly permissive here (but you can't have """-strings).
>
> End of (chunked) stream will happen and that is end-of-triples. The TCP
> connection is still open but the response stream has ended. TCP is handled by
> Apache HttpClient - that's quite unlikely to be broken.
>
>
>> Otherwise if the output stream from Dydra is not closed, or the socket is
>> not closed?
>
> As james says - chunking happens (transfer presumably), and if the server
> says "end of response", a chunk of zero bytes is sent which marks the end of
> the content. If each transfer is chunked, then each hop ends this response
> but the TCP connection is kept open.
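to make the end-of-response rule concrete, here is a toy decoder for a chunked
body. it is a sketch of the framing as described in the HTTP/1.1 specification,
not the code HttpClient uses. ChunkedBodySketch is a hypothetical name, and the
decoder skips chunk extensions and trailer fields rather than interpreting them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Illustration only: decode a "Transfer-Encoding: chunked" message body. */
final class ChunkedBodySketch {

    /** Reads chunks until the zero-length chunk; throws EOFException if the stream ends first. */
    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';');            // drop any chunk extension
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);        // chunk size is hexadecimal
            if (size == 0) {
                // Zero-length chunk: end of the response body.
                // Skip optional trailer fields up to the empty line.
                while (!readLine(in).isEmpty()) {
                    // ignored in this sketch
                }
                return body.toByteArray();
            }
            byte[] chunk = in.readNBytes(size);          // Java 11+
            if (chunk.length < size) {
                throw new EOFException("stream ended inside a chunk");
            }
            body.write(chunk, 0, chunk.length);
            readLine(in);                                // CRLF that follows the chunk data
        }
    }

    /** Reads one line terminated by LF (CRLF expected), without the terminator. */
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                int n = sb.length();
                if (n > 0 && sb.charAt(n - 1) == '\r') {
                    sb.setLength(n - 1);
                }
                return sb.toString();
            }
            sb.append((char) c);
        }
        throw new EOFException("stream ended before the terminating chunk");
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = "5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] body = decode(new ByteArrayInputStream(wire));
        System.out.println(new String(body, StandardCharsets.US_ASCII)); // prints "hello world"
    }
}
```

on a live connection a missing zero-length chunk would not raise an exception
as it does with the in-memory stream here. the read would simply block, which
is case 2 above.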
>
> HTTP is not a simple protocol!
>
> The version of Apache HttpClient (v4) does not support HTTP/2, so a protocol
> upgrade to HTTP/2 is not happening. We are in HTTP/1.1.
>
> Andy