Hi John,
On 10/07/2021 17:03, John Walker wrote:
We're using a 120s timeout for all the requests, which should give plenty of
time for the query requests to complete in regular circumstances.
That's 120s at the server?
What happens if that goes off? Is the response closed? (A question for James.)
As we use N-Triples, I was wondering if the N-Triples parser uses the readLine
method to read from the stream.
If there were some line that is not terminated with an EOL character, might
that cause this issue?
From the information so far, not likely. The parser has not changed and
this parsing path is well trodden.
What matters is that it's three terms then a DOT until end-of-stream is seen.
End-of-stream happens when the chunking transfer layer says so - that's
in Apache HttpClient.
If you see a single CPU thread at 100%, the parser is looping but there
isn't a loop except delivering triples to the graph. And the NT parser
is well-used and quite simple.
So it seems to me it is one of two cases:
1 - Bytes are flowing but the parser can't send output triples to the
destination - Java heap pressure (you'll see multiple CPU threads at 100%).
How long do you leave it? Eventually - many minutes (20 is possible) -
this case will out-of-memory.
55Mbytes isn't a very large number of triples. 500K maybe (without
knowing the data, rule of thumb 100 bytes per N-Triples triple) and it's
a freshly created graph. Has the heap been used up by the rest of the
application?
2 - Bytes are not flowing into the application, the parser is waiting.
CPU usage 0%.
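
One way to tell these situations apart from inside the JVM is to sample the
CPU time of the thread running the parse, and to look at heap usage
separately. The following is only a sketch using the standard
java.lang.management API. The class name ThreadActivity, its method names and
the 0.05 / 0.90 thresholds are assumptions made up for this illustration;
they are not part of Jena.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: is a thread burning CPU or sitting idle? */
final class ThreadActivity {

    /** Fraction of the wall-clock interval that the thread spent on a CPU. */
    static double cpuFraction(long cpuNanosBefore, long cpuNanosAfter, long wallNanos) {
        if (wallNanos <= 0) {
            throw new IllegalArgumentException("wallNanos must be positive");
        }
        return (double) (cpuNanosAfter - cpuNanosBefore) / wallNanos;
    }

    /** Rough label for a CPU fraction. The thresholds are arbitrary assumptions. */
    static String classify(double cpuFraction) {
        if (cpuFraction < 0.05) return "WAITING"; // e.g. blocked on a socket read
        if (cpuFraction > 0.90) return "BUSY";    // e.g. a loop
        return "MIXED";
    }

    /** Samples one thread for the given time; returns -1 if CPU timing is unavailable. */
    static double sample(Thread t, long millis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return -1;
        }
        long cpu0 = mx.getThreadCpuTime(t.getId());
        long wall0 = System.nanoTime();
        Thread.sleep(millis);
        long cpu1 = mx.getThreadCpuTime(t.getId());
        long wall1 = System.nanoTime();
        if (cpu0 < 0 || cpu1 < 0) {
            return -1; // thread not alive, or timing not available
        }
        return cpuFraction(cpu0, cpu1, wall1 - wall0);
    }

    /** Used heap as a fraction of the maximum heap the JVM may grow to. */
    static double heapUsedFraction() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) used / rt.maxMemory();
    }
}
```

Calling ThreadActivity.sample(parseThread, 1000) from another thread while
the request appears hung, together with heapUsedFraction(), gives a first
hint: "WAITING" with plenty of free heap points towards case 2, while a heap
fraction close to 1.0 points towards case 1.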
The next question is whether the same operation will fault again or if
the same requestURL sometimes works.
The Jena code for all this is deterministic. There's no hidden
parallelism in this case.
----
HTTP is layered:
Transfer-Encoding [lowest level]
Content-Encoding
Actual stuff (Content-type).
Transfer is point-to-point, Content-Encoding is end-to-end.
"Transfer-Encoding: chunked" is used for a stream of response bytes
without Content-Length.
What intermediaries are there between the app and Dydra? There is an
nginx but is that acting as a reverse proxy (and what connection method
does it use) or is Dydra providing an nginx module?
Presumably there is a load balancer.
Does the app talk to a gateway?
Do any intermediaries cache?
Each hop between systems is a point-to-point "transfer".
Model sub = ModelFactory.createDefaultModel();
try (TypedInputStream stream = HttpOp.execHttpGet(requestURL,
WebContent.contentTypeNTriples, createHttpClient(auth), null)) {
// The following part sometimes hangs:
Are you positive it returns from HttpOp.execHttpGet and enters the parser?
RDFParser.create()
.source(stream)
.lang(Lang.NTRIPLES)
.errorHandler(ErrorHandlerFactory.errorHandlerStrict)
.parse(sub.getGraph());
} catch (Exception ex) {
// Handle the exception
}
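
To check whether the hung thread is still inside the HTTP call or has
reached the parser, a stack sample of that thread is enough (running jstack
against the process gives the same information from outside). A minimal
sketch using only the JDK follows. StackProbe and its methods are
hypothetical names for this example, and the package prefixes to look for
depend on the library versions in use.

```java
import java.util.Arrays;

/** Hypothetical helper: report where a thread currently is. */
final class StackProbe {

    /** Frames of the thread as "Class.method" strings, innermost first. */
    static String[] frames(Thread t) {
        return Arrays.stream(t.getStackTrace())
                .map(e -> e.getClassName() + "." + e.getMethodName())
                .toArray(String[]::new);
    }

    /** True if any frame of the thread belongs to a class with this name prefix. */
    static boolean isInside(Thread t, String classNamePrefix) {
        for (StackTraceElement e : t.getStackTrace()) {
            if (e.getClassName().startsWith(classNamePrefix)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, StackProbe.isInside(worker, "org.apache.http") being true while
no parser frames are present would suggest the thread is still in Apache
HttpClient; printing StackProbe.frames(worker) shows the full picture.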
The parser does not use readLine. NT, TTL, NQ, TriG share a tokenizer
and tokens get read from the character stream. Whitespace is discarded.
The NT parsing is slightly permissive here (but you can't have """-strings).
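
To illustrate why a missing final EOL does not matter to a token-based
reader, here is a toy sketch. It is not Jena's tokenizer: ToyNTriples is a
made-up class that only understands IRIs in angle brackets and DOT, and it
only counts triples. It does show the "three terms then a DOT until
end-of-stream" shape with whitespace skipped.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Toy token-based reader: <iri> terms and DOT only. Not Jena's tokenizer. */
final class ToyNTriples {

    /** Counts triples of the form <s> <p> <o> . with any whitespace between tokens. */
    static int countTriples(String text) {
        try (Reader in = new StringReader(text)) {
            int triples = 0;
            int terms = 0; // terms seen since the last DOT
            int c;
            while ((c = in.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    continue; // newlines are just whitespace here
                }
                if (c == '<') {
                    int d;
                    while ((d = in.read()) != '>') { // consume the IRI token
                        if (d == -1) {
                            throw new IllegalArgumentException("unterminated IRI");
                        }
                    }
                    if (++terms > 3) {
                        throw new IllegalArgumentException("more than three terms before DOT");
                    }
                } else if (c == '.') {
                    if (terms != 3) {
                        throw new IllegalArgumentException("DOT after " + terms + " terms");
                    }
                    triples++;
                    terms = 0;
                } else {
                    throw new IllegalArgumentException("unexpected character: " + (char) c);
                }
            }
            if (terms != 0) {
                throw new IllegalArgumentException("end of stream inside a triple");
            }
            return triples;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this toy, ToyNTriples.countTriples("<s> <p> <o> .\n<s> <p> <o2> .")
returns 2 even though the last line has no EOL, because end-of-stream after
a DOT is simply the end of the triples.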
End of (chunked) stream will happen and that is end-of-triples. The TCP
connection is still open but the response stream has ended. TCP is
handled by Apache HttpClient - that's quite unlikely to be broken.
Otherwise if the output stream from Dydra is not closed, or the socket is not
closed?
As James says - chunking happens (transfer presumably), and if the
server says "end of response", a chunk of zero bytes is sent which marks
the end of the content. If each transfer is chunked, then every hop ends
this response that way but the TCP connection is kept open.
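
The framing can be sketched with a toy decoder. This is an illustration of
HTTP/1.1 chunked framing only, not what Apache HttpClient does internally;
ChunkedSketch is a hypothetical class, and it ignores chunk extensions and
skips any trailers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Toy decoder for a "Transfer-Encoding: chunked" body (illustration only). */
final class ChunkedSketch {

    /** Reads one CRLF-terminated line as ASCII, without the line ending. */
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\r') {
                if (in.read() != '\n') {
                    throw new IOException("CR not followed by LF");
                }
                return sb.toString();
            }
            sb.append((char) c);
        }
        throw new IOException("stream ended before the zero-length chunk");
    }

    /** Decodes chunks until the zero-length chunk that ends the response body. */
    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';'); // drop any chunk extension
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);
            if (size == 0) {
                while (!readLine(in).isEmpty()) {
                    // skip trailer fields up to the final empty line
                }
                return body.toByteArray(); // end of content; the connection may stay open
            }
            byte[] chunk = new byte[size];
            int off = 0;
            while (off < size) {
                int n = in.read(chunk, off, size - off);
                if (n == -1) {
                    throw new IOException("truncated chunk");
                }
                off += n;
            }
            body.write(chunk, 0, size);
            if (!readLine(in).isEmpty()) {
                throw new IOException("missing CRLF after chunk data");
            }
        }
    }

    /** Convenience wrapper for ASCII test data. */
    static String decodeAscii(String raw) {
        try {
            byte[] bytes = raw.getBytes(StandardCharsets.US_ASCII);
            return new String(decode(new ByteArrayInputStream(bytes)), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

ChunkedSketch.decodeAscii("4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n") returns
"Wikipedia". With an in-memory stream a missing zero-length chunk shows up
as an exception; on a live connection that stays open, the same situation
would instead look like a read that blocks until some timeout fires.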
HTTP is not a simple protocol!
The version of Apache HttpClient (v4) does not support HTTP/2 so protocol
upgrade to HTTP/2 is not happening. We are in HTTP/1.1.
Andy