That's a really good suggestion. In the normal code flow, do you ever call
stream.close()? And is createHttpClient() re-using an existing HttpClient
object? And is the hang only happening after some requests have succeeded?

It is possible that you aren't closing the stream (and I don't believe Jena's
parsers ever close the stream for you), so after so many requests (10, I think,
by default) you are exhausting the maximum connections per route for the HTTP
client. If that is the case, wrapping the use of the stream in a
try-with-resources block may be the solution.
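To make that hypothesis concrete, here is a dependency-free toy model rather than HttpClient's real pooling code. A Semaphore stands in for the per-route pool (the limit of 10 only echoes the guess above), `lease` is a hypothetical stand-in for HttpOp.execHttpGet that returns its connection only when the stream is closed, and `consume` reads to the end without closing, which is what Jena's parsers are believed to do.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Toy model of a per-route connection pool; a stand-in, NOT Apache HttpClient's API. */
class PoolLeakDemo {
    static final int MAX_PER_ROUTE = 10; // assumed limit, echoing "10 I think by default"

    /** Hypothetical stand-in for HttpOp.execHttpGet: the lease is returned only on close(). */
    static InputStream lease(Semaphore route, String body) throws IOException {
        try {
            // Bounded wait so the demo reports exhaustion instead of blocking forever.
            if (!route.tryAcquire(100, TimeUnit.MILLISECONDS)) {
                throw new IOException("timed out waiting for a pooled connection");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException(e);
        }
        InputStream raw = new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
        return new FilterInputStream(raw) {
            private boolean released = false;

            @Override
            public void close() throws IOException {
                super.close();
                if (!released) { // return the connection exactly once
                    released = true;
                    route.release();
                }
            }
        };
    }

    /** Reads to the end but never closes the stream (the assumed parser behaviour). */
    static int consume(InputStream in) throws IOException {
        int n = 0;
        while (in.read() != -1) n++;
        return n;
    }

    /** Returns how many of `requests` sequential requests succeed against a fresh pool. */
    static int run(int requests, boolean closeStreams) {
        Semaphore route = new Semaphore(MAX_PER_ROUTE);
        for (int i = 0; i < requests; i++) {
            try {
                if (closeStreams) {
                    try (InputStream in = lease(route, "<s> <p> <o> .\n")) {
                        consume(in);
                    }
                } else {
                    consume(lease(route, "<s> <p> <o> .\n")); // leaked: never closed
                }
            } catch (IOException e) {
                return i; // pool exhausted: where the real client can hang
            }
        }
        return requests;
    }

    public static void main(String[] args) {
        System.out.println("leaking: " + run(25, false) + " of 25 succeeded"); // 10
        System.out.println("closing: " + run(25, true) + " of 25 succeeded");  // 25
    }
}
```

In the original snippet the equivalent change would be to declare `stream` in a try-with-resources header around the RDFParser call; assuming TypedInputStream is an InputStream subclass (as in Jena 3.x), it is AutoCloseable and closing it should hand the connection back to the pool.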

Rob

On 05/07/2021, 14:03, "Martynas Jusevičius" wrote:

    Could HttpClient be running out of connections? It is known to hang in
    such cases.

    On Mon, Jul 5, 2021 at 2:58 PM james anderson wrote:
    >
    > good afternoon;
    >
    > > On 2021-07-05, at 12:36:20, Andy Seaborne wrote:
    > >
    > >
    > >
    > > On 05/07/2021 10:01, Ivan Lagunov wrote:
    > >> Hello,
    > >> We’re facing an issue with Jena reading an N-Triples stream over HTTP.
    > >> In fact, our application hangs entirely while executing this piece of code:
    > >> Model sub = ModelFactory.createDefaultModel();
    > >> TypedInputStream stream = HttpOp.execHttpGet(requestURL,
    > >>         WebContent.contentTypeNTriples, createHttpClient(auth), null);
    > >> // The following part sometimes hangs:
    > >> RDFParser.create()
    > >>         .source(stream)
    > >>         .lang(Lang.NTRIPLES)
    > >>         .errorHandler(ErrorHandlerFactory.errorHandlerStrict)
    > >>         .parse(sub.getGraph());
    > >> // This point is not reached
    > >>
    > >> The issue is not persistent, moreover it happens infrequently.
    > >
    > > Then it looks like the data has stopped arriving but the connection is
    > > still open (or the system has gone into GC overload due to heap pressure).
    > >
    > > Is it intermittent on the same data? Or is the data changing? Maybe the
    > > data can't be written properly and the sender stops sending, though I'd
    > > expect the sender to close the connection (it's now in an unknown state
    > > and can't be reused).
    > >
    > >> When it occurs, the RDF store server (we use Dydra for that) logs a
    > >> successful HTTP 200 response for our call (truncated for readability):
    > >> HTTP/1.1" 200 3072/55397664 10.676/10.828 "application/n-triples" "-" "-" "Apache-Jena-ARQ/3.17.0" "127.0.0.1:8104"
    >
    > the situation involves an nginx proxy and an upstream sparql processor.
    >
    > >
    > > What do the fields mean?
    >
    > the line is an excerpt from an entry in the nginx request log. that
    > line contains:
    >
    >   protocol  code  requestLength/responseLength  upstreamElapsedTime/clientElapsedTime  acceptType  -  -  clientAgent  upstreamPort
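Read against that layout, the sample line's 3072 is the request length and 55397664 the response length, so nginx recorded a response of roughly 55 MB. The sketch below only splits the excerpt into those named fields; the regex is an assumption inferred from this single sample line, not nginx's configured log_format, and `NginxExcerpt` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Splits the quoted nginx log excerpt into the fields listed above; the regex is an assumption. */
class NginxExcerpt {
    private static final Pattern LINE = Pattern.compile(
            "(?<protocol>HTTP/[0-9.]+)\"\\s+(?<code>\\d{3})\\s+"
            + "(?<reqLen>\\d+)/(?<respLen>\\d+)\\s+"
            + "(?<upstream>[0-9.]+)/(?<client>[0-9.]+)\\s+"
            + "\"(?<accept>[^\"]*)\"\\s+\"[^\"]*\"\\s+\"[^\"]*\"\\s+" // acceptType, then two "-" fields
            + "\"(?<agent>[^\"]*)\"\\s+\"(?<upstreamPort>[^\"]*)\"");

    final String protocol, acceptType, clientAgent, upstreamPort;
    final int code;
    final long requestLength, responseLength;
    final double upstreamElapsedTime, clientElapsedTime; // as logged, no unit conversion

    private NginxExcerpt(Matcher m) {
        protocol = m.group("protocol");
        code = Integer.parseInt(m.group("code"));
        requestLength = Long.parseLong(m.group("reqLen"));
        responseLength = Long.parseLong(m.group("respLen"));
        upstreamElapsedTime = Double.parseDouble(m.group("upstream"));
        clientElapsedTime = Double.parseDouble(m.group("client"));
        acceptType = m.group("accept");
        clientAgent = m.group("agent");
        upstreamPort = m.group("upstreamPort");
    }

    static NginxExcerpt parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) throw new IllegalArgumentException("unrecognised log excerpt: " + line);
        return new NginxExcerpt(m);
    }
}
```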
    >
    > >
    > > Is that 3072 bytes sent (so far) of 55397664?
    > >
    > > If so, is Content-Length set? (If it is, chunked encoding isn't needed.)
    >
    > likely not, as the response is (i believe) that from a sparql request,
    > which is emitted as it is generated.
    >
    > >
    > > Unfortunately, in HTTP, 200 really means "I started to send stuff", not
    > > "I completed sending stuff". There is no way in HTTP/1.1 to signal an
    > > error after starting the response.
    >
    > that is true, but there are indications in other logs which imply that
    > the sparql processor believes the response to have been completely sent
    > to nginx.
    > there are several reasons to believe this.
    > the times and the 200 response code in the nginx log indicate completion.
    > otherwise, it would either indicate that it timed out, or would include a
    > 499 code, to the effect that the client closed the connection before the
    > response was sent.
    > neither is the case.
    > in addition, the elapsed time is well below that for which nginx would
    > time out an upstream connection.
    >
    > >
    > > The HttpClient - how is it configured?
    > >
    > >> So it looks like the RDF store successfully executes the SPARQL query,
    > >> responds with HTTP 200 and starts transferring the data with the chunked
    > >> encoding. Then something goes wrong when Jena processes the input stream.
    > >> I expect there might be some timeout behind the scenes while Jena reads
    > >> the stream
    > >
    > > Does any data reach the graph?
    > >
    > > There is no timeout at the client end - otherwise you would get an
    > > exception. The parser is reading the input stream from Apache HttpClient.
    > > If it hangs, it's because the data has stopped arriving but the connection
    > > is still open.
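A read timeout is what turns "the data has stopped arriving but the connection is still open" into an exception. The JDK-only sketch below starts a local server that sends one N-Triples line and then goes quiet without closing; with Socket.setSoTimeout the client's read fails with SocketTimeoutException instead of blocking. In Apache HttpClient 4 the corresponding knob is, as far as I know, the socket timeout on RequestConfig (setSocketTimeout), which I am assuming maps onto the same SO_TIMEOUT; `readUntilStall` is a hypothetical helper.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

class StalledSenderDemo {
    /**
     * Hypothetical helper: a local server sends one line, then stalls with the connection open.
     * The client reads with SO_TIMEOUT = readTimeoutMillis (use > 0; 0 would mean wait forever).
     * Returns the number of bytes received before the read timed out.
     */
    static int readUntilStall(int readTimeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread sender = new Thread(() -> {
                try (Socket s = server.accept()) {
                    OutputStream out = s.getOutputStream();
                    out.write("<s> <p> <o> .\n".getBytes(StandardCharsets.US_ASCII));
                    out.flush();
                    Thread.sleep(5_000); // stall: no more data, but the socket stays open
                } catch (IOException | InterruptedException ignored) {
                    // interrupted by the client side once it has given up
                }
            });
            sender.start();
            int received = 0;
            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                client.setSoTimeout(readTimeoutMillis);
                InputStream in = client.getInputStream();
                while (in.read() != -1) received++;
            } catch (SocketTimeoutException e) {
                return received; // the stall surfaced as an exception, not a hang
            } finally {
                sender.interrupt();
                sender.join();
            }
            throw new IllegalStateException("sender closed the connection; no stall observed");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```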
    > >
    > > You could try replacing .parse(graph) with .parse(StreamRDF) and plug in
    > > a logging StreamRDF so you can see the progress, either passing the data
    > > on to the graph or, for investigation, merely logging.
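That suggestion works at the RDF level via Jena's StreamRDF. A dependency-free complement one layer down is to wrap the stream before it reaches RDFParser and log bytes and lines as they arrive; since N-Triples puts one triple per line, the line count roughly tracks triples (comments and blank lines aside). ProgressInputStream is a hypothetical helper, not part of Jena.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Consumer;

/** Counts bytes and newline-terminated lines as they pass through, logging every `every` lines. */
class ProgressInputStream extends FilterInputStream {
    private final int every;
    private final Consumer<String> log;
    private long bytes;
    private long lines;
    private volatile long lastDataNanos = System.nanoTime();

    ProgressInputStream(InputStream in, int every, Consumer<String> log) {
        super(in);
        this.every = every;
        this.log = log;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) record((byte) b);
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len); // -1 at end of stream, so the loop is skipped
        for (int i = 0; i < n; i++) record(buf[off + i]);
        return n;
    }

    private void record(byte b) {
        bytes++;
        lastDataNanos = System.nanoTime();
        if (b == '\n' && ++lines % every == 0) {
            log.accept(lines + " lines, " + bytes + " bytes so far");
        }
    }

    long bytes() { return bytes; }
    long lines() { return lines; }

    /** Milliseconds since data last arrived; a value that keeps growing means the sender stalled. */
    long idleMillis() { return (System.nanoTime() - lastDataNanos) / 1_000_000; }
}
```

In the original snippet that would mean passing `new ProgressInputStream(stream, 100_000, System.err::println)` to `.source(...)`; `.lang(Lang.NTRIPLES)` is already explicit, so losing the TypedInputStream's content type should not matter. If the log stops advancing while `idleMillis()` keeps growing, that points at the sender or the network rather than at Jena.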
    > >
    > > In HTTP/1.1, a streamed response requires chunked encoding only when the
    > > Content-Length isn't given.
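To make the framing point concrete: with chunked encoding a body is complete only when the zero-size last chunk arrives, so a sender that drops the connection early shows up as a premature EOF, while one that stops sending but keeps the socket open simply leaves the reader blocked in read(). The sketch below is a minimal decoder written for illustration only (it follows the HTTP/1.1 chunked format but ignores chunk extensions and trailer contents); HttpClient does this decoding for you.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Minimal chunked-body decoder for illustration; not a complete HTTP implementation. */
class ChunkedBody {
    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';'); // drop any chunk extension
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);
            if (size == 0) {
                // Last chunk: skip trailer fields up to the terminating empty line.
                while (!readLine(in).isEmpty()) { }
                return body.toByteArray();
            }
            byte[] chunk = in.readNBytes(size);
            if (chunk.length < size) throw new EOFException("connection closed mid-chunk");
            body.write(chunk);
            if (!readLine(in).isEmpty()) throw new IOException("missing CRLF after chunk data");
        }
    }

    /** Reads a CRLF-terminated line; EOF before the terminator means the body was truncated. */
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != '\n') {
            if (c == -1) throw new EOFException("connection closed before the final 0-size chunk");
            if (c != '\r') sb.append((char) c);
        }
        return sb.toString();
    }
}
```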
    >
    > i believe the content length is not given.
    >
    > >
    > >>
    > >> , and it causes it to wait indefinitely. At the same time
    > >> ErrorHandlerFactory.errorHandlerStrict does not help at all – no errors
    > >> are logged.
    > >> Is there a way to configure the timeout behavior for the underlying Jena
    > >> logic of processing the HTTP stream? Ideally we want to abort the request
    > >> if it times out and then retry it a few times until it succeeds.
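Since the hang itself raises nothing, a retry loop only helps once each attempt is bounded (for example by a read timeout, so a stall becomes a SocketTimeoutException) and closes its own stream. Under those assumptions, a generic retry with exponential backoff could look like the sketch below; `Retry` and `Attempt` are hypothetical names, and each attempt should parse into a fresh Model so that triples from a failed try are not kept.

```java
import java.io.IOException;

class Retry {
    /** One attempt: open the stream, parse into a fresh model, close the stream. */
    @FunctionalInterface
    interface Attempt<T> {
        T run() throws IOException;
    }

    /**
     * Runs `attempt` up to maxAttempts times, retrying only on IOException (which includes
     * SocketTimeoutException) and sleeping backoffMillis, 2 * backoffMillis, ... between tries.
     */
    static <T> T withRetries(Attempt<T> attempt, int maxAttempts, long backoffMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        IOException last = null;
        for (int n = 1; n <= maxAttempts; n++) {
            try {
                return attempt.run();
            } catch (IOException e) {
                last = e; // a timed-out or truncated transfer: try again
                if (n < maxAttempts) Thread.sleep(backoffMillis << (n - 1));
            }
        }
        throw last; // every attempt failed; surface the most recent cause
    }
}
```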
    > >
    > > The HttpClient determines the transfer.
    > >
    > >    Andy
    > >
    > > FYI: RDFConnectionRemote is an abstraction to make this a little easier.
    > > No need to go to the low-level HttpOp.
    > >
    > >
    > > FYI: Jena 4.mumble.0 is likely to change to using java.net.http as the
    > > HTTP code. There has to be some change anyway to get HTTP/2 (Apache
    > > HttpClient v5+, not v4, has HTTP/2 support).
    > >
    > > This will include a new Graph Store Protocol client.
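For comparison with the timeout question above, this is roughly what the JDK's java.net.http client exposes: a connect timeout on the client and a per-request timeout on each request. It is a standalone sketch of the JDK API, not of how Jena will use it; `JdkHttpSketch` and its methods are hypothetical names.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

class JdkHttpSketch {
    /** Client whose TCP/TLS connection setup is bounded by connectTimeout. */
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .build();
    }

    /** GET asking for N-Triples, with a per-request timeout. */
    static HttpRequest nTriplesGet(String url, Duration timeout) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/n-triples")
                .timeout(timeout)
                .GET()
                .build();
    }
}
```

Sending would be `newClient().send(request, HttpResponse.BodyHandlers.ofInputStream())`, whose body could then be handed to RDFParser. As I understand the JDK documentation, the request timeout governs waiting for the response rather than a body that stalls midway, so the stall concerns discussed above may still apply.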
    > >
    > >> Met vriendelijke groet, with kind regards,
    > >> Ivan Lagunov
    > >> Technical Lead / Software Architect
    > >> Skype: lagivan
    > >> Semaku B.V.
    > >> Torenallee 20 (SFJ3D) • 5617 BC Eindhoven • www.semaku.com
    >