good afternoon;

> On 2021-07-05, at 12:36:20, Andy Seaborne <[email protected]> wrote:
> 
> 
> 
> On 05/07/2021 10:01, Ivan Lagunov wrote:
>> Hello,
>> We’re facing an issue with Jena reading n-triples stream over HTTP. In fact, 
>> our application hangs entirely while executing this piece of code:
>> Model sub = ModelFactory.createDefaultModel();
>> TypedInputStream stream = HttpOp.execHttpGet(requestURL, 
>> WebContent.contentTypeNTriples, createHttpClient(auth), null);
>> // The following part sometimes hangs:
>> RDFParser.create()
>>         .source(stream)
>>         .lang(Lang.NTRIPLES)
>>         .errorHandler(ErrorHandlerFactory.errorHandlerStrict)
>>         .parse(sub.getGraph());
>> // This point is not reached
>>
>> The issue is not persistent, moreover it happens infrequently.
> 
> Then it looks like the data has stopped arriving but the connection is still 
> open. (or the system has gone in GC overload due to heap pressure.)
> 
> Is it intermittent on the same data? Or is the data changing? Because maybe the 
> data can't be written properly and the sender stops sending, though I'd 
> expect the sender to close the connection (it's now in an unknown state and 
> can't be reused).
> 
>> When it occurs, the RDF store server (we use Dydra for that) logs a 
>> successful HTTP 200 response for our call (truncated for readability):
>> HTTP/1.1" 200 3072/55397664 10.676/10.828 "application/n-triples" "-" "-" 
>> "Apache-Jena-ARQ/3.17.0" "127.0.0.1:8104"

the situation involves an nginx proxy and an upstream sparql processor.

> 
> What do the fields mean?

the line is an excerpt from an entry from the nginx request log. that line 
contains:

  protocol  code  requestLength/responseLength
  upstreamElapsedTime/clientElapsedTime  acceptType  -  -  clientAgent
  upstreamPort
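
read against that layout, 3072 is the request length and 55397664 is the 
response length. as a sketch only, the excerpt could be split into those 
fields as below. the class name NginxLogExcerpt and the pattern are made up 
for illustration, they assume the truncated form quoted above, and the elapsed 
times are kept as plain numbers because their unit is not stated here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: splits the truncated log excerpt into the fields listed above. */
final class NginxLogExcerpt {
    // protocol" code reqLen/respLen upstreamTime/clientTime "accept" "-" "-" "agent" "upstream"
    private static final Pattern LAYOUT = Pattern.compile(
        "(\\S+)\" (\\d{3}) (\\d+)/(\\d+) ([0-9.]+)/([0-9.]+) "
        + "\"([^\"]*)\" \"[^\"]*\" \"[^\"]*\" \"([^\"]*)\" \"([^\"]*)\"");

    final String protocol;
    final int code;
    final long requestLength;
    final long responseLength;
    final double upstreamElapsedTime; // unit is an open assumption, not stated in the thread
    final double clientElapsedTime;
    final String acceptType;
    final String clientAgent;
    final String upstreamPort;

    private NginxLogExcerpt(Matcher m) {
        protocol = m.group(1);
        code = Integer.parseInt(m.group(2));
        requestLength = Long.parseLong(m.group(3));
        responseLength = Long.parseLong(m.group(4));
        upstreamElapsedTime = Double.parseDouble(m.group(5));
        clientElapsedTime = Double.parseDouble(m.group(6));
        acceptType = m.group(7);
        clientAgent = m.group(8);
        upstreamPort = m.group(9);
    }

    /** Parses one excerpt; rejects anything that does not follow the layout. */
    static NginxLogExcerpt parse(String line) {
        Matcher m = LAYOUT.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("unexpected layout: " + line);
        }
        return new NginxLogExcerpt(m);
    }

    public static void main(String[] args) {
        NginxLogExcerpt e = parse("HTTP/1.1\" 200 3072/55397664 10.676/10.828 "
            + "\"application/n-triples\" \"-\" \"-\" "
            + "\"Apache-Jena-ARQ/3.17.0\" \"127.0.0.1:8104\"");
        System.out.println("request " + e.requestLength
            + " bytes, response " + e.responseLength + " bytes");
    }
}
```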

> 
> Is that 3072 bytes sent (so far) of 55397664?
> 
> If so, is Content-Length set (and then chunk encoding isn't needed).

likely not, as the response is (i believe) that from a sparql request, which is 
emitted as it is generated.

> 
> Unfortunately, in HTTP, 200 really means "I started to send stuff", not "I 
> completed sending stuff". There is no way in HTTP/1.1 to signal an error 
> after starting the response.

that is true, but there are indications in other logs which imply that the 
sparql processor believes the response to have been completely sent to nginx.
there are several reasons to believe this.
the times and the 200 response code in the nginx log indicate completion.
otherwise, it would either indicate that it timed out, or would include a 499 
code, to the effect that the client closed the connection before the response 
was sent.
neither is the case.
in addition, the elapsed time is well below that for which nginx would time out 
an upstream connection.

> 
> The HttpClient - how is it configured?
> 
>> So it looks like the RDF store successfully executes the SPARQL query, 
>> responds with HTTP 200 and starts transferring the data with the chunked 
>> encoding. Then something goes wrong when Jena processes the input stream. I 
>> expect there might be some timeout behind the scenes while Jena reads the 
>> stream
> 
> Does any data reach the graph?
> 
> There is no timeout at the client end - otherwise you would get an exception. 
> The parser is reading the input stream from Apache HttpClient. If it hangs, 
> it's because the data has stopped arriving but the connection is still open.
> 
> You could try replacing .parse(graph) with parse(StreamRDF) and plug in a 
> logging StreamRDF so you can see the progress, either sending on data to the 
> graph or for investigation, merely logging.
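
a jena-free variant of the same idea, one level lower: count the bytes on the 
stream before it is handed to the parser, so a stall shows up as the byte 
count no longer advancing. this is a sketch, not the logging StreamRDF 
suggested above. ProgressInputStream, drain and the reporting interval are 
hypothetical names, and the output goes to stderr only to keep it 
self-contained.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper: counts bytes read and reports each time `interval` more bytes have passed. */
final class ProgressInputStream extends FilterInputStream {
    private final long interval;
    private long total;       // bytes read so far (skip() and mark/reset are not tracked)
    private long nextReport;  // byte count at which the next report is due

    ProgressInputStream(InputStream in, long interval) {
        super(in);
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.interval = interval;
        this.nextReport = interval;
    }

    long bytesRead() {
        return total;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) advance(1);
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) advance(n);
        return n;
    }

    private void advance(long n) {
        total += n;
        if (total >= nextReport) {
            System.err.println(System.currentTimeMillis() + " ms: " + total + " bytes read");
            nextReport = (total / interval + 1) * interval;
        }
    }

    /** Reads the source to its end through the wrapper and returns the byte count. */
    static long drain(InputStream source, long interval) {
        byte[] buf = new byte[8192];
        try (ProgressInputStream in = new ProgressInputStream(source, interval)) {
            while (in.read(buf) != -1) {
                // discard; only the count matters here
            }
            return in.bytesRead();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

in the code from the original message, the wrapped stream would be what is 
passed to .source(...). if the last reported count stays far below the 
response length in the nginx log, the data stopped arriving before the parser 
ever saw it.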
> 
> In HTTP 1.1, a streamed response requires chunk encoding only when the 
> Content-Length isn't given.

i believe the content length is not given.

> 
>> , and it causes it to wait indefinitely. At the same time 
>> ErrorHandlerFactory.errorHandlerStrict does not help at all – no errors are 
>> logged.
>> Is there a way to configure the timeout behavior for the underlying Jena 
>> logic of processing HTTP stream? Ideally we want to abort the request if it 
>> times out and then retry it a few times until it succeeds.
> 
> The HttpClient determines the transfer.
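
for the "abort on timeout, then retry" part, a read timeout on the client is 
probably the simpler route (in apache httpclient 4 that would be the socket 
timeout on RequestConfig, if i recall the api correctly). independent of the 
http library, the fetch-and-parse step can also be run under a deadline and 
retried. the following is a minimal sketch under that assumption. TimedRetry 
and its parameters are hypothetical names, and it retries on timeout only.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: runs a blocking task under a deadline and retries when it times out. */
final class TimedRetry {
    static <T> T call(Callable<T> task, Duration limit, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            // One fresh daemon thread per attempt, so a stuck read cannot keep the JVM alive.
            ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "timed-retry");
                t.setDaemon(true);
                return t;
            });
            Future<T> result = worker.submit(task);
            try {
                return result.get(limit.toMillis(), TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Interrupts the worker. A read blocked on a socket may not react to
                // this, so real code should also close the stream or response here.
                result.cancel(true);
                if (attempt == maxAttempts) {
                    throw new IllegalStateException(
                        "timed out after " + maxAttempts + " attempts", e);
                }
            } catch (ExecutionException e) {
                // The task itself failed: report it rather than retry.
                throw new IllegalStateException("task failed", e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            } finally {
                worker.shutdownNow();
            }
        }
    }
}
```

each attempt has to start from a new request and a new, empty model, since a 
timed-out attempt may have left a partially filled graph behind.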
> 
>    Andy
> 
> FYI: RDFConnectionRemote is an abstraction to make this a little easier. No 
> need to go to the low-level HttpOp.
> 
> 
> FYI: Jena 4.mumble.0 is likely to change to using jena.net.http as the HTTP 
> code. There has to be some change anyway to get HTTP/2  (Apache HttpClient 
> v5+, not v4, has HTTP/2 support).
> 
> This will include a new Graph Store Protocol client.
> 
>> Met vriendelijke groet, with kind regards,
>> Ivan Lagunov
>> Technical Lead / Software Architect
>> Skype: lagivan
>> Semaku B.V.
>> Torenallee 20 (SFJ3D) • 5617 BC Eindhoven • www.semaku.com
