Re: Problem with Fuseki generating RDF/XML

Andy Seaborne Fri, 28 Jun 2013 03:19:00 -0700

Hi there,

I've switched back SPARQL Graph Store protocol GET to use plain RDF/XML.


Details:

The default when using RIOT to write in Lang.RDFXML is to use the pretty form, i.e. when using RDFDataMgr.write(model,Lang.RDFXML). RIOT I/O is not automatically used if available.

Fuseki uses the new-style RDFDataMgr, not model.write, so it got affected by the change.


Writing with model.write() isn't affected.

Yes - RDF/XML-ABBREV is expensive. I'm not completely sure why - the Turtle writer is doing a similar, but not identical, analysis of the model before writing. However, the RDF/XML-ABBREV writer has more choices and more options to consider.


>> is anyone really using
>> RDF/XML anymore as a human-readable format anyway?

Absolutely!

But, today, it's the standard. Tomorrow, it won't be the only choice and I'm guessing that Turtle-only toolkits will emerge.


Next ...

DatasetAccessor:

It does not seem to be setting the accept header at all, so it gets the default, which is application/rdf+xml.
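Until DatasetAccessor sets the header itself, one possible client-side workaround is to issue the Graph Store Protocol GET directly and state the preferred format in the request. The following is only a sketch using the plain JDK HTTP client; the class and method names (GraphStoreGet, graphUrl, open) are made up for illustration and are not part of Jena, and it assumes the server honours the Accept header.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Hypothetical helper, not a Jena class.
final class GraphStoreGet {

    // Graph Store Protocol addresses a named graph as <service>?graph=<encoded graph URI>.
    static String graphUrl(String serviceUrl, String graphUri) {
        try {
            return serviceUrl + "?graph=" + URLEncoder.encode(graphUri, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // Sends a GET with an explicit Accept header and returns the response body.
    static InputStream open(String serviceUrl, String graphUri, String accept) throws IOException {
        URL url = new URL(graphUrl(serviceUrl, graphUri));
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", accept);
        int status = conn.getResponseCode();
        if (status != HttpURLConnection.HTTP_OK) {
            conn.disconnect();
            throw new IOException("Unexpected HTTP status " + status);
        }
        return conn.getInputStream();
    }
}
```

The returned stream could then be passed to model.read(in, null, "TTL") or similar, after checking the Content-Type the server actually sent back.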

I've recorded the need to set the accept header to a list based on efficiency as:


https://issues.apache.org/jira/browse/JENA-481

I'm thinking the order should be N-triples, Turtle, RDF/XML, "whatever you can give me".
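As a rough illustration of that ordering, here is a sketch that builds such an accept string. The helper class AcceptHeader and the specific q values (0.9, 0.8, 0.5) are assumptions chosen for the example, not what JENA-481 will necessarily end up using.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Jena.
final class AcceptHeader {

    // Joins media types in insertion order; q=1.0 is the default and is left implicit.
    static String build(Map<String, Double> preferences) {
        StringJoiner joiner = new StringJoiner(",");
        for (Map.Entry<String, Double> e : preferences.entrySet()) {
            double q = e.getValue();
            if (q >= 1.0) {
                joiner.add(e.getKey());
            } else {
                joiner.add(e.getKey() + ";q=" + String.format(Locale.ROOT, "%.1f", q));
            }
        }
        return joiner.toString();
    }

    // N-Triples first, then Turtle, then RDF/XML, then anything else.
    // The q values are illustrative assumptions.
    static String rdfPreferences() {
        Map<String, Double> prefs = new LinkedHashMap<>();
        prefs.put("application/n-triples", 1.0);
        prefs.put("text/turtle", 0.9);
        prefs.put("application/rdf+xml", 0.8);
        prefs.put("*/*", 0.5);
        return build(prefs);
    }
}
```

Calling AcceptHeader.rdfPreferences() gives application/n-triples,text/turtle;q=0.9,application/rdf+xml;q=0.8,*/*;q=0.5.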

For reference, the accept string for reading RDF with RDFDataMgr.loadModel(URL) or model.read(URL) is currently:


text/turtle,application/rdf+xml;q=0.9,application/xml;q=0.8,*/*;q=0.5;

Maybe that should include "application/n-triples" - including the original MIME type of text/plain is distinctly unhelpful.
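To make the effect of the q values in that string concrete, here is a small sketch that parses an accept string and orders the media types by preference. AcceptParser and MediaRange are names invented for this example; it is a simplified reading of the header (it ignores parameters other than q) and not how Fuseki or RIOT actually do conneg.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified Accept header parser for illustration only.
final class AcceptParser {

    static final class MediaRange {
        final String type;
        final double q;

        MediaRange(String type, double q) {
            this.type = type;
            this.q = q;
        }
    }

    // Splits on commas, reads an optional q parameter (default 1.0),
    // and returns the ranges sorted from most to least preferred.
    static List<MediaRange> parse(String header) {
        List<MediaRange> ranges = new ArrayList<>();
        for (String item : header.split(",")) {
            String[] parts = item.split(";");
            String type = parts[0].trim();
            if (type.isEmpty()) {
                continue;
            }
            double q = 1.0;
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                if (param.startsWith("q=")) {
                    q = Double.parseDouble(param.substring(2));
                }
            }
            ranges.add(new MediaRange(type, q));
        }
        // List.sort is stable, so equal q values keep their header order.
        ranges.sort(Comparator.comparingDouble((MediaRange r) -> r.q).reversed());
        return ranges;
    }
}
```

For the string above, the first entry is text/turtle (implicit q=1.0) and the last is */* at q=0.5, so a server that can produce Turtle should prefer it over RDF/XML.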


        Andy


On 27/06/13 19:56, Rob Vesse wrote:

Andy can probably give you a definitive answer here

I know that there were significant improvements to the RDF output
infrastructure made in 2.10.1 so my guess is that somehow the default
RDF/XML output got switched as part of this upgrade (not necessarily
intentionally).

If this is the case Andy can likely make the fix easily, I however don't
know where to look for this setting.

Rob


On 6/27/13 11:38 AM, "Elli Schwarz" <[email protected]> wrote:

I think I may have tracked down what is causing my slow performance of
GET with the new Fuseki 0.28 snapshot. Comparing the output of s-get for
the same data from the latest Fuseki 0.28 snapshot, and from the 0.26
release, I discovered that the 0.28 snapshot is creating the XML in
hierarchical form, with nesting of elements (RDF/XML-ABBREV). In Fuseki
0.26, it would output the RDF in the regular flattened RDF/XML format.
Obviously, creating the flattened form is much more efficient.

While I understand that RDF/XML-ABBREV is more human readable, there's a
big price to pay in efficiency, at least for my data. In my case, I'm
accessing my Fuseki endpoint via datasetAccessor.getModel(), and as far
as I know, there's no way for me to tell Fuseki through this API that I
want the data to be serialized as N-TRIPLES (since it's just going to be
loaded in a Jena model anyway and not read by a human). Is there a way I
can control how Fuseki serializes by default? And why was the default
serialization format changed to RDF/XML-ABBREV - is anyone really using
RDF/XML anymore as a human-readable format anyway? ;-)

I really appreciate any advice, workarounds, or fixes for this issue. I
can't really switch back to the earlier Fuseki versions anymore, since
the new jena-text makes my life so much easier since I no longer have to
worry about manually reindexing after SPARQL Update, like I did with
Fuseki and LARQ. Thanks for incorporating jena-text!

Thank you,
Elli

________________________________
From: Elli Schwarz <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Wednesday, June 26, 2013 9:48 AM
Subject: Problem with Fuseki generating RDF/XML

Rob,

(This email previously had the subject JENA-378 Redux)

I think I tracked down the problem with getModel() a bit more. Using
s-get, I can get data back as TTL immediately:
./s-get http://localhost:3131/ds/data http://192.168.6.37/graph/uri_data

If I modify the s-get script to get results as RDF/XML, then it takes
several minutes for Fuseki 0.28-SNAPSHOT to respond.

I start Fuseki 0.28 with this command (Fuseki 0.26 is started similarly,
but with the config-tdb.ttl assembler):
/usr/bin/java -Dlog4j.configuration=log4j.properties -Xmx3200M -jar
/opt/jena-2.10/jena-fuseki-0.2.8-SNAPSHOT/fuseki-server.jar --update
--config=config-tdb-text.ttl --port=3131

If I point the same modified s-get script to the Fuseki 0.26 release,
the RDF/XML comes back immediately. My guess is that the
DatasetAccessorFactory.createHTTP("http://localhost:3131/ds/data").getModel(modelName)
command I use gets data back as RDF/XML, and for some
reason Fuseki 0.28 takes a long time to generate RDF/XML. Any ideas as
to what changed in the latest version of Fuseki that would cause this
problem? Is there any way I can set Fuseki (or the client
DatasetAccessor) to use TTL serialization?

(BTW, I created JENA-479 for the other bug I discovered with SPARQL
Insert scripts.)

Thank you very much for your help,
Elli

________________________________
From: Rob Vesse <[email protected]>
To: "[email protected]" <[email protected]>; Elli Schwarz
<[email protected]>
Sent: Tuesday, June 25, 2013 4:40 PM
Subject: Re: JENA-378 Redux

I use the older stable jena-core and jena-arq 2.10.0 and jena-fuseki
0.2.6


The current stable releases are jena-core and jena-arq 2.10.1 and
jena-fuseki 0.2.7

Do you experience the problem with those versions?

Fuseki config file or arguments used to start would be useful.

Rob


On 6/25/13 1:35 PM, "Elli Schwarz" <[email protected]> wrote:

This past January, I reported a bug to this list which was recorded as
JENA-378. I'm now experiencing what appears to be the same problem, where
[ ] syntax in an Insert script doesn't work when using
UpdateExecutionFactory:

  String updateString = "INSERT {} WHERE { ?x ?p [ ?a  ?b ] }";
  UpdateRequest update = UpdateFactory.create(updateString);

  UpdateProcessor up = UpdateExecutionFactory.createRemote(update,
      "http://localhost:3131/ds/update");
  up.execute();

The error is: 400 Encountered " "?" "? ""
caused by the client generating incorrect SPARQL with an extra ? (as
viewed from the Fuseki log):  INSERT { } WHERE   { ?x ?p ??0 . ??0 ?a ?b }

This is with jena-core & jena-arq 2.10.2-SNAPSHOT, and with jena-fuseki
0.2.8-SNAPSHOT (compiled today).
--
Another problem I'm having which I can't track down is that the following
code takes a VERY long time to execute (10 minutes):
DatasetAccessorFactory.createHTTP("http://localhost:3131/ds/update").getModel(modelName);

With earlier versions of Fuseki, it would take seconds, with the same
data. The problem seems to be related to my Fuseki server instance
itself, which is 0.2.8-SNAPSHOT (r1496513), and not to my client code,
since even if I use the older stable jena-core and jena-arq 2.10.0 and
jena-fuseki 0.2.6, I also have the problem (but not if I connect it to an
earlier Fuseki release). Upon debugging, it appears that for some reason
the HTTP request itself is taking a long time to complete. In fact, I'm
not even getting anything in the Fuseki log for about a minute after the
request is made, but once the request is made I immediately see a spike
in CPU usage on the server. This doesn't appear to be a network latency
issue since other access to the server isn't affected, it appears to be
just this call. It would seem that Fuseki is spinning its wheels on
something.

I realize this may not be enough info for you to determine what is
causing the problem, but I don't know how else to track down the issue.
Using s-get I can get back the data quickly, which is strange since I
thought it would be doing the same thing as the getModel().

Thank you,
Elli
