Hi there,
I've switched the SPARQL Graph Store Protocol GET back to plain RDF/XML.
Details:
The default when using RIOT to write in Lang.RDFXML, i.e. when using
RDFDataMgr.write(model,Lang.RDFXML), is to use the pretty form. RIOT I/O
is not automatically used if available.
Fuseki uses the new-style RDFDataMgr, not model.write, so it was affected
by the change.
Writing with model.write() isn't affected.
Yes - RDF/XML-ABBREV is expensive. I'm not completely sure why - the
Turtle writer is doing a similar, but not identical, analysis of the
model before writing. However, the RDF/XML-ABBREV writer has more
choices and more options to consider.
>> is anyone really using
>> RDF/XML anymore as a human-readable format anyway?
Absolutely!
But, today, it's the standard. Tomorrow, it won't be the only choice
and I'm guessing that Turtle-only toolkits will emerge.
Next ...
DatasetAccessor:
It does not seem to be setting the accept header at all, so it gets the
default, which is application/rdf+xml.
I've recorded the need to set the accept header to a list based on
efficiency as:
https://issues.apache.org/jira/browse/JENA-481
I'm thinking the order should be N-Triples, Turtle, RDF/XML, "whatever
you can give me".
For reference, the accept string for reading RDF with
RDFDataMgr.loadModel(URL) or model.read(URL) is currently:
text/turtle,application/rdf+xml;q=0.9,application/xml;q=0.8,*/*;q=0.5;
Maybe that should include "application/n-triples" - including the
original MIME type of text/plain is distinctly unhelpful.
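As a rough sketch of the kind of efficiency-ordered accept header
described above, here is how the string could be assembled in plain Java.
The class and method names and the q-values are my assumptions for
illustration only; nothing here is the actual Jena code or what JENA-481
will end up using.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

public class AcceptHeader {

    /**
     * Joins media types into an Accept header value, in map iteration order.
     * A preference of 1.0 is left implicit; lower values get an explicit q.
     */
    public static String build(Map<String, Double> preferences) {
        StringJoiner joiner = new StringJoiner(",");
        for (Map.Entry<String, Double> e : preferences.entrySet()) {
            double q = e.getValue();
            if (q >= 1.0) {
                joiner.add(e.getKey());
            } else {
                // Locale.ROOT keeps the decimal separator a '.' everywhere.
                joiner.add(e.getKey() + ";q=" + String.format(Locale.ROOT, "%.1f", q));
            }
        }
        return joiner.toString();
    }

    /** Hypothetical ordering: N-Triples, Turtle, RDF/XML, then anything. */
    public static String rdfAccept() {
        Map<String, Double> prefs = new LinkedHashMap<>();
        prefs.put("application/n-triples", 1.0); // cheapest to write and parse
        prefs.put("text/turtle", 0.9);           // assumed q-value
        prefs.put("application/rdf+xml", 0.8);   // assumed q-value
        prefs.put("*/*", 0.5);                   // "whatever you can give me"
        return build(prefs);
    }

    public static void main(String[] args) {
        System.out.println(rdfAccept());
    }
}
```

The LinkedHashMap is only there to keep the insertion order, so the
cheapest format is listed first as well as carrying the highest q.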
Andy
On 27/06/13 19:56, Rob Vesse wrote:
Andy can probably give you a definitive answer here
I know that there were significant improvements to the RDF output
infrastructure made in 2.10.1 so my guess is that somehow the default
RDF/XML output got switched as part of this upgrade (not necessarily
intentionally).
If this is the case Andy can likely make the fix easily, I however don't
know where to look for this setting.
Rob
On 6/27/13 11:38 AM, "Elli Schwarz" <[email protected]> wrote:
I think I may have tracked down what is causing my slow performance of
GET with the new Fuseki 0.28 snapshot. Comparing the output of s-get for
the same data from the latest Fuseki 0.28 snapshot, and from the 0.26
release, I discovered that the 0.28 snapshot is creating the XML in
hierarchical form, with nesting of elements (RDF/XML-ABBREV). In Fuseki
0.26, it would output the RDF in the regular flattened RDF/XML format.
Obviously, creating the flattened form is much more efficient.
While I understand that RDF/XML-ABBREV is more human readable, there's a
big price to pay in efficiency, at least for my data. In my case, I'm
accessing my Fuseki endpoint via datasetAccessor.getModel(), and as far
as I know, there's no way for me to tell Fuseki through this API that I
want the data to be serialized as N-TRIPLES (since it's just going to be
loaded in a Jena model anyway and not read by a human). Is there a way I
can control how Fuseki serializes by default? And why was the default
serialization format changed to RDF/XML-ABBREV - is anyone really using
RDF/XML anymore as a human-readable format anyway? ;-)
I really appreciate any advice, workarounds, or fixes for this issue. I
can't really switch back to the earlier Fuseki versions anymore, since
the new jena-text makes my life so much easier since I no longer have to
worry about manually reindexing after SPARQL Update, like I did with
Fuseki and LARQ. Thanks for incorporating jena-text!
Thank you,
Elli
________________________________
From: Elli Schwarz <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Wednesday, June 26, 2013 9:48 AM
Subject: Problem with Fuseki generating RDF/XML
Rob,
(This email previously had the subject JENA-378 Redux)
I think I tracked down the problem with getModel() a bit more. Using
s-get, I can get data back as TTL immediately:
./s-get http://localhost:3131/ds/data http://192.168.6.37/graph/uri_data
If I modify the s-get script to get results as RDF/XML, then it takes
several minutes for Fuseki 0.28-SNAPSHOT to respond.
I start Fuseki 0.28 with this command (Fuseki 0.26 is started similarly,
but with the config-tdb.ttl assembler):
/usr/bin/java -Dlog4j.configuration=log4j.properties -Xmx3200M -jar
/opt/jena-2.10/jena-fuseki-0.2.8-SNAPSHOT/fuseki-server.jar --update
--config=config-tdb-text.ttl --port=3131
If I point the same modified s-get script to the Fuseki 0.26 release,
the RDF/XML comes back immediately. My guess is that the
DatasetAccessorFactory.createHTTP("http://localhost:3131/ds/data").getModel(modelName)
command I use gets data back as RDF/XML, and for some
reason Fuseki 0.28 takes a long time to generate RDF/XML. Any ideas as
to what changed in the latest version of Fuseki that would cause this
problem? Is there any way I can set Fuseki (or the client
DatasetAccessor) to use TTL serialization?
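One possible client-side workaround, sketched with only the JDK: issue
the Graph Store Protocol GET directly and ask for Turtle in the Accept
header, which is roughly what s-get does. The class and method names are
made up for illustration; the service URL and graph name are the ones
from the s-get command earlier in this message, and the ?graph= parameter
comes from the SPARQL 1.1 Graph Store HTTP Protocol.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class GraphStoreGet {

    /** Builds the request URL for a named graph: service + "?graph=" + encoded graph URI. */
    public static String graphUrl(String serviceUrl, String graphUri) {
        return serviceUrl + "?graph=" + URLEncoder.encode(graphUri, StandardCharsets.UTF_8);
    }

    /** Prepares (but does not yet send) a GET with an explicit Accept header. */
    public static HttpURLConnection open(String serviceUrl, String graphUri, String accept)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(graphUrl(serviceUrl, graphUri)).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", accept);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = open(
                "http://localhost:3131/ds/data",
                "http://192.168.6.37/graph/uri_data",
                "text/turtle");
        // The request is actually sent when the response stream is opened.
        try (InputStream in = conn.getInputStream()) {
            in.transferTo(System.out);
        }
    }
}
```

Instead of copying to System.out, the stream could presumably be handed
to model.read(in, null, "TTL") to fill a Jena model.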
(BTW, I created JENA-479 for the other bug I discovered with SPARQL
Insert scripts.)
Thank you very much for your help,
Elli
________________________________
From: Rob Vesse <[email protected]>
To: "[email protected]" <[email protected]>; Elli Schwarz
<[email protected]>
Sent: Tuesday, June 25, 2013 4:40 PM
Subject: Re: JENA-378 Redux
> I use the older stable jena-core and jena-arq 2.10.0 and jena-fuseki
> 0.2.6
The current stable releases are jena-core and jena-arq 2.10.1 and
jena-fuseki 0.2.7
Do you experience the problem with those versions?
Fuseki config file or arguments used to start would be useful.
Rob
On 6/25/13 1:35 PM, "Elli Schwarz" <[email protected]> wrote:
This past January, I reported a bug to this list which was recorded as
JENA-378. I'm now experiencing what appears to be the same problem,
where [ ] syntax in an Insert script doesn't work when using
UpdateExecutionFactory:
String updateString = "INSERT {} WHERE { ?x ?p [ ?a ?b ] }";
UpdateRequest update = UpdateFactory.create(updateString);
UpdateProcessor up = UpdateExecutionFactory.createRemote(update,
"http://localhost:3131/ds/update");
up.execute();
The error is: 400 Encountered " "?" "? ""
caused by the client generating incorrect SPARQL with an extra ? (as
viewed from the Fuseki log):
INSERT { } WHERE { ?x ?p ??0 . ??0 ?a ?b }
This is with jena-core & jena-arq 2.10.2-SNAPSHOT, and with jena-fuseki
0.2.8-SNAPSHOT (compiled today).
--
Another problem I'm having which I can't track down is that the
following code takes a VERY long time to execute (10 minutes):
DatasetAccessorFactory.createHTTP("http://localhost:3131/ds/update").getModel(modelName);
With earlier versions of Fuseki, it would take seconds, with the same
data. The problem seems to be related to my Fuseki server instance
itself, which is 0.2.8-SNAPSHOT (r1496513), and not to my client code,
since even if I use the older stable jena-core and jena-arq 2.10.0 and
jena-fuseki 0.2.6, I also have the problem (but not if I connect it to
an earlier Fuseki release). Upon debugging, it appears that for some
reason the HTTP request itself is taking a long time to complete. In
fact, I'm not even getting anything in the Fuseki log for about a minute
after the request is made, but once the request is made I immediately
see a spike
in CPU usage on the server. This doesn't appear to be a network latency
issue since other access to the server isn't affected, it appears to be
just this call. It would seem that Fuseki is spinning its wheels on
something.
I realize this may not be enough info for you to determine what is
causing the problem, but I don't know how else to track down the issue.
Using s-get I can get back the data quickly, which is strange since I
thought it would be doing the same thing as the getModel().
Thank you,
Elli