Re: wrong content-types in s-get | Re: Export named graph from TDB to several ntriples files

Andy Seaborne Thu, 31 Jan 2019 06:23:53 -0800

Jena has its own content negotiation mechanism. I couldn't find an existing one at the time, and it has turned out to be "quite complicated" for linked data, because control of the defaults, and of the choice made when there is no exact match, is important.


So we have control of the corner cases and defaults.


Internally in Fuseki:

"application/json" isn't registered for graphs or datasets.

There are two related registrations:

"application/rdf+json"
"application/ld+json"

Fuseki doesn't do any "+" processing.

Fuseki could default "application/json" to "application/ld+json".


curl without a header sends:

"Accept: */*"

Fuseki chooses the first entry in its internal list of choices.

That is not the same as sending no "Accept" header at all, in which case Fuseki chooses a default, although both no header and */* give the server a free choice of what to return.


curl -v -g 'http://localhost:3030/ds?query=ASK{}'
curl -v -g --header 'Accept:' 'http://localhost:3030/ds?query=ASK{}'
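
As a rough illustration of the behaviour described above, here is a minimal Ruby sketch of server-side negotiation for graphs. It is not Jena's code: the names `OFFERED_GRAPH_TYPES`, `DEFAULT_GRAPH_TYPE`, `parse_accept` and `negotiate` are made up for this example, and the order of the offered list is an assumption. Only the RDF/XML default for graphs and the absence of "application/json" come from this thread. Partial wildcards such as "text/*" are deliberately left out.

```ruby
# Sketch only: the list order below is an assumption, not Jena's internal table.
OFFERED_GRAPH_TYPES = [
  'text/turtle',
  'application/n-triples',
  'application/rdf+xml',
  'application/ld+json',
  'application/rdf+json'
].freeze

# Fallback used when there is no header or nothing matches (RDF/XML for graphs).
DEFAULT_GRAPH_TYPE = 'application/rdf+xml'

# Turn "a/b;q=0.9 , c/d" into [["a/b", 0.9], ["c/d", 1.0]].
def parse_accept(header)
  header.split(',').map do |item|
    type, *params = item.split(';').map(&:strip)
    next if type.nil? || type.empty?
    q = params.map { |p| p[/\Aq=(.+)\z/, 1] }.compact.first
    [type, q ? q.to_f : 1.0]
  end.compact
end

# Pick a media type for a graph response.
def negotiate(header, offered = OFFERED_GRAPH_TYPES, default = DEFAULT_GRAPH_TYPE)
  # No "Accept" header (or an empty one): the server uses its default.
  return default if header.nil? || header.strip.empty?

  # Highest q first; ties keep the order the client wrote them in.
  ranges = parse_accept(header).sort_by.with_index { |(_, q), i| [-q, i] }
  ranges.each do |type, q|
    next if q <= 0
    return offered.first if type == '*/*'     # free choice: first in the list
    return type if offered.include?(type)     # exact match only, no "+" handling
  end
  default                                     # nothing registered matched
end
```

With this sketch, `negotiate('application/json')` falls through to the RDF/XML default because nothing in the list matches exactly, while `negotiate('application/rdf+json')` returns the type that was asked for.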

Content negotiation is quite sensitive to client setup and, well, some HTTP clients hate it: they don't set conneg headers and then can't handle the results.


Some servers don't have content types set up.

On the client side, Jena pokes about in the file name to use the extension if all else fails.


    Andy

On 31/01/2019 13:27, vincent ventresque wrote:

Sorry, let me sum up the previous messages :

1) I wanted to export a named graph from tdb to ntriples

2) Andy advised to modify s-get, which I did
3) when modifying s-get, I noticed there were 2 wrong content-types : application/json & application/n-quads ; both give rdf-xml output
4) Andy suggested it came from s-get settings
5) I showed that commenting out the settings in s-get has no effect AND that the problem is the same with curl.
6) my purpose is also to understand how all this stuff works!


On 31/01/2019 14:22, Martynas Jusevičius wrote:
Vincent,

can you start by explaining what you are trying to do and why, rather
than describing how you're doing it?

On Thu, Jan 31, 2019 at 2:20 PM vincent ventresque
<vincent.ventres...@ens-lyon.fr> wrote:
Sorry, I should have explained more clearly : the previous messages
were about default settings in s-get, and when creating a new function
to handle --output option, I noticed there was a wrong content-type in
s-get for plain json (see my s-get file here :
https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download).
My purpose was to demonstrate that the problem isn't linked to s-get,
since it's the same with curl. Besides, I noticed the same problem with
n-quads.

curl --header 'Accept: application/n-quads' \
     'http://localhost:3030/test_tdb2?graph=http://test'
<rdf:RDF
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:j.0="http://example.org/" >
    <rdf:Description rdf:about="http://example.org/titi">
      <j.0:tata>coucou</j.0:tata>
    </rdf:Description>
</rdf:RDF>



On 31/01/2019 14:12, ajs6f wrote:
I'm not sure what you expect to get back from Fuseki with an "application/json" mimetype? There is no W3C-spec plain-JSON RDF serialization that I know of. I suppose there's the old Talis idea:
https://jena.apache.org/documentation/io/rdf-json.html

but I can't imagine that's what you're looking for.

ajs6f
On Jan 31, 2019, at 8:09 AM, vincent ventresque <vincent.ventres...@ens-lyon.fr> wrote:
It seems that the problem is completely independent from s-get (see these results with curl below). So I think there's a default setting somewhere in Fuseki itself.
#~~~~~~~  --header 'Accept: application/json' ~~~~~~~~~~~~~~~~~~~~~
:~/Documents/fuseki/bin$ curl --header 'Accept: application/json' 'http://localhost:3030/test_tdb2?graph=http://test'
<rdf:RDF
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:j.0="http://example.org/" >
    <rdf:Description rdf:about="http://example.org/titi">
      <j.0:tata>coucou</j.0:tata>
    </rdf:Description>
</rdf:RDF>
#~~~~~~~ --header 'Accept: application/rdf+json'~~~~~~~~~~~~~~~~~~~~~~
:~/Documents/fuseki/bin$ curl --header 'Accept: application/rdf+json' 'http://localhost:3030/test_tdb2?graph=http://test'
{
    "http://example.org/titi" : {
      "http://example.org/tata" : [ {
        "type" : "literal" ,
        "value" : "coucou"
      }
       ]
    }
}



On 31/01/2019 12:58, vincent ventresque wrote:
Thanks for your quick reply!
$mtAppJSON isn't used.
I think my previous msg wasn't clear : I meant raw json and not json-ld (my code works now for both, and I use $mtAppJSON ; but I had to replace 'application/json' with 'application/rdf+json' in order to get json instead of XML ; see the file here https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download)
The settings are: ...
I made a little test : comment out these lines and the "names" part, and you'll get XML!
On 31/01/2019 12:48, Andy Seaborne wrote:
On 31/01/2019 11:26, vincent ventresque wrote:
Hello,
I found the origin of the problem for json : the $mtAppJSON had the value
'application/json'
$mtAppJSON isn't used.

"application/rdf+json"
isn't JSON-LD (it's the old Talis format).

There is:

$mtJSONLD           = 'application/ld+json'
it has to be replaced with

'application/rdf+json'

I've updated the file here :
https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download
Maybe I'm going to submit a pull request as Andy suggested, but I'd like to understand why 'application/json' returns xml. Besides, it's the same thing for nquads : I tried to replace
$mtNQuads = 'application/n-quads'

with

$mtNQuads = 'application/x-trig'

but still have xml...
The settings are:

# Default for GET
# At least allow anything (and hope!)
$accept_rdf="#{$mtTurtle} , #{$mtNTriples};q=0.9 , #{$mtRDF};q=0.8 , #{$mtJSONLD};q=0.5"
# Datasets
$accept_ds="#{$mtTrig} , #{$mtNQuads};q=0.9 , #{$mtJSONLD};q=0.5"
# For SPARQL query
$accept_results="#{$mtSparqlResultsJ} , #{$mtSparqlResultsX};q=0.9 , #{$accept_rdf}"
# Accept any in case of trouble.
$accept_rdf="#{$accept_rdf} , */*;q=0.1"
$accept_results="#{$accept_results} , */*;q=0.1"
Is there a kind of default setting somewhere (if content-type isn't recognized in Fuseki, the response is xml) ?
Yes.

RDF/XML for graphs, N-Quads for datasets.
Run Fuseki/full with "-v" and it should print the content negotiation details.
      Andy
Thanks in advance

VV


Ok, maybe I'm going to submit a pull request, but I'd

On 29/01/2019 17:11, vincent ventresque wrote:
Hi Andy,
Thanks again for your idea to modify the s-get script, it helped me understand ruby utilities and http requests (I often use the ruby scripts but never really looked inside).
Don't know how to submit a pull request, and I'm not a ruby expert! Therefore I've put a small test file here :
https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download
-- added "--output" in options + created a new function (set_output_format)
-- it works for ntriples, xml, Json-LD,

-- doesn't work for json (returns xml...)
N.B. : in this test file, I've removed large parts of the original code in order to improve readability
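
For readers who have not opened the script, the idea of mapping an "--output" name to an "Accept:" header might look roughly like the sketch below. It is not the code from the linked file or from the stock s-get: the short names ("nt", "ttl", ...) and the helpers `accept_for_output` and `graph_get_request` are hypothetical. Only the media types themselves are the registered ones. Plain "json" is left out on purpose, since "application/json" is not registered for graphs in Fuseki.

```ruby
require 'net/http'
require 'uri'

# Hypothetical short names mapped to registered RDF media types for graphs.
GRAPH_FORMATS = {
  'nt'      => 'application/n-triples',
  'ttl'     => 'text/turtle',
  'xml'     => 'application/rdf+xml',
  'jsonld'  => 'application/ld+json',
  'rdfjson' => 'application/rdf+json'
}.freeze

# Translate an --output value into an Accept header value; fail loudly on
# unknown names instead of silently receiving the server default.
def accept_for_output(name)
  GRAPH_FORMATS.fetch(name) do
    raise ArgumentError,
          "unknown output format: #{name} (expected one of #{GRAPH_FORMATS.keys.join(', ')})"
  end
end

# Build (but do not send) a GET for one named graph, using the same
# "?graph=" query form as the curl examples in this thread.
def graph_get_request(dataset_url, graph_uri, output)
  uri = URI(dataset_url)
  uri.query = URI.encode_www_form(graph: graph_uri)
  req = Net::HTTP::Get.new(uri)
  req['Accept'] = accept_for_output(output)
  req
end
```

A request built this way could then be sent with something like `Net::HTTP.start(req.uri.hostname, req.uri.port) { |http| http.request(req) }`, assuming a Fuseki server is listening at that address.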
On 28/01/2019 15:28, Vincent Ventresque wrote:
Hi Andy,
Many thanks for these ideas, I'm going to try the curl & riot solutions.
Modify the s-get script to handle --output and set the "Accept:" header then please submit a pull request for the changes
I had made an attempt to modify the s-get script in the same way as for s-query but it didn't work : if I have a moment I'll try to understand how the options are handled.
On 28/01/2019 14:19, Andy Seaborne wrote:
On 28/01/2019 11:04, Vincent Ventresque wrote:
Hello,
I want to export a named graph which is stored in a TDB dataset, and I want to store the output in several files (for the named graph contains +/- 9.5 M triples).
My idea is to use "split" command in order to cut the output of the export into pieces. However, this solution with "split" requires ntriples or nquads (one triple per line, so that the files are not cut in the middle of an assertion ; besides, it's also more practical to have a triple per line if I want to transform the data with perl or sed).
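
Because N-Triples keeps one triple per line, cutting the stream every N lines never breaks a statement. A minimal Ruby sketch of that idea follows, for cases where "split" is not available. The helper name `split_ntriples`, the "part-" prefix and the numbering scheme are all assumptions made for this example.

```ruby
require 'fileutils'

# Hypothetical helper: write each run of `lines_per_file` lines read from `io`
# to its own numbered .nt file, much as `split -l` would.
# Returns the list of files written.
def split_ntriples(io, out_dir, lines_per_file: 500_000, prefix: 'part-')
  FileUtils.mkdir_p(out_dir)
  paths = []
  # each_line + each_slice streams the input; only one chunk is held in memory.
  io.each_line.each_slice(lines_per_file).with_index do |lines, i|
    path = File.join(out_dir, format('%s%04d.nt', prefix, i))
    File.write(path, lines.join)
    paths << path
  end
  paths
end
```

Called as `split_ntriples($stdin, 'out')` in a small script, this could sit at the end of a pipe that produces N-Triples. It would not be safe for Turtle or RDF/XML, where a statement can span several lines.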
I found a solution with s-query but had to edit the ruby s-query script to get ntriples (see below).
There are other possible solutions for an export via command-line utilities : "s-get" and "tdbdump". If I understand well, "tdbdump" gives nquads as output, but one can't export only a part of the data, everything is exported at once. The "s-get" solution makes it possible to select a named graph in the dataset, but I couldn't change the output format.
Are there better solutions to get an export in several files?
Ways I can think of:
1/ Modify the s-get script to handle --output and set the "Accept:" header then please submit a pull request for the changes.
2/ Use curl

curl --header 'Accept: application/n-triples' \
     'http://localhost:3030/ds?graph=http://bnf_titres'

3/ Parse the s-get output:

s-get ... | riot --syntax TTL

      Andy
Thanks in advance,

VV.



~~~~~~~~~~~ 1) SOLUTION WITH s-query ~~~~~~~~~~~~~~~~~~~~~

1.1) Edit s-query ruby script (add nt)

-- l. 572 : when  "json","xml","text","csv","tsv","nt"
-- l. 574 : when :json,:xml,:text,:csv,:tsv,:nt
-- l. 515 : opts.on('--output=TYPE',[:json,:xml,:text,:csv,:tsv,:nt],
-- l. 519 : opts.on('--accept=TYPE',[:json,:xml,:text,:csv,:tsv,:nt],
1.2) Command
/my/path/to/fuseki/bin/s-query --service=http://localhost:3030/BnF_text_v2/ "construct { ?s ?p ?o } where { graph <http://bnf_titres> { ?s ?p ?o }}" --output=nt | split -l 500000 - --additional-suffix=.nt BnfTextTitres-
~~~~~~~~~~~ 2) SOLUTION WITH tdbdump (nquads but no named graph) ~~~~~~~~~~~~~~~~~~~~~
/my/path/to/jena/bin/tdbdump --loc=/my/path/to/fuseki/run/databases/BnF_text_v2 --graph=http://bnf_titres | split -l 500000 - --additional-suffix=.nt BnfTextTitres-
=> Unknown argument: graph
~~~~~~~~~~~ 3) SOLUTION WITH s-get (named graph ok, but turtle output) ~~~~~~~~~~~~~~~~~~~~~
/my/path/to/fuseki/bin/s-get http://localhost:3030/BnF_text_v2/data http://bnf_titres --output=text | split -l 500000 - --additional-suffix=.nt BnfTextTitres-
=> /my/path/to/fuseki/bin/s-get:364:in `cmd_soh': invalid option: --output=text (OptionParser::InvalidOption)
from /my/path/to/fuseki/bin/fuseki/bin/s-get:715:in `<main>'
