On 13/07/2020 20:07, Martynas Jusevičius wrote:
Andy,

I've switched the output to XML version 1.1 and started getting a lot
of inexplicable and seemingly random riot warnings, such as

18:49:18 WARN  riot                 :: [line: 181, col: 15] Bad IRI:
<ply to this email directly or view it on GitHub:&#xD;
htt035f94/> Spaces are not legal in URIs/IRIs.

where line 181 simply reads:

          
<uri>https://localhost/messages/65195ff1-3549-4840-8bc2-f37a3a035f94/</uri>

Those warnings were not there using XML 1.0, which concerns me. From
the warning message it looks like the parser somehow read part of one
term on top of another.

TriX processes the output of the XML parser.

org.apache.jena.riot.lang.ReaderTriX

                String x = parser.getElementText() ;
                Node n = profile.createURI(x, line, col) ;

I am honestly trying to prepare a test file right now :) I've cut
it down to ~350 lines, but if I remove a single extra triple or even a
line of string, the warning goes away.
Can I send it off-list to you?

350 lines is still long.

If I accept anything off-list, I end up with too much off-list traffic. I don't work that way.

    Andy


On Mon, Jul 13, 2020 at 11:22 AM Martynas Jusevičius
<marty...@atomgraph.com> wrote:

Thanks Andy. I was making an example when I got your message :)

I've found that form feed is not allowed in XML 1.0 but allowed in XML 1.1
https://stackoverflow.com/questions/15034302/how-can-i-add-form-feed-character-into-text-that-i-am-creating-with-xslt/37790009
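
A minimal sketch of the character rules behind that difference, written out by hand from the Char and RestrictedChar productions of the XML 1.0 and XML 1.1 recommendations. The class and method names are made up for illustration and are not part of Jena, Saxon, or the JDK. In XML 1.1 the form feed is a restricted character, so it may only appear as a character reference such as &#xc; and never as a raw byte.

```java
// Hypothetical helper: the XML Char productions, checked per code point.
final class XmlCharRanges {

    // XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // XML 1.1 Char: [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml11Char(int cp) {
        return (cp >= 0x1 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // XML 1.1 RestrictedChar: allowed only when written as a character reference.
    static boolean isXml11RestrictedChar(int cp) {
        return (cp >= 0x1 && cp <= 0x8)
            || cp == 0xB || cp == 0xC
            || (cp >= 0xE && cp <= 0x1F)
            || (cp >= 0x7F && cp <= 0x84)
            || (cp >= 0x86 && cp <= 0x9F);
    }

    public static void main(String[] args) {
        int formFeed = 0x0C; // the character behind &#xc;
        System.out.println("XML 1.0 Char:           " + isXml10Char(formFeed));
        System.out.println("XML 1.1 Char:           " + isXml11Char(formFeed));
        System.out.println("XML 1.1 RestrictedChar: " + isXml11RestrictedChar(formFeed));
    }
}
```

For U+000C the three checks give false, true and true, which matches the behaviour seen here: rejected under 1.0, accepted as a reference under 1.1.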

I tried TriX as XML version 1.1 and it worked:

<?xml version="1.1" encoding="UTF-8"?>
<trix xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns="http://www.w3.org/2004/03/trix/trix-1/"
       xsi:schemaLocation="http://www.w3.org/2004/03/trix/trix-1/ trix-1.0.xsd">
    <graph>
      <triple>
        <uri>http://example.org/Bob</uri>
        <uri>http://example.org/name</uri>
        <plainLiteral>Bob&#xc;</plainLiteral>
      </triple>
    </graph>
</trix>

Output:

<http://example.org/Bob> <http://example.org/name> "Bob\f" .

I guess I need to figure out how to get Saxon to produce 1.1.

On Mon, Jul 13, 2020 at 11:14 AM Andy Seaborne <a...@apache.org> wrote:

Small example?
Try with and without &#xc;?

<TriX xmlns="http://www.w3.org/2004/03/trix/trix-1/">
    <graph>
      <triple>
        <uri>http://example.org/Bob</uri>
        <uri>http://example.org/name</uri>
        <plainLiteral>Bob&#xc;</plainLiteral>
      </triple>
    </graph>
</TriX>

10:10:19 ERROR riot            :: [line: 6, col: 29] XML error:
ParseError at [row,col]:[6,29]
Message: Character reference "&#xc" is an invalid XML character.

The "Message:" line isn't from Jena.

ReaderTriX.java

          } catch (XMLStreamException ex) {
              staxError(parser.getLocation(), "XML error: "+ex.getMessage()) ;
          }


(Jena 3.16.0ish, with the JDK XML parser)
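
To see the same error without Jena in the picture, here is a rough sketch that feeds the small example straight to StAX. It assumes XMLInputFactory.newInstance() picks up the parser built into a stock JDK. The class and method names are hypothetical, and the document is the one above flattened onto a single line, so the reported row and column will differ.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical stand-alone check, not Jena code.
final class StaxFormFeedCheck {

    // Returns null if the document parses, otherwise the parser's own message.
    static String parseError(String xml) {
        try {
            XMLStreamReader parser =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (parser.hasNext()) {
                parser.next(); // character references are resolved while scanning
            }
            parser.close();
            return null;
        } catch (XMLStreamException ex) {
            return ex.getMessage();
        }
    }

    // Builds the small TriX example (no XML declaration, so XML 1.0 rules apply).
    static String doc(String literal) {
        return "<TriX xmlns=\"http://www.w3.org/2004/03/trix/trix-1/\"><graph><triple>"
             + "<uri>http://example.org/Bob</uri>"
             + "<uri>http://example.org/name</uri>"
             + "<plainLiteral>" + literal + "</plainLiteral>"
             + "</triple></graph></TriX>";
    }

    public static void main(String[] args) {
        System.out.println("without &#xc;: " + parseError(doc("Bob")));
        System.out.println("with &#xc;:    " + parseError(doc("Bob&#xc;")));
    }
}
```

The first call should report no error and the second should return the parser's complaint about the character reference, with no Jena code involved.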

      Andy


On 12/07/2020 23:01, Martynas Jusevičius wrote:
Hi,

      riot --strict --stop --syntax=TriX --output=nq

gives me

21:40:07 ERROR riot                 :: [line: 2943360, col: 62] XML
error: ParseError at [row,col]:[2943360,62]

That line is in a <plainLiteral> and looks like this:

- http://sprout.ics.uci.edu/past_projects/gac/index.html&#xc;&#xD;

I'm guessing it's the &#xc; entity that riot is failing on? It's the Form Feed:
https://www.codetable.net/hex/c
&#xD; is found on other (previous) lines so it shouldn't be it.

Is &#xc; entity not allowed? The TriX output was produced by Saxon.

JENA_VERSION=3.10.0
