Andy,

I've switched the output to XML version 1.1 and started getting a lot
of inexplicable and seemingly random riot warnings, such as

18:49:18 WARN  riot                 :: [line: 181, col: 15] Bad IRI:
<ply to this email directly or view it on GitHub:&#xD;
htt035f94/> Spaces are not legal in URIs/IRIs.

where line 181 simply reads:

         
<uri>https://localhost/messages/65195ff1-3549-4840-8bc2-f37a3a035f94/</uri>

Those warnings were not there using XML 1.0, which concerns me. From
the warning message it looks like the parser somehow read part of one
term on top of another.

I'm trying to prepare a test file right now :) I've cut it down to
~350 lines, but if I remove even one more triple, or a single line of
a string, the warning goes away.
Can I send it to you off-list?
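
For reference, here is a minimal sketch of the character ranges behind the
form feed difference discussed in the quoted messages below. It is not Jena
or JDK code: the class name XmlChars and its method names are made up for
illustration, and the ranges are transcribed from the Char and
RestrictedChar productions of the XML 1.0 and XML 1.1 specs as I understand
them. Note that XML 1.1 accepts U+000C only when it is written as a
character reference such as &#xc;, not as a raw character.

```java
// Sketch only: XmlChars and its methods are hypothetical helpers,
// not part of Jena, riot, or the JDK XML parser.
final class XmlChars {
    private XmlChars() {}

    // XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // XML 1.1 Char: [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml11Char(int cp) {
        return (cp >= 0x1 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // XML 1.1 RestrictedChar: legal in a document only as a character reference.
    static boolean isXml11Restricted(int cp) {
        return (cp >= 0x1 && cp <= 0x8)
            || cp == 0xB || cp == 0xC
            || (cp >= 0xE && cp <= 0x1F)
            || (cp >= 0x7F && cp <= 0x84)
            || (cp >= 0x86 && cp <= 0x9F);
    }

    public static void main(String[] args) {
        int formFeed = 0xC; // the character written as &#xc; in the TriX file
        System.out.println("XML 1.0 Char:           " + isXml10Char(formFeed));
        System.out.println("XML 1.1 Char:           " + isXml11Char(formFeed));
        System.out.println("XML 1.1 RestrictedChar: " + isXml11Restricted(formFeed));
    }
}
```

This only explains why &#xc; is rejected under 1.0 and accepted under 1.1;
it says nothing about the "Bad IRI" warnings above, which look like a
separate problem.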

On Mon, Jul 13, 2020 at 11:22 AM Martynas Jusevičius
<marty...@atomgraph.com> wrote:
>
> Thanks Andy. I was making an example when I got your message :)
>
> I've found that form feed is not allowed in XML 1.0 but allowed in XML 1.1
> https://stackoverflow.com/questions/15034302/how-can-i-add-form-feed-character-into-text-that-i-am-creating-with-xslt/37790009
>
> I tried TriX as XML version 1.1 and it worked:
>
> <?xml version="1.1" encoding="UTF-8"?>
> <trix xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>       xmlns="http://www.w3.org/2004/03/trix/trix-1/"
>       xsi:schemaLocation="http://www.w3.org/2004/03/trix/trix-1/ trix-1.0.xsd">
>    <graph>
>      <triple>
>        <uri>http://example.org/Bob</uri>
>        <uri>http://example.org/name</uri>
>        <plainLiteral>Bob&#xc;</plainLiteral>
>      </triple>
>    </graph>
> </trix>
>
> Output:
>
> <http://example.org/Bob> <http://example.org/name> "Bob\f" .
>
> I guess I need to figure out how to get Saxon to produce 1.1.
>
> On Mon, Jul 13, 2020 at 11:14 AM Andy Seaborne <a...@apache.org> wrote:
> >
> > Small example?
> > Try with and without &#xc;?
> >
> > <TriX xmlns="http://www.w3.org/2004/03/trix/trix-1/">
> >    <graph>
> >      <triple>
> >        <uri>http://example.org/Bob</uri>
> >        <uri>http://example.org/name</uri>
> >        <plainLiteral>Bob&#xc;</plainLiteral>
> >      </triple>
> >    </graph>
> > </TriX>
> >
> > 10:10:19 ERROR riot            :: [line: 6, col: 29] XML error:
> > ParseError at [row,col]:[6,29]
> > Message: Character reference "&#xc" is an invalid XML character.
> >
> > The "Message:" line isn't from Jena.
> >
> > ReaderTriX.java
> >
> >          } catch (XMLStreamException ex) {
> >              staxError(parser.getLocation(), "XML error: "+ex.getMessage()) ;
> >          }
> >
> >
> > (Jena 3.16.0ish, with the JDK XML parser)
> >
> >      Andy
> >
> >
> > On 12/07/2020 23:01, Martynas Jusevičius wrote:
> > > Hi,
> > >
> > >      riot --strict --stop --syntax=TriX --output=nq
> > >
> > > gives me
> > >
> > > 21:40:07 ERROR riot                 :: [line: 2943360, col: 62] XML
> > > error: ParseError at [row,col]:[2943360,62]
> > >
> > > > That line is in a <plainLiteral> and looks like this:
> > >
> > > - http://sprout.ics.uci.edu/past_projects/gac/index.html&#xc;&#xD;
> > >
> > > > I'm guessing it's the &#xc; entity that riot is failing on? It's the
> > > > Form Feed:
> > > https://www.codetable.net/hex/c
> > > &#xD; is found on other (previous) lines so it shouldn't be it.
> > >
> > > > Is the &#xc; entity not allowed? The TriX output was produced by Saxon.
> > >
> > > JENA_VERSION=3.10.0
> > >
