On 28/12/12 07:42, Abhishek Shivkumar wrote:
Hi Andy,

   Here are the triples from the neighborhood of line 270608. I tried
finding the error but couldn't. Do you see any by chance?
I printed the line number too on the left just in case.  Ex: "line num
270591-"

Not quite the right line but close ... this may be the problem:

Line:
-----------------
ns:m.01gqn1 ns:base.braziliangovt.brazilian_political_party.number      13.
-----------------

and the problem is the   13.

The WG spec in development has:

[21]    DECIMAL         ::=     [+-]? [0-9]* '.' [0-9]+

so a decimal must have at least one digit after the '.', and "13." is the integer 13 followed by a DOT (which terminates the triple).

But the W3C submission has a known problem in this area:

[18] decimal ::= ('-' | '+')? ( [0-9]+ '.' [0-9]* | '.' ([0-9])+ | ([0-9])+ )

and "13." is ambiguous. Is it 13 and a DOT, or a decimal with lexical form "13."? The normal way to tokenize is to choose the longest match (so ":abc" isn't ":a" then "bc"), and that means you need a space to separate the tokens '13' and DOT.
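
To make the longest-match point concrete, here is a minimal sketch in Python. The regexes are transcribed from the two productions quoted above; the function name and the tie-break rule (prefer integer when both match the same length) are my own assumptions for illustration, and this is not how Jena's tokenizer is actually written.

```python
import re

# DECIMAL as quoted from the WG draft: digits are required after the '.'
WG_DECIMAL = re.compile(r"[+-]?[0-9]*\.[0-9]+")
# decimal as quoted from the submission: "13." is an allowed lexical form
SUBMISSION_DECIMAL = re.compile(r"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)")
INTEGER = re.compile(r"[+-]?[0-9]+")


def tokenize_number(text, decimal_re):
    """Longest-match a number at the start of text.

    Returns (kind, lexeme, rest). Hypothetical helper: on equal length
    the integer reading is preferred.
    """
    candidates = []
    for priority, kind, regex in ((0, "decimal", decimal_re), (1, "integer", INTEGER)):
        m = regex.match(text)
        if m:
            candidates.append((len(m.group()), priority, kind, m.group()))
    if not candidates:
        raise ValueError("no number at start of input")
    length, _, kind, lexeme = max(candidates)
    return kind, lexeme, text[length:]


# WG grammar: integer 13, and the DOT is left over to end the triple.
print(tokenize_number("13.", WG_DECIMAL))
# Submission grammar: the DOT is swallowed into the decimal, nothing is left.
print(tokenize_number("13.", SUBMISSION_DECIMAL))
# With a space, both grammars leave " ." for the terminator.
print(tokenize_number("13 .", SUBMISSION_DECIMAL))
```

Under the submission grammar the rest of the input is empty after "13.", so the parser never sees the DOT that should terminate the triple.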

Jena 2.7.4 follows the submission, so "13." is read as a decimal and the triple then still needs a trailing DOT.

In fact, using space-DOT everywhere would be very sensible. Trailing dots on prefix names may confuse some older parsers.
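
A rough sketch of a space-DOT fix-up in Python, assuming the dump has one complete triple per line and that every line-final "." is the triple terminator (that would not hold for multi-line literals, so check the data first). The function name is made up for illustration.

```python
import re

# A terminating DOT at the end of the line, with any whitespace around it.
_TRAILING_DOT = re.compile(r"\s*\.\s*$")


def space_dot(line):
    """Rewrite a line-final '.' as ' .', keeping the original line ending."""
    body = line.rstrip("\r\n")
    ending = line[len(body):]
    return _TRAILING_DOT.sub(" .", body, count=1) + ending


line = "ns:m.01gqn1\tns:base.braziliangovt.brazilian_political_party.number\t13.\n"
print(space_dot(line), end="")
```

Lines that already end in space-DOT come out unchanged, and lines that don't end in a DOT are left alone.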

Jena development (2.10.0) follows the W3C WG spec, so "13." is the integer 13 followed by a trailing DOT, and the line parses.

Do you have a corrected version of freebase-rdf-2012-12-09-00-00? I downloaded it but there are other things to fix up before it gets to that point.

        Andy


