On 28/12/12 07:42, Abhishek Shivkumar wrote:
Hi Andy,
Here are the triples from the neighborhood of line 270608. I tried
finding the error but couldn't. Do you see any by chance?
I printed the line number too on the left just in case. Ex: "line num
270591-"
Not quite the right line but close ... this may be the problem:
Line:
-----------------
ns:m.01gqn1 ns:base.braziliangovt.brazilian_political_party.number 13.
-----------------
and the problem is the "13." at the end.
The WG spec in development has:
[21] DECIMAL ::= [+-]? [0-9]* '.' [0-9]+
so a decimal must have at least one digit after the '.', and "13." is
the integer 13 followed by a DOT (which terminates the triple).
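To make that concrete, here is a rough Python sketch of how a longest-match tokenizer would read "13." under the WG rule. It is only an illustration, not Jena's tokenizer: the function name is made up, and the INTEGER rule is assumed to be [+-]? [0-9]+ since it is not quoted here.

```python
import re

# DECIMAL as quoted from the WG draft: [+-]? [0-9]* '.' [0-9]+
WG_DECIMAL = re.compile(r"[+-]?[0-9]*\.[0-9]+")
# Assumption: INTEGER is [+-]? [0-9]+ (that rule is not quoted in this mail).
WG_INTEGER = re.compile(r"[+-]?[0-9]+")


def wg_number_token(text):
    """Return (kind, lexeme, rest) for the longest numeric token at the start of text."""
    found = []
    for kind, pattern in (("decimal", WG_DECIMAL), ("integer", WG_INTEGER)):
        m = pattern.match(text)
        if m:
            found.append((m.end(), kind))
    if not found:
        return None
    end, kind = max(found)  # longest match wins
    return kind, text[:end], text[end:]


print(wg_number_token("13."))   # integer "13", with "." left over as the DOT
print(wg_number_token("13.5"))  # decimal "13.5", nothing left over
```

With this rule the '.' after 13 can never be swallowed by the number, so it stays available as the statement terminator.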
But the W3C submission has a known problem in this area:
[18] decimal ::= ('-' | '+')? ( [0-9]+ '.' [0-9]* | '.' ([0-9])+ |
([0-9])+ )
and 13. is ambiguous. Is it 13 and a DOT, or a decimal with lexical form
"13."? The normal way to tokenize is to choose the longest match (so
":abc" isn't ":a" then "bc"), and that means you need a space to
separate the tokens '13' and DOT.
Jena 2.7.4 follows the submission, so "13." is a decimal and the
statement then still needs a trailing DOT.
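The same effect can be reproduced with a small Python sketch of the submission's decimal rule. This is an illustration under my own naming (both function names are hypothetical), not the code Jena 2.7.4 actually runs.

```python
import re

# [18] decimal from the submission, alternatives in the quoted order.
# Python tries alternatives left to right; the first alternative is also the
# longest one whenever it matches, so this behaves like longest match here.
SUBMISSION_DECIMAL = re.compile(r"[-+]?(?:[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)")


def submission_number_token(text):
    """Return (lexeme, rest) for the number at the start of text, or None."""
    m = SUBMISSION_DECIMAL.match(text)
    if m is None:
        return None
    return m.group(), text[m.end():]


def has_terminating_dot(object_text):
    """True if a DOT is still left after the number token has been consumed."""
    token = submission_number_token(object_text)
    if token is None:
        return False
    _, rest = token
    return rest.strip() == "."


print(submission_number_token("13."))   # the '.' is swallowed by the number
print(has_terminating_dot("13."))       # no DOT left to end the statement
print(has_terminating_dot("13 ."))      # the space keeps '13' and DOT apart
```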
In fact, using space-DOT everywhere would be very sensible. Trailing
dots on prefixed names may confuse some older parsers.
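If the data has to be patched by hand, something like the following Python sketch could insert the missing space. It is an assumption-laden workaround, not a general Turtle rewriter: the function name is made up, it only touches lines that end in a bare integer glued to the final '.', and it would also alter a line inside a multi-line string literal that happens to end that way.

```python
import re

# Assumption: only handle a bare integer glued to the DOT at the end of a line.
# Group 2 keeps any trailing whitespace (including the newline) intact.
GLUED_INTEGER_DOT = re.compile(r"(\s[+-]?[0-9]+)\.(\s*)$")


def space_before_final_dot(line):
    """Rewrite '... 13.' as '... 13 .' and leave every other line unchanged."""
    return GLUED_INTEGER_DOT.sub(r"\1 .\2", line)


line = "ns:m.01gqn1 ns:base.braziliangovt.brazilian_political_party.number 13."
print(space_before_final_dot(line))
```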
Jena development (2.10.0) follows the W3C WG spec, so it reads the
integer 13 followed by a trailing DOT, and the line parses.
Do you have a corrected version of freebase-rdf-2012-12-09-00-00? I
downloaded it but there are other things to fix up before it gets to
that point.
Andy