Thanks Andy. Awesome! So, I am downloading the latest dump of Freebase RDF -> freebase-rdf-2012-12-23-00-00.gz
Let me check with that and use tdbloader to see if it has been corrected. Also, when will Jena 2.10.0, with this correction, be released?

Thank you!

With Regards,
Abhishek S

On Fri, Dec 28, 2012 at 6:32 PM, Andy Seaborne <[email protected]> wrote:

> On 28/12/12 07:42, Abhishek Shivkumar wrote:
>
>> Hi Andy,
>>
>> Here are the triples from the neighborhood of line 270608. I tried
>> finding the error but couldn't. Do you see any by chance?
>> I printed the line number too on the left just in case. Ex: "line num
>> 270591-"
>
> Not quite the right line but close ... this may be the problem:
>
> Line:
> -----------------
> ns:m.01gqn1 ns:base.braziliangovt.brazilian_political_party.number 13.
> -----------------
>
> and the problem is the 13.
>
> The WG spec in development has:
>
> [21] DECIMAL ::= [+-]? [0-9]* '.' [0-9]+
>
> so a decimal must have a trailing digit, and "13." is integer 13 followed
> by a DOT (which terminates the triple).
>
> But the W3C submission has a known problem in this area:
>
> [18] decimal ::= ('-' | '+')? ( [0-9]+ '.' [0-9]* | '.' ([0-9])+ | ([0-9])+ )
>
> and 13. is ambiguous. Is it 13 and a DOT, or a decimal with lexical form
> "13."? The normal way to tokenize is to choose the longest match (so ":abc"
> isn't ":a" then "bc"), and that means you need a space to separate the
> tokens '13' and DOT.
>
> Jena 2.7.4 follows the submission, so "13." is a decimal and then needs a
> trailing DOT.
>
> In fact, using space-DOT everywhere would be very sensible. Trailing dots
> on prefix names may confuse some older parsers.
>
> Jena development (2.10.0) follows the W3C WG spec, so it's integer 13 and
> a trailing DOT, and it parses.
>
> Do you have a corrected version of freebase-rdf-2012-12-09-00-00? I
> downloaded it but there are other things to fix up before it gets to that
> point.
>
> Andy
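
For anyone following the thread, the "13." ambiguity described in the quoted message can be reproduced with a small longest-match tokenizer. This is only a sketch in Python, not Jena's actual tokenizer: the two decimal patterns are transcribed from productions [18] and [21] as quoted, while the names `tokenize`, `GRAMMARS`, and the "submission"/"wg" labels are made up for illustration, and only integers, decimals, and DOT are handled.

```python
import re

# Integer and DOT are the same in both grammars for this sketch.
INTEGER = re.compile(r"[+-]?[0-9]+")
DOT = re.compile(r"\.")

# W3C submission, production [18] as quoted:
#   decimal ::= ('-' | '+')? ( [0-9]+ '.' [0-9]* | '.' ([0-9])+ | ([0-9])+ )
SUBMISSION_DECIMAL = re.compile(r"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)")

# WG draft, production [21] as quoted:
#   DECIMAL ::= [+-]? [0-9]* '.' [0-9]+
WG_DECIMAL = re.compile(r"[+-]?[0-9]*\.[0-9]+")

# Hypothetical grammar table: rule order only matters for breaking ties.
GRAMMARS = {
    "submission": [("integer", INTEGER), ("decimal", SUBMISSION_DECIMAL), ("DOT", DOT)],
    "wg": [("integer", INTEGER), ("decimal", WG_DECIMAL), ("DOT", DOT)],
}


def tokenize(text, grammar):
    """Tokenize the tail of a triple, always taking the longest match."""
    rules = GRAMMARS[grammar]
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        # Try every rule at the current position and keep what matched.
        matches = [(kind, m.group()) for kind, rx in rules if (m := rx.match(text, pos))]
        if not matches:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        # Longest match wins; on a tie, max() keeps the earliest rule.
        kind, lexeme = max(matches, key=lambda t: len(t[1]))
        tokens.append((kind, lexeme))
        pos += len(lexeme)
    return tokens


print(tokenize("13.", "submission"))  # the dot is swallowed by the decimal
print(tokenize("13.", "wg"))          # integer followed by DOT
print(tokenize("13 .", "submission")) # space-DOT is unambiguous in both
```

Under the submission grammar the dot is absorbed into the number, so no DOT token is left to terminate the triple; under the WG grammar, or with space-DOT in the data, the triple ends cleanly.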
