Thanks Andy. Awesome!

So, I am downloading the latest dump of Freebase RDF ->
freebase-rdf-2012-12-23-00-00.gz

Let me check with that and use tdbloader to see if it has been corrected.
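
Before running tdbloader, I may pre-scan the new dump for the pattern you describe below. This is only a rough sketch under my own assumptions: the regex and the names `find_bare_decimal_lines` and `scan_gzip` are made up for illustration, and it simply flags any line that ends in digits glued directly to the final dot. It has not been checked against the actual dump.

```python
import gzip
import re

# Assumption: a "suspicious" line ends with whitespace, an optionally signed
# run of digits, and then "." with no space in between, e.g. "   13."
SUSPICIOUS = re.compile(r"\s[+-]?[0-9]+\.\s*$")

def find_bare_decimal_lines(lines):
    """Yield (line_number, line) for lines ending in an integer glued to the final dot."""
    for number, line in enumerate(lines, start=1):
        if SUSPICIOUS.search(line):
            yield number, line.rstrip("\n")

def scan_gzip(path):
    """Scan a gzip-compressed dump line by line without loading it all into memory."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        yield from find_bare_decimal_lines(handle)
```

Something like `for n, line in scan_gzip("freebase-rdf-2012-12-23-00-00.gz"): print(n, line)` should then list the candidate lines, if there are any left.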

Also, when will Jena 2.10.0, with this correction, be released?

Thank you!

With Regards,
Abhishek S


On Fri, Dec 28, 2012 at 6:32 PM, Andy Seaborne <[email protected]> wrote:

> On 28/12/12 07:42, Abhishek Shivkumar wrote:
>
>> Hi Andy,
>>
>>    Here are the triples from the neighborhood of line 270608. I tried
>> finding the error but couldn't. Do you see any by chance?
>> I printed the line number too on the left just in case.  Ex: "line num
>> 270591-"
>>
>
> Not quite the right line but close ... this may be the problem:
>
> Line:
> -----------------
>
> ns:m.01gqn1 ns:base.braziliangovt.brazilian_political_party.number
>    13.
> -----------------
>
> and the problem is the   13.
>
> The WG spec in development has:
>
> [21]    DECIMAL         ::=     [+-]? [0-9]* '.' [0-9]+
>
> so a decimal must have a trailing digit, and "13." is integer 13 followed
> by a DOT (which terminates the triple).
>
> But the W3C submission has a known problem in this area:
>
> [18]    decimal         ::=     ('-' | '+')? ( [0-9]+ '.' [0-9]* | '.'
> ([0-9])+ | ([0-9])+ )
>
> and 13. is ambiguous.  Is it 13 and a DOT, or a decimal with lexical form
> "13."?  The normal way to tokenize is to choose the longest match (so ":abc"
> isn't ":a" then "bc"), and that means you need a space to separate the
> tokens '13' and DOT.
>
> Jena 2.7.4 follows the submission, so "13." is a decimal and the triple
> then needs a trailing DOT.
>
> In fact, using space-DOT everywhere would be very sensible.  Trailing dots
> on prefix names may confuse some older parsers.
>
> Jena development (2.10.0) follows the W3C WG spec, so it's the integer 13
> and a trailing DOT, and the line parses.
>
> Do you have a corrected version of freebase-rdf-2012-12-09-00-00?  I
> downloaded it but there are other things to fix up before it gets to that
> point.
>
>         Andy
>
>
>
>
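
To check my understanding of the two grammar rules quoted above, here is a small sketch that tokenizes the start of a string under each rule with longest-match. The regexes are my transcription of rules [21] and [18]; the `INTEGER` pattern and the function `first_token` are my own assumptions for illustration and are not taken from Jena's tokenizer.

```python
import re

# Rule [21] from the WG draft quoted above: digits after the dot are mandatory.
WG_DECIMAL = re.compile(r"[+-]?[0-9]*\.[0-9]+")
# Rule [18] from the W3C submission quoted above: "13." on its own is allowed.
SUBMISSION_DECIMAL = re.compile(r"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)")
# Assumption: a plain integer token, tried alongside the WG decimal rule.
INTEGER = re.compile(r"[+-]?[0-9]+")

def first_token(text, grammar):
    """Return the longest numeric token at the start of text under the given grammar."""
    if grammar == "submission":
        candidates = [SUBMISSION_DECIMAL]
    else:
        candidates = [WG_DECIMAL, INTEGER]
    matches = [m.group(0) for m in (p.match(text) for p in candidates) if m]
    return max(matches, key=len) if matches else None
```

Under the submission rule the first token of "13." is the whole of "13.", so the triple still lacks its terminating DOT; under the WG rule the first token is "13" and the remaining "." is the DOT.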
