On 08/01/13 11:00, Abhishek Shivkumar wrote:
Hi Andy,

    I am using the script to correct the errors. When I run the dwim script
on all the part files, it shows error messages and continues processing.
Have these errors been corrected, or do they still exist and need attention?
A sample error message is:

ERROR [line:25335, col:25] Unknown char: \(92)

What's on the lines around there?
And if you've split the dump, which file?

That needs correcting in the source. I can parse the first 30k lines of the file with Jena with no fixups.

Maybe you don't have exactly the version of Freebase that I did: freebase-rdf-2012-12-23-00-00.gz. There are no suspect forms around line 25K of my copy.

ns:award.award_winner   ns:type.type.instance   ns:m.03cpgmq.
ns:award.award_winner   ns:type.type.instance   ns:m.05x3tbk. <---25335
ns:award.award_winner   ns:type.type.instance   ns:m.05q_rp.
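
To see what is on the lines around a reported position in your own copy, something like the following Python sketch may help. It is not part of the dwim script or of Jena; the function names and the two-line context window are made up for illustration. It reads a plain or gzip-compressed file, returns the lines around a given line number, and flags any line containing a backslash (character code 92, the one named in the error message).

```python
import gzip
import itertools


def open_text(path):
    """Open a plain or gzip-compressed file as text (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def lines_around(path, line_no, context=2):
    """Return (number, text) pairs for line_no and `context` lines either side."""
    first = max(1, line_no - context)
    last = line_no + context
    with open_text(path) as f:
        # islice streams through the file, so a multi-gigabyte dump is never
        # held in memory.
        window = itertools.islice(f, first - 1, last)
        return [(first + i, text.rstrip("\n")) for i, text in enumerate(window)]


def show(path, line_no, context=2):
    """Print the window, marking the reported line and any backslashes."""
    for n, text in lines_around(path, line_no, context):
        marker = " <---" if n == line_no else ""
        flag = " [contains backslash]" if "\\" in text else ""
        print(f"{n}: {text}{marker}{flag}")
```

For example, `show("part-file.gz", 25335)` would print lines 25333 to 25337 of a file named `part-file.gz` (a placeholder name). If the dump has been split, the line number is relative to the part file being parsed.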

You also need the latest version of Jena (recent 2.10.0 SNAPSHOT).


Just wanted to know if we can ignore these messages while running the dwim
script.

You can ignore WARN. ERRORs usually stop the parser as they indicate structural problems.
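
If the parser output is saved to a file, a short Python sketch like the one below can separate the two levels so that only the ERROR positions need following up. It assumes that every message follows the `LEVEL [line:N, col:N] text` shape of the sample error above, and that WARN lines use the same layout, which is an assumption. The function name `summarize` is invented for this example.

```python
import re
from collections import Counter

# Pattern modelled on the sample message:
#   ERROR [line:25335, col:25] Unknown char: \(92)
# The WARN layout is assumed to match; optional spaces are tolerated.
LOG_LINE = re.compile(
    r"^(WARN|ERROR)\s+\[line:\s*(\d+),\s*col:\s*(\d+)\s*\]\s*(.*)$"
)


def summarize(log_lines):
    """Count messages per level and collect (line, col, message) for ERRORs."""
    counts = Counter()
    errors = []
    for raw in log_lines:
        match = LOG_LINE.match(raw.strip())
        if match is None:
            continue  # not a parser message; skip it
        level, line, col, message = match.groups()
        counts[level] += 1
        if level == "ERROR":
            errors.append((int(line), int(col), message))
    return counts, errors
```

Calling `summarize(open("parse.log"))`, where `parse.log` is a placeholder file name, returns the per-level counts together with the list of error positions to inspect in the source data.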

        Andy


Thank you!

With Regards,
Abhishek S


On Sat, Dec 29, 2012 at 1:58 PM, Andy Seaborne wrote:

If you want to parse the Freebase dump, try this:

http://people.apache.org/~andy/Freebase20121223/Notes.txt

It takes about 90 minutes on my home desktop machine to fix and parse the
data.

To load it, get a very large machine - it has been reported [1] that a
previous dump has been loaded into TDB.

         Andy

[1] http://lists.freebase.com/pipermail/freebase-discuss/2012-December/010169.html


