1. I am using the same version of the RDF file that you have.
2. This "Unknown char: \(92)" error appears in all the part files, at different line numbers. I am not sure what the unknown char \(92) is. I looked at the file contents around the reported line numbers but couldn't find it :(
3. I can only find version 2.7.4 at http://www.apache.org/dist/jena/binaries/. Maybe this is the reason. Do you know where I can download the 2.10.0 version?
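A note on finding the character: in the message `Unknown char: \(92)`, the number in parentheses appears to be the character's decimal code, and 92 is the code of the backslash `\`. So the parser is most likely rejecting a backslash, for example an escape sequence it does not accept. The sketch below is a hypothetical helper (the names `suspicious_escapes` and `main` are illustrative and are not part of Jena or the dwim script) that lists every backslash not followed by a standard N-Triples/Turtle string-escape character. It assumes 1-based line and column numbers, which we take to match the parser's report, and it scans raw UTF-8 text without parsing RDF. It may therefore also flag backslashes that a newer parser would accept.

```python
import gzip
import sys

# Characters that may follow a backslash in a standard N-Triples/Turtle
# string escape: \t \b \n \r \f \" \' \\ \uXXXX \UXXXXXXXX.
STRING_ESCAPES = set('tbnrf"\'\\uU')


def suspicious_escapes(lines):
    """Return (line, col, next_char) for each backslash that does not start
    a standard string escape. Line and column are 1-based."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        col = line.find("\\")
        while col != -1:
            # Character after the backslash, or "" at the very end of the line.
            nxt = line[col + 1] if col + 1 < len(line) else ""
            if nxt not in STRING_ESCAPES:
                hits.append((lineno, col + 1, nxt))
            # An escaped backslash ("\\") consumes two characters, so skip both.
            step = 2 if nxt == "\\" else 1
            col = line.find("\\", col + step)
    return hits


def main(argv):
    # Each argument is a part file; files ending in .gz are decompressed on the fly.
    for path in argv[1:]:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as f:
            for lineno, col, nxt in suspicious_escapes(f):
                print(f"{path}: line {lineno}, col {col}: backslash followed by {nxt!r}")


if __name__ == "__main__":
    main(sys.argv)
```

Saved as a script (any name), it would be run with the part files as arguments, and the reported line numbers can then be compared with the ones in the ERROR messages.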
Thanks much!

With Regards,
Abhishek S

On Tue, Jan 8, 2013 at 5:26 AM, Andy Seaborne <[email protected]> wrote:

> On 08/01/13 11:00, Abhishek Shivkumar wrote:
>
>> Hi Andy,
>>
>> I am using the script to correct the errors. When I run the dwim script
>> on all the part files, it shows error messages and continues processing.
>> Are these errors that have been corrected, or errors that still need
>> attention?
>> Sample error message:
>>
>> ERROR [line:25335, col:25] Unknown char: \(92)
>>
>
> What's on the lines around there?
> And if you've split the dump, which file?
>
> That needs correcting in the source. I can parse the first 30k lines of
> the file with Jena with no fixups.
>
> Maybe you don't have exactly the same version of Freebase that I did:
> freebase-rdf-2012-12-23-00-00.gz. There are no suspect forms around
> line 25K of my copy.
>
> ns:award.award_winner ns:type.type.instance ns:m.03cpgmq.
> ns:award.award_winner ns:type.type.instance ns:m.05x3tbk. <---25335
> ns:award.award_winner ns:type.type.instance ns:m.05q_rp.
>
> You also need the latest version of Jena (recent 2.10.0 SNAPSHOT).
>
>> Just wanted to know if we can ignore these messages while running the dwim
>> script.
>>
>
> You can ignore WARN. ERRORs usually stop the parser as they indicate
> structural problems.
>
> Andy
>
>> Thank you!
>>
>> With Regards,
>> Abhishek S
>>
>> On Sat, Dec 29, 2012 at 1:58 PM, Andy Seaborne <[email protected]> wrote:
>>
>>> If you want to parse the Freebase dump, try this:
>>>
>>> http://people.apache.org/~andy/Freebase20121223/Notes.txt
>>>
>>> It takes about 90 minutes on my home desktop machine to fix and parse the
>>> data.
>>>
>>> To load it, get a very large machine - it has been reported [1] that a
>>> previous dump has been loaded into TDB.
>>>
>>> Andy
>>>
>>> [1] http://lists.freebase.com/pipermail/freebase-discuss/2012-December/010169.html
