> I have not yet had time to look at dbpedia 3.8. They might have changed
> the names of some dump files. Generally "instance_types" are very
> important (this provides the information about the type of an Entity).
> "person_data" includes additional information for persons; AFAIK that
> information is not included in the default configuration of the
> dbpedia indexing tool.

Not all language dumps have these files; the Japanese and Italian dumps do not have them either. These files are listed in the readme file, which is why I was looking for them.
>> I get a java exception.
>
> The included exceptions look like the RDF file containing the Chinese
> labels is not well formatted. Experience says that this is most
> likely related to character encoding issues. This was also the case with
> some dbpedia 3.7 files (see the special treatment of some files in the
> shell script of the dbpedia indexing tool).
>
>> OK. I will try to debug this.
>
> You will need to have a look at the line that caused the error
> (labels_zh.nt.bz2; [line: 6972, col: 46] Broken token:
> http://www.w3.org/2000/01/rdf-sche). If it is indeed an encoding
> related issue, there are some Linux command line utilities to check and
> correct those issues. If you are unsure, feel free to post this line
> within this thread.
>
> Chinese labels for the English dbpedia
> ("http://dboedua.org/resource/{name}") should work for that reason.
> The Chinese version ("http://zh.dboedua.org/resource/{name}") would
> just provide more Entities (not more information for entities included
> in the English version).

"dboedua"? I can't find any server at http://dboedua.org. Is it a typo (maybe from a keyboard with a different language layout)?

-harish
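To inspect the line the parser complained about (labels_zh.nt.bz2, line 6972) before posting it, a minimal Python sketch like the one below could pull that single line out of the compressed dump and check whether it is valid UTF-8. The helper names `read_line`, `first_utf8_error` and `report` are hypothetical and not part of the dbpedia indexing tool. It also assumes the parser's "col: 46" may count characters rather than bytes, so the byte offset reported here can differ on lines containing Chinese text.

```python
import bz2


def read_line(path: str, line_no: int) -> bytes:
    """Return the raw bytes of 1-based line `line_no` from a bz2-compressed file.

    The file is decompressed as a stream, so nothing is written to disk.
    """
    with bz2.open(path, "rb") as fh:
        for i, raw in enumerate(fh, start=1):
            if i == line_no:
                return raw
    raise ValueError(f"{path} has fewer than {line_no} lines")


def first_utf8_error(raw: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None if valid."""
    try:
        raw.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded
        return err.start


def report(path: str, line_no: int) -> str:
    """Build a short human-readable diagnosis for one line of the dump."""
    raw = read_line(path, line_no)
    # repr() shows stray bytes as \xNN escapes, which makes them easy to spot
    lines = [f"line {line_no}: {raw!r}"]
    offset = first_utf8_error(raw)
    if offset is None:
        lines.append("decodes as valid UTF-8; the problem may be N-Triples syntax rather than encoding")
    else:
        context = raw[max(0, offset - 10):offset + 10]
        lines.append(f"invalid UTF-8 at byte offset {offset}: {context!r}")
    return "\n".join(lines)
```

For example, `print(report("labels_zh.nt.bz2", 6972))` prints the raw line (roughly what `bzcat labels_zh.nt.bz2 | sed -n '6972p'` shows on Linux) plus the position of the first bad byte, if any. If the line decodes cleanly, the cause more likely lies in the N-Triples syntax of that line than in its encoding.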
