Re: [Dbpedia-discussion] Fwd: [extraction-framework] Update the CreateFreebaseLinks based on the new Freebase RDF dump format (#25)

2013-03-25, Jona Christopher Sahnwaldt
Hi Andrea, Wikipedia page ids (URL parameter curid) are more stable than page titles, and according to Tom, Freebase uses them as the main links to Wikipedia. But DBpedia still uses the current page title as the canonical resource IRI, so the DBpedia-to-Freebase linkset has to use the page title.
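
A minimal Python sketch of the trade-off described above: join the two datasets on the stable page id, but emit the title-based DBpedia IRI. The line pattern assumed for page_ids_en.nt (`<resource> <http://dbpedia.org/ontology/wikiPageID> "id"^^xsd:integer .`), the (mid, page id) pairs assumed on the Freebase side, and both function names are illustrative assumptions, not DBpedia or Freebase APIs.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

# Assumed layout of one line in DBpedia's page_ids_en.nt (hypothetical pattern).
PAGE_ID_LINE = re.compile(
    r'^<(http://dbpedia\.org/resource/[^>]+)>\s+'
    r'<http://dbpedia\.org/ontology/wikiPageID>\s+'
    r'"(\d+)"(?:\^\^<[^>]+>)?\s*\.\s*$'
)


def load_page_ids(lines: Iterable[str]) -> Dict[int, str]:
    """Map Wikipedia page id -> title-based DBpedia resource IRI."""
    ids: Dict[int, str] = {}
    for line in lines:
        m = PAGE_ID_LINE.match(line)
        if m:
            ids[int(m.group(2))] = m.group(1)
    return ids


def same_as_links(freebase_pairs: Iterable[Tuple[str, int]],
                  page_ids: Dict[int, str]) -> Iterator[str]:
    """Join on the stable page id, but emit the title-based DBpedia IRI.

    freebase_pairs is a hypothetical input: (Freebase mid, Wikipedia page id).
    """
    for mid, page_id in freebase_pairs:
        resource = page_ids.get(page_id)
        if resource is None:
            continue  # page unknown to DBpedia (e.g. deleted): emit no link
        yield (f"<{resource}> <http://www.w3.org/2002/07/owl#sameAs> "
               f"<http://rdf.freebase.com/ns/{mid}> .")
```

Pages missing from the DBpedia file simply produce no link, so a renamed or deleted page cannot leave a stale title-based link behind.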

2013-03-25, Jona Christopher Sahnwaldt
On 25 March 2013 15:00, Tom Morris wrote:
> Another approach might be to use the recently introduced Topic Equivalent
> Webpage property:
>
> ns:m.09q3rp ns:common.topic.topic_equivalent_webpage
> .
> ns:m.09q3rp ns:common.topic.topic_equivalent_webpage …

2013-03-25, Jona Christopher Sahnwaldt
On 25 March 2013 14:18, Tom Morris wrote:
> I wouldn't claim that Freebase is bug-free, but that's a quite old and
> simple algorithm, so unless they're triples from very early in its life
> (say, 2007), I'd guess that bad input data from Wikipedia is more likely
> than a problem with the transfo …

2013-03-25, Andrea Di Menna
Sorry, wrong information. We should use Page Ids (http://downloads.dbpedia.org/3.8/en/page_ids_en.nt.bz2). I am going to try something.
Cheers
Andrea
2013/3/25 Andrea Di Menna
> Hi,
>
> we have article numeric ids in the quads file (as oldid parameter).
> Jona, do you think this is worth givi …

2013-03-25, Andrea Di Menna
Hi,
we have article numeric ids in the quads file (as oldid parameter). Jona, do you think this is worth giving a try?
Regards
Andrea
2013/3/25 Tom Morris
> Another approach might be to use the recently introduced Topic Equivalent
> Webpage property:
>
> ns:m.09q3rp ns:common.topic.topic_e …

2013-03-25, Tom Morris
Another approach might be to use the recently introduced Topic Equivalent Webpage property:
ns:m.09q3rp ns:common.topic.topic_equivalent_webpage <http://pt.wikipedia.org/wiki/Marlín> .
ns:m.09q3rp ns:common.topic.topic_equivalent_webpage <http://es.wikipedia.org/wiki/Marlín_ …
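
A minimal Python sketch of the Topic Equivalent Webpage approach: scan dump lines for the property and keep the Wikipedia pages of one language edition. It assumes each line looks like Tom's examples (subject, predicate, `<IRI>` and a final dot, separated by whitespace); the regexes and `equivalent_pages` are hypothetical, not part of any Freebase tooling.

```python
import re
from typing import Iterable, Iterator, Tuple

PROPERTY = "ns:common.topic.topic_equivalent_webpage"
# Assumed line layout: ns:<mid> <property> <object IRI> .
LINE = re.compile(r"^ns:(m\.\S+)\s+" + re.escape(PROPERTY) + r"\s*<([^>]+)>\s*\.\s*$")
# Wikipedia article URL: language edition and page title.
WIKIPEDIA = re.compile(r"^https?://([a-z-]+)\.wikipedia\.org/wiki/(.+)$")


def equivalent_pages(lines: Iterable[str], lang: str = "en") -> Iterator[Tuple[str, str]]:
    """Yield (mid, Wikipedia page title) pairs for one language edition."""
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # some other property, or a line we cannot parse
        mid, url = m.groups()
        w = WIKIPEDIA.match(url)
        if w and w.group(1) == lang:
            yield mid, w.group(2)
```

Because the property carries one page per language edition, the same scan could in principle feed the localized DBpedia chapters by changing `lang`.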

2013-03-25, Tom Morris
I wouldn't claim that Freebase is bug-free, but that's a quite old and simple algorithm, so unless they're triples from very early in its life (say, 2007), I'd guess that bad input data from Wikipedia is more likely than a problem with the transformation. It might help to give a little background …

2013-03-25, Andrea Di Menna
Hi all, it looks like some pages in Wikipedia, which is where the Freebase pages originate, actually contain wrong data, e.g. http://en.wikipedia.org/wiki/Marl%C3%83%C2%ADn,_%C3%83%C2%81vila
This page was deleted on Jan 21, and this actually led to the Freebase key …
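
The broken title above is exactly what "Marlín,_Ávila" becomes if its UTF-8 bytes are misread as Latin-1 and then re-encoded as UTF-8. The thread does not say where this happened, so the Python sketch below only demonstrates one plausible mechanism; `misdecode` and `repair` are hypothetical helpers.

```python
from urllib.parse import quote


def misdecode(title: str) -> str:
    """Simulate the suspected bug: UTF-8 bytes decoded as Latin-1."""
    return title.encode("utf-8").decode("latin-1")


def repair(mangled: str) -> str:
    """Undo misdecode() by reversing its two steps."""
    return mangled.encode("latin-1").decode("utf-8")


title = "Marlín,_Ávila"
mangled = misdecode(title)
# quote() encodes the mangled string as UTF-8 and percent-escapes it;
# ',' is kept literal to match the Wikipedia URL in the message.
print(quote(mangled, safe=","))  # prints Marl%C3%83%C2%ADn,_%C3%83%C2%81vila
```

Each non-ASCII letter turns into two characters (for "í": U+00C3 and U+00AD), which is why every original escape pair doubles into four.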

2013-03-25, Andrea Di Menna
Hi,
maybe the only thing that can be done is to notify the Freebase discussion list about this problem. I agree with Jona that the number of problematic references is not significant.
Cheers
Andrea
2013/3/25 Jona Christopher Sahnwaldt
>
> On Mar 25, 2013 3:32 AM, "Tom Morris" wrote:
> >
> > Can so …

2013-03-25, Jona Christopher Sahnwaldt
On Mar 25, 2013 3:32 AM, "Tom Morris" wrote:
>
> Can someone point to the part of the discussion which talks about what the problem is? This thread seems to start in mid-stream...
That's right. Sorry. The start of the thread is in the middle of this page: https://github.com/dbpedia/extraction-f …

2013-03-24, Tom Morris
Can someone point to the part of the discussion which talks about what the problem is? This thread seems to start in mid-stream...
Freebase's MQL key encoding (http://wiki.freebase.com/wiki/MQL_key_escaping) is a completely private encoding which shouldn't have any effect on external URIs/IRIs/re …
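
As we understand the MQL key escaping scheme, a character that may not appear in a key is written as `$` followed by four hex digits (e.g. `Marl$00EDn`). The Python sketch below decodes keys under that assumption, treating each escape as a UTF-16 code unit so that characters outside the BMP (assumed to be written as surrogate pairs) are recombined; `decode_mql_key` is a hypothetical helper, not a Freebase API.

```python
import re

# Assumption: every escape is '$' followed by exactly four hex digits.
ESCAPE = re.compile(r"\$([0-9A-Fa-f]{4})")


def decode_mql_key(key: str) -> str:
    """Decode an escaped MQL key such as 'Marl$00EDn' into 'Marlín'."""
    # Replace each $XXXX with the UTF-16 code unit it names (may be a surrogate).
    units = ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), key)
    # Round-trip through UTF-16 so surrogate pairs become single code points;
    # a lone surrogate raises UnicodeDecodeError, flagging a malformed key.
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
```

Since the encoding is private to MQL, keys would be decoded like this before building any external IRI, rather than copying the `$XXXX` form into DBpedia or Wikipedia links.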

[Dbpedia-discussion] Fwd: [extraction-framework] Update the CreateFreebaseLinks based on the new Freebase RDF dump format (#25)

2013-03-24, Jona Christopher Sahnwaldt
On 22 March 2013 23:21, Andrea Di Menna wrote:
>
> Hi Jona,
>
> thanks for merging the pull request!
>
> Anyway, couldn't we use percent encoding for Unicode code points which are
> not allowed in N-Triples (namely those outside the [#x20,#x7E] range)?
> In this case we should get UTF-8 bytes and p …
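
A minimal Python sketch of the percent-encoding proposal quoted above: keep code points in [#x20,#x7E] and replace every other character by its percent-encoded UTF-8 bytes. `percent_encode_outside_ascii` is a hypothetical name, and the sketch applies only the range rule from the message.

```python
def percent_encode_outside_ascii(iri: str) -> str:
    """Keep code points in [#x20, #x7E]; percent-encode the UTF-8 bytes of all others."""
    out = []
    for ch in iri:
        if 0x20 <= ord(ch) <= 0x7E:
            out.append(ch)  # printable ASCII passes through unchanged
        else:
            # One %XX escape per UTF-8 byte, e.g. 'í' -> %C3%AD.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)
```

The range rule alone lets space, `<`, `>` and `"` through, which are not allowed in IRIs, so a real implementation would escape those as well. Note also that RDF compares IRIs character by character, so an encoded IRI and its raw Unicode form would count as different resources when links are matched.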

2013-03-24, Jona Christopher Sahnwaldt
> On 22 March 2013 23:21, Andrea Di Menna wrote:
>>
>> Hi Jona,
>>
>> thanks for merging the pull request!
>>
>> Anyway, couldn't we use percent encoding for Unicode code points which are
>> not allowed in N-Triples (namely those outside the [#x20,#x7E] range)?
>> In this case we should get UTF-8 b …