Re: [Dbpedia-discussion] Duplicates in skos_categories_en.nt

2013-03-07 Thread Dimitris Kontokostas
Hi Neil, On Thu, Mar 7, 2013 at 3:25 PM, Neil Ireson wrote: > Thanks for the replies, > > Unfortunately it's not the subject==object bug. I tried to have a look at > the extraction code to see if I could find a fix but unfortunately it's > written in a language I don't read. I have to say I thi

Re: [Dbpedia-discussion] page article has last modified timestamp

2013-03-07 Thread Jona Christopher Sahnwaldt
Hi Gaurav, I'm sorry - I don't have time to tell you exactly which parts of the code have to be changed. For someone familiar with Scala it should be fairly simple to figure it out. Maybe some other developers can help you, so I'm forwarding your mail to the list again. Regards, JC On Thu, Mar 7

Re: [Dbpedia-discussion] Cleaner abstract extraction

2013-03-07 Thread Jona Christopher Sahnwaldt
On Thu, Mar 7, 2013 at 2:22 PM, Dimitris Kontokostas wrote: > Hi JC, > > Ok about the import script but, you mean that we don't need the modified > mediawiki either? I don't think we need the whole thing, just the three files that are added/modified. In other words, we need https://github.com/dbp

Re: [Dbpedia-discussion] Duplicates in skos_categories_en.nt

2013-03-07 Thread Neil Ireson
Thanks for the replies, Unfortunately it's not the subject==object bug. I tried to have a look at the extraction code to see if I could find a fix but unfortunately it's written in a language I don't read. I have to say I think it's a shame that you chose Scala over Java code as there must be

Re: [Dbpedia-discussion] Cleaner abstract extraction

2013-03-07 Thread Dimitris Kontokostas
Hi JC, Ok about the import script but, you mean that we don't need the modified mediawiki either? About the documentation, I think that we should move all development-related documentation to GitHub. I already ported a few things there https://github.com/dbpedia/extraction-framework/wiki/Extraction

Re: [Dbpedia-discussion] Cleaner abstract extraction

2013-03-07 Thread Jona Christopher Sahnwaldt
Hi Gaurav, the code in dbpedia/dbpedia/abstractExtraction is no longer used and maintained. I just added a wiki page with some instructions for the new abstract extraction: http://wiki.dbpedia.org/AbstractExtraction It's very basic, but I hope it helps you to get started. @developers: I'm not

Re: [Dbpedia-discussion] page article has last modified timestamp

2013-03-07 Thread Jona Christopher Sahnwaldt
Hi Gaurav, the simplest way to filter out unmodified pages is probably to add a filter in ExtractionJob.scala [1]. We don't yet have configurable filters, so you will have to modify the source code. You basically have to change this line: if (namespaces.contains(page.title.namespace)) { to somet
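JC's suggested replacement for that line is cut off above, so the following is only a sketch of the general idea under stated assumptions: keep the existing namespace check and add a last-modified cutoff. The `Page` class, its `timestamp` field (epoch milliseconds), and `shouldExtract` are hypothetical stand-ins, not the framework's actual types.

```scala
import java.time.Instant

// Hypothetical stand-in for the framework's page type; the real class may
// use different field names and types.
final case class Page(title: String, namespace: Int, timestamp: Long)

// Keep a page only if its namespace is selected AND it was modified after the
// cutoff (timestamps are epoch milliseconds in this sketch).
def shouldExtract(page: Page, namespaces: Set[Int], modifiedAfter: Long): Boolean =
  namespaces.contains(page.namespace) && page.timestamp > modifiedAfter

val namespaces = Set(0, 14) // main (article) and Category namespaces
val cutoff = Instant.parse("2013-02-01T00:00:00Z").toEpochMilli

val pages = Seq(
  Page("India", 0, Instant.parse("2013-03-05T10:00:00Z").toEpochMilli),
  Page("Roma", 0, Instant.parse("2013-01-15T10:00:00Z").toEpochMilli),      // too old
  Page("Talk:India", 1, Instant.parse("2013-03-06T10:00:00Z").toEpochMilli) // wrong namespace
)

val selected = pages.filter(p => shouldExtract(p, namespaces, cutoff))
selected.foreach(p => println(p.title)) // prints: India
```

In ExtractionJob.scala itself only the condition of the `if` would change; where the timestamp comes from depends on the page class the framework actually uses.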

Re: [Dbpedia-discussion] Duplicates in skos_categories_en.nt

2013-03-07 Thread Dimitris Kontokostas
Yup, I got it wrong :) The titles are so alike that I mistakenly thought it was the subject==object bug, sorry! Dimitris On Thu, Mar 7, 2013 at 1:10 PM, Jona Christopher Sahnwaldt wrote: > Hi Neil, Dimitris, > > if I understand Neil correctly, he means that some triples are >

Re: [Dbpedia-discussion] Duplicates in skos_categories_en.nt

2013-03-07 Thread Jona Christopher Sahnwaldt
Hi Neil, Dimitris, if I understand Neil correctly, he means that some triples are duplicated. For example, the triple . appear
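Until the fix Dimitris reports reaches a release, exact duplicate lines could be dropped from a downloaded .nt file. This is a minimal, hypothetical workaround sketch, assuming the duplicates are identical lines apart from surrounding whitespace; `dedupTriples` is an illustrative name and the sample triples are made up, not taken from skos_categories_en.nt.

```scala
import scala.collection.mutable

// Drop duplicate N-Triples lines, keeping the first occurrence and the
// original order. Lines are compared after trimming surrounding whitespace
// only, so triples that differ in inner spacing are NOT treated as duplicates.
def dedupTriples(lines: Iterator[String]): Iterator[String] = {
  val seen = mutable.HashSet.empty[String]
  // HashSet.add returns true only when the element was not already present.
  lines.filter(line => seen.add(line.trim))
}

// Illustrative sample data.
val sample = Seq(
  "<http://dbpedia.org/resource/Category:A> <http://www.w3.org/2004/02/skos/core#broader> <http://dbpedia.org/resource/Category:B> .",
  "<http://dbpedia.org/resource/Category:A> <http://www.w3.org/2004/02/skos/core#broader> <http://dbpedia.org/resource/Category:B> .",
  "<http://dbpedia.org/resource/Category:A> <http://www.w3.org/2004/02/skos/core#broader> <http://dbpedia.org/resource/Category:C> ."
)
val unique = dedupTriples(sample.iterator).toList

// For a real file:
// val src = scala.io.Source.fromFile("skos_categories_en.nt", "UTF-8")
// try dedupTriples(src.getLines()).foreach(println) finally src.close()
```

This keeps every distinct line in memory; for a very large dump, an external `sort -u` is a reasonable alternative if line order does not matter.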

Re: [Dbpedia-discussion] Abstract extraction problem

2013-03-07 Thread Jona Christopher Sahnwaldt
Hi Riko, > - java.net.UnknownHostException: www.w3.org This is weird. Could you please send us the whole stack trace? I don't think the extraction framework should try to access anything but localhost. Could be some kind of XML schema thing. If it is, we should probably turn it off. I still don't
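One common reason for a `java.net.UnknownHostException: www.w3.org` in Java XML code is a parser trying to download an external DTD named in a DOCTYPE declaration. Whether that is what happens in the abstract extraction is not established in this thread; the sketch below only shows how that download can be switched off for a JDK `DocumentBuilderFactory`. The function name and the sample document are illustrative.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory

// Build a non-validating DOM parser factory that does not fetch external DTDs.
def offlineDocumentBuilderFactory(): DocumentBuilderFactory = {
  val factory = DocumentBuilderFactory.newInstance()
  factory.setValidating(false)
  // Xerces feature, also recognized by the JDK's built-in parser: when not
  // validating, skip loading the external DTD referenced by the DOCTYPE.
  factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
  factory
}

// The DOCTYPE points at www.w3.org, but nothing is downloaded while parsing.
val xhtml =
  """<?xml version="1.0"?>
    |<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    |<html><body><p>abstract</p></body></html>""".stripMargin

val doc = offlineDocumentBuilderFactory()
  .newDocumentBuilder()
  .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)))
println(doc.getDocumentElement.getTagName) // html
```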

[Dbpedia-discussion] Re: Abstract extraction problem

2013-03-07 Thread Riko Adi Prasetya
Hi Dimitris, I use my campus internet connection, which requires a proxy. So I must configure it in extraction-framework/dump/pom.xml. I configure it like this: extraction org.dbpedia.extracti
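The XML tags of Riko's pom.xml snippet did not survive in the archive, so the exact configuration is unknown. As a hedged alternative, the standard JVM networking properties can set a proxy from code (or be passed to the JVM as -Dhttp.proxyHost=... -Dhttp.proxyPort=... options). The host and port are placeholders and `configureProxy` is a hypothetical helper; the properties must be set before the first HTTP connection is opened.

```scala
// Standard JVM networking properties; host and port below are placeholders.
def configureProxy(host: String, port: Int): Unit = {
  for (scheme <- Seq("http", "https")) {
    System.setProperty(s"$scheme.proxyHost", host)
    System.setProperty(s"$scheme.proxyPort", port.toString)
  }
  // Hosts that bypass the proxy: localhost and loopback addresses. HTTPS
  // reuses http.nonProxyHosts, so a local MediaWiki is still reached directly.
  System.setProperty("http.nonProxyHosts", "localhost|127.*|[::1]")
}

configureProxy("proxy.example.edu", 8080)
```

Keeping localhost out of the proxy matters here because, as JC notes, the extraction should only need to talk to localhost.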

Re: [Dbpedia-discussion] page article has last modified timestamp

2013-03-07 Thread Amit Kumar
Hi Gaurav, Where did you read that the bulk download is only supported for English and German? I tried the Italian API endpoint and it works okay. http://it.wikipedia.org/w/api.php?action=query&export&exportnowrap&prop=revisions&rvprop=timestamp|content&titles=India
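Amit's request generalizes to other language editions by swapping the subdomain. A minimal sketch that only builds the URL (fetching and parsing the export XML are left out); `exportUrl` is a hypothetical helper, and the query parameters are copied from the Italian example above, with the `|` separators percent-encoded.

```scala
import java.net.URLEncoder
import java.nio.charset.StandardCharsets

// Build a MediaWiki API export URL for one or more titles on a given
// language edition, using the same parameters as the Italian example.
def exportUrl(lang: String, titles: Seq[String]): String = {
  // Multiple titles are separated by "|", which URLEncoder turns into %7C.
  val encodedTitles = URLEncoder.encode(titles.mkString("|"), StandardCharsets.UTF_8.name())
  s"http://$lang.wikipedia.org/w/api.php?action=query&export&exportnowrap" +
    s"&prop=revisions&rvprop=timestamp%7Ccontent&titles=$encodedTitles"
}

println(exportUrl("it", Seq("India")))
```

MediaWiki limits how many titles a single request may name, so a real client would send titles in batches.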

Re: [Dbpedia-discussion] page article has last modified timestamp

2013-03-07 Thread Dimitris Kontokostas
You could use Amit's suggestion for getting the modified articles and then these methods: https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/sources/WikiSource.scala to get a copy of them from Wikipedia. Otherwise you can download the monthly dumps an

Re: [Dbpedia-discussion] Duplicates in skos_categories_en.nt

2013-03-07 Thread Dimitris Kontokostas
Hi Neil, Thanks for the bug report, we already fixed that [1] but effects will be seen in the next release Best, Dimitris [1] https://github.com/dbpedia/extraction-framework/commit/2cb7d621b45cf07c1c59638e0c2cc3fc71fa0cbb On Wed, Mar 6, 2013 at 11:30 PM, Neil Ireson wrote: > Hi all, > > I'm

Re: [Dbpedia-discussion] page article has last modified timestamp

2013-03-07 Thread gaurav pant
Hi Dimitris, Actually, in my previous mail I wanted to know whether there is any API with which I can get a dump of updated page articles, so that I can generate a live dump file (nt, ttl) from them for languages other than English. On Thu, Mar 7, 2013 at 1:57 PM, Dimitris Kontokostas wrote: > Hi a

Re: [Dbpedia-discussion] page article has last modified timestamp

2013-03-07 Thread Dimitris Kontokostas
Hi all, This is exactly what DBpedia Live is doing. We have "Feeders" that place articles in a process queue, extract triples out of them and then update a triple store. For now we mainly use OAI-PMH for our feeds but, we could easily add a new IRC-Feeder for Amit's needs. It depends on what you
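The feeder, queue, extract, update pipeline Dimitris describes can be pictured roughly as below. Everything here is hypothetical: `Feeder`, `StaticFeeder`, `Article`, `Triple` and `process` are illustrative names, the extractor is a dummy, and a mutable map stands in for the triple store. It is not DBpedia Live's actual code.

```scala
import scala.collection.mutable

// Hypothetical types for illustration only.
final case class Article(lang: String, title: String)
final case class Triple(s: String, p: String, o: String)

// A feeder pushes changed articles into the shared process queue.
trait Feeder {
  def feed(queue: mutable.Queue[Article]): Unit
}

// Stand-in feeder emitting a fixed list; an OAI-PMH or IRC feeder would
// enqueue the same kind of items from its own change stream.
final class StaticFeeder(items: Seq[Article]) extends Feeder {
  def feed(queue: mutable.Queue[Article]): Unit = queue ++= items
}

// Drain the queue, extract triples per article, and overwrite that article's
// previous triples in the map-based stand-in for a triple store.
def process(
    queue: mutable.Queue[Article],
    extract: Article => Seq[Triple],
    store: mutable.Map[Article, Seq[Triple]]
): Unit =
  while (queue.nonEmpty) {
    val article = queue.dequeue()
    store(article) = extract(article)
  }

// Dummy extractor: one triple recording the article's language.
val dummyExtract: Article => Seq[Triple] = a => Seq(Triple(a.title, "lang", a.lang))

val queue = mutable.Queue.empty[Article]
val store = mutable.Map.empty[Article, Seq[Triple]]
new StaticFeeder(Seq(Article("it", "India"), Article("en", "India"))).feed(queue)
process(queue, dummyExtract, store)
```

In this picture, an IRC feeder would be one more `Feeder` implementation; the processing side would not need to change.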