Re: Olivier's presentation on Stanbol at ApacheCon

Alex Lopez Fri, 18 Nov 2011 08:14:30 -0800

I wanted to share my 2 cents about classification with Stanbol, as I had relatively good results applying Olivier's method (using MoreLikeThis to compare the input text with Wikipedia abstracts) on my Stanbol instance running a dbpedia index:

Using RemoteStreaming to classify remote plain text (in this example, an RFC about mail) on a default Stanbol instance using the full launcher:


http://stanbolserver/solr/default/dbpedia_43k/mlt?stream.url=http://www.rfc-editor.org/rfc/rfc6409.txt&mlt.fl=@en/rdfs:comment/&mlt.interestingTerms=list&mlt.mintf=0&fl=ref/rdf:type/+ref/dc:subject/

Or, if a more complete index with indexed abstracts (dbpedia) has been loaded:

http://stanbolserver/solr/default/dbpedia/mlt?stream.url=http://www.rfc-editor.org/rfc/rfc6409.txt&mlt.fl=@en/dbp-ont:abstract/&mlt.interestingTerms=list&mlt.mintf=0&fl=ref/rdf:type/+ref/dc:subject/
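Both queries follow the same pattern and differ only in the index name and the language-tagged field passed to mlt.fl. A minimal Python sketch of assembling such a URL with the standard library follows; the helper name build_mlt_url is hypothetical, and the host http://stanbolserver and the /solr/default/<index>/mlt path are copied from the examples above rather than checked against a particular Stanbol release.

```python
from urllib.parse import urlencode


def build_mlt_url(server, index, stream_url, mlt_field, return_fields):
    """Assemble a Solr MoreLikeThis request against a Stanbol-hosted index.

    Hypothetical helper: the /solr/default/<index>/mlt path mirrors the
    example URLs in this mail and may differ between Stanbol versions.
    """
    params = {
        "stream.url": stream_url,        # remote text that Solr fetches itself (RemoteStreaming)
        "mlt.fl": mlt_field,             # language-tagged field compared against the text
        "mlt.interestingTerms": "list",  # also return the terms MLT selected
        "mlt.mintf": "0",                # keep terms however rarely they occur in the input
        "fl": " ".join(return_fields),   # stored fields returned per similar entity
    }
    return f"{server}/solr/default/{index}/mlt?{urlencode(params)}"


example_url = build_mlt_url(
    server="http://stanbolserver",
    index="dbpedia",
    stream_url="http://www.rfc-editor.org/rfc/rfc6409.txt",
    mlt_field="@en/dbp-ont:abstract/",
    return_fields=["ref/rdf:type/", "ref/dc:subject/"],
)
print(example_url)
```

urlencode percent-encodes the field names (e.g. @ becomes %40), which Solr decodes to the same values as the literal URLs above. The response could then be fetched with urllib.request.urlopen(example_url). For the Portuguese query, mlt_field would be "@pt/dbp-ont:abstract/". Solr only honours stream.url when remote streaming is enabled (the enableRemoteStreaming attribute of requestParsers in solrconfig.xml).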

Then post-process the results, e.g. by inferring common broader categories.
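A minimal sketch of the first post-processing step, pulling the dc:subject values of each similar entity out of the MLT response. It assumes Solr's default XML response writer (multi-valued fields as <arr name="..."><str>...</str></arr>) and that the field comes back under the name given in fl, ref/dc:subject/. SAMPLE_RESPONSE is a hand-written, made-up response in that shape, not output captured from a Stanbol server, and extract_subjects is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Field name as requested via fl in the MLT URL (assumed to come back unchanged).
SUBJECT_FIELD = "ref/dc:subject/"

# Hand-written, made-up response in Solr's default XML layout.
SAMPLE_RESPONSE = """<response>
  <lst name="responseHeader"><int name="status">0</int></lst>
  <result name="response" numFound="2" start="0">
    <doc>
      <arr name="ref/dc:subject/">
        <str>http://dbpedia.org/resource/Category:Email</str>
        <str>http://dbpedia.org/resource/Category:Internet_protocols</str>
      </arr>
    </doc>
    <doc>
      <str name="ref/dc:subject/">http://dbpedia.org/resource/Category:Email</str>
    </doc>
  </result>
</response>"""


def extract_subjects(solr_xml, field=SUBJECT_FIELD):
    """Return one list of `field` values per similar entity, in ranking order."""
    root = ET.fromstring(solr_xml)
    per_entity = []
    for doc in root.iterfind("./result[@name='response']/doc"):
        values = []
        for node in doc:
            if node.get("name") != field:
                continue
            if node.tag == "arr":  # multi-valued: <arr><str>...</str>...</arr>
                values.extend(child.text for child in node)
            else:  # single-valued: <str name="...">...</str>
                values.append(node.text)
        per_entity.append(values)
    return per_entity


for subjects in extract_subjects(SAMPLE_RESPONSE):
    print(subjects)
```

Adding wt=json to the request would return JSON instead; json.loads on the body and a walk over response["response"]["docs"] would then replace the XML handling.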

Just as a test, I extracted the most frequent broader categories of all the dc:subject values returned for the above text, which yielded:


Internet
Email
Internet_protocols
World_Wide_Web
Application_layer_protocols
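The mail does not say how the broader categories were derived. One plausible reading, sketched here, is to map each dc:subject category to its parent categories (for instance via DBpedia's skos:broader links) and count how many similar entities reach each parent. BROADER is a tiny made-up stand-in for that hierarchy, SIMILAR_SUBJECTS is made-up input, and top_broader_categories is a hypothetical helper; the toy numbers are illustrative only and do not reproduce the list above.

```python
from collections import Counter

# Hypothetical, hand-made fragment of a category hierarchy
# (child category -> broader categories). Real data would come from DBpedia.
BROADER = {
    "SMTP": ["Email", "Internet_protocols"],
    "Email_authentication": ["Email", "Computer_security"],
    "Internet_standards": ["Internet", "Internet_protocols"],
}

# Made-up dc:subject values for three similar entities returned by the MLT query.
SIMILAR_SUBJECTS = [
    ["SMTP", "Email_authentication"],
    ["Internet_standards"],
    ["SMTP"],
]


def top_broader_categories(subjects_per_entity, broader, n=5):
    """Return the n broader categories reached by the most similar entities,
    as (category, count) pairs sorted by decreasing count."""
    counts = Counter()
    for subjects in subjects_per_entity:
        # Ordered de-duplication: each entity votes at most once per category.
        reached = dict.fromkeys(
            parent for subject in subjects for parent in broader.get(subject, ())
        )
        counts.update(list(reached))
    return counts.most_common(n)


print(top_broader_categories(SIMILAR_SUBJECTS, BROADER, n=2))
# [('Internet_protocols', 3), ('Email', 2)]
```

Letting each similar entity vote at most once per broader category keeps an entity tagged with several sibling categories from dominating the ranking; counting every occurrence instead would only require dropping the de-duplication.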

Another example, using a Portuguese text (a Bible fragment):

http://stanbolserver/solr/default/dbpedia/mlt?stream.url=http://scrapmaker.com/data/wordlists/genesis/portuguese.txt&mlt.fl=@pt/dbp-ont:abstract/&mlt.interestingTerms=list&mlt.mintf=0&fl=ref/dc:subject/

Categories_named_after_religious_texts
Christian_liturgy,_rites,_and_worship_services
Christian_theology

It works for me :)

However, in the online instances I tested, the Solr server didn't seem to be exposed (as it is in the latest Stanbol revisions), so I can't give any ready-to-see working example.


Thanks Olivier for the great idea!

On 18-11-2011 15:52, Olivier Grisel wrote:

2011/11/18 Reto Bachmann-Gmür:

On Tue, Nov 15, 2011 at 2:09 PM, Bertrand Delacretaz wrote:

On Tue, Nov 15, 2011 at 12:45 PM, Stefane Fermigier wrote:

The presentation is online here:

http://www.slideshare.net/nuxeo/apache-stanbol-and-the-web-of-data-apachecon-2011

I attended Olivier's presentation and was impressed by the results of
his Universal Topic Classification experiment (starting at slide 38).

The results look very impressive. Is there some documentation on how to set
up this effective topic classification?


Right now it's still a prototype using Solr directly. I need to
refactor a bunch of stuff, but that will likely be impacted by the new
RDF Path mapper / indexer we are gonna work on during the hackathon.
