Hi,

Olivier Grisel wrote:
2011/9/14 Stefane Fermigier <[email protected]>:
Anyway, yes, he is trashing Stanbol (or at least not mentioning that the
Stanbol version he is using is still an early prototype), but he is fair in
his conclusions.

And I think that recall and precision of ~50% for a project where entity
extraction is just a side project is already a promising result!

No, it's not; it's completely useless in its current state. But there
are easy ways to greatly improve the current state:

- make the NamedEntityTaggingEngine have an option to ignore potential
matches that are not an exact name match (that should improve the
precision dramatically)

+1
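
To make the exact-match idea concrete, here is a minimal sketch. It is not the NamedEntityTaggingEngine code or any Stanbol API: the function name, the candidate dictionaries and the example URIs are all hypothetical, and it assumes each candidate entity carries a list of labels. A candidate is kept only if one of its labels equals the detected mention after whitespace and (optionally) case normalization.

```python
def filter_exact_matches(mention, candidates, case_sensitive=False):
    """Keep only candidates that have a label identical to the mention.

    `candidates` is assumed to be a list of dicts with a "labels" list;
    this structure is made up for illustration only.
    """
    def normalize(text):
        # Collapse runs of whitespace; optionally ignore case.
        text = " ".join(text.split())
        return text if case_sensitive else text.casefold()

    target = normalize(mention)
    return [
        candidate
        for candidate in candidates
        if any(normalize(label) == target for label in candidate["labels"])
    ]


# Hypothetical candidates returned by a fuzzy label lookup for "Paris".
candidates = [
    {"uri": "http://example.org/entity/Paris", "labels": ["Paris"]},
    {"uri": "http://example.org/entity/Paris_Hilton", "labels": ["Paris Hilton"]},
    {"uri": "http://example.org/entity/Paris_Texas", "labels": ["Paris, Texas"]},
]

exact = filter_exact_matches("Paris", candidates)
```

With such a filter, partial matches like "Paris Hilton" are dropped for the mention "Paris", which is where the precision gain would come from; the price is that mentions using a name variant missing from the index are no longer linked.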


- build and distribute more complete indexes and document on the
homepage of the project how to download and deploy them (that should
improve the recall): this is improving but still not easy to do for
the end users => nobody does it and instead uses the small index that
comes by default

FYI: I am working on a howto [1] for creating and using indexes (still a staging draft), where I could also link to pre-generated indexes served by [2].

Andreas

[1] http://stanbol.staging.apache.org/stanbol/docs/trunk/customvocabulary.html
[2] http://dev.iks-project.eu/downloads/stanbol-indices/
