Hi,
Olivier Grisel wrote:
2011/9/14 Stefane Fermigier <[email protected]>:
Anyway, yes, he is trashing Stanbol (at least, he is not saying that the
Stanbol version he is using is still an early prototype), but he is fair in
his conclusions.
And I think that recall and precision of ~50% for a project where entity
extraction is just a side project is already a promising result!
No, it's not: it's completely useless in its current state. But there
are easy ways to greatly improve the current state:
- make the NamedEntityTaggingEngine have an option to ignore potential
matches that are not an exact name match (that should improve the
precision dramatically)
+1
- build and distribute more complete indexes and document on the
homepage of the project how to download and deploy them (that should
improve the recall): this is improving but still not easy to do for
the end users => nobody does it and everyone instead uses the small
index that comes by default
FYI: I am working on a howto [1] for creating and using indexes (still
a draft on the staging site), where I could also link to pre-generated
indexes served by [2].
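
To make the exact-name-match suggestion above concrete, here is a minimal
sketch of the filtering idea in Python. It is not Stanbol code and does not
use the NamedEntityTaggingEngine API: the function name, the candidate
structure (a dict with "id" and "labels") and the example URNs are all
hypothetical, and the case-insensitive comparison is an assumption.

```python
def filter_exact_matches(mention, candidates):
    """Keep only candidates that have a label equal to the mention.

    Hypothetical helper: comparison ignores case and surrounding
    whitespace, which is an assumption, not Stanbol's behaviour.
    """
    wanted = mention.strip().casefold()
    return [
        candidate
        for candidate in candidates
        if any(label.strip().casefold() == wanted
               for label in candidate["labels"])
    ]


# Made-up candidates such as a label lookup might return for "paris".
candidates = [
    {"id": "urn:example:Paris", "labels": ["Paris"]},
    {"id": "urn:example:Paris_Hilton", "labels": ["Paris Hilton"]},
    {"id": "urn:example:Paris_Texas", "labels": ["Paris, Texas", "Paris"]},
]

exact = filter_exact_matches("paris", candidates)
print([candidate["id"] for candidate in exact])
# prints ['urn:example:Paris', 'urn:example:Paris_Texas']
```

Partial matches such as "Paris Hilton" are dropped, which is where the
precision gain would come from; the cost is that mentions written
differently from any known label are no longer linked at all.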
Andreas
[1] http://stanbol.staging.apache.org/stanbol/docs/trunk/customvocabulary.html
[2] http://dev.iks-project.eu/downloads/stanbol-indices/