Re: [Suggestion] Enhancement confidence range [0..1] and addition of confidence-levels

Rupert Westenthaler Thu, 24 May 2012 02:19:27 -0700

Hi,

this may happen in rare cases if copying the dbpedia default index data 
takes too much time. This is because of a design issue in the Entityhub: it 
cannot defer the activation of a ReferencedSite until the SolrIndex with the 
required data is available. While this may cause failures in the 
integration tests, it is usually no problem during normal operations.
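
A minimal sketch of how a test could wait for such a dependency instead of 
failing right away. The class `ReadinessWaiter`, the method `waitUntil` and 
the readiness check passed in are hypothetical names for illustration; they 
are not part of the Stanbol integration-test framework:

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: polls a readiness check until it succeeds or a timeout expires. */
final class ReadinessWaiter {

    private ReadinessWaiter() {}

    /**
     * Calls the check repeatedly, sleeping pollMillis between attempts.
     *
     * @return true if the check succeeded before the timeout, false otherwise
     */
    static boolean waitUntil(BooleanSupplier ready, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (ready.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

The check itself (for example "does the site answer a query for a known 
entity") is left to the caller, because how readiness can be detected 
depends on the setup.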


I think there is also a Jira issue about that, but the Apache Jira is currently 
not accessible to me.

Best Rupert

-- 
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

On 24.05.2012 at 11:03, Alessandra Donnini <[email protected]> wrote:

> I tried to compile the trunk version, but I get this error. Any 
> suggestions? I also attach 
> org.apache.stanbol.commons.httpqueryheaders.it.HttpQueryHeaderGetTest.txt, 
> which contains (I suppose) the specific error log.
> regards
> Alessandra Donnini
> --------------------------------------------
> <TEST-org.apache.stanbol.commons.httpqueryheaders.it.HttpQueryHeaderGetTest.xml>
> <org.apache.stanbol.commons.httpqueryheaders.it.HttpQueryHeaderGetTest.txt>
> 
> Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.11:test (default-test) on project org.apache.stanbol.integration-tests: There are test failures.
> [ERROR] 
> [ERROR] Please refer to /Users/ale/Documents/textmining/stanbol/stanbol0.9releaseCandidate/stanbol-last/stanbol/integration-tests/target/surefire-reports for the individual test results.
> [ERROR] -> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.11:test (default-test) on project org.apache.stanbol.integration-tests: There are test failures.
> 
> Please refer to /Users/ale/Documents/textmining/stanbol/stanbol0.9releaseCandidate/stanbol-last/stanbol/integration-tests/target/surefire-reports for the individual test results.
>    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:213)
>    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
>    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
>    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
>    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
>    at org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
>    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
>    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:319)
>    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156)
>    at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537)
>    at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
>    at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>    at java.lang.reflect.Method.invoke(Method.java:597)
>    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
>    at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
>    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
>    at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352)
> Caused by: org.apache.maven.plugin.MojoFailureException: There are test failures.
> 
> Please refer to /Users/ale/Documents/textmining/stanbol/stanbol0.9releaseCandidate/stanbol-last/stanbol/integration-tests/target/surefire-reports for the individual test results.
>    at org.apache.maven.plugin.surefire.SurefireHelper.reportExecution(SurefireHelper.java:87)
>    at org.apache.maven.plugin.surefire.SurefirePlugin.writeSummary(SurefirePlugin.java:651)
>    at org.apache.maven.plugin.surefire.SurefirePlugin.handleSummary(SurefirePlugin.java:625)
>    at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:136)
>    at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:97)
>    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:101)
>    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209)
>    ... 19 more
> [ERROR] 
> [ERROR] 
> [ERROR] For more information about the errors and possible solutions, please read the following articles:
> [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR] 
> [ERROR] After correcting the problems, you can resume the build with the command
> [ERROR]   mvn <goals> -rf :org.apache.stanbol.integration-tests
> Alessandra Donnini
> Etcware s.r.l. via Etna 13 - 00141 Roma
> [email protected]
> mobile +39 333 8914865
> tel/fax 06 64495131
> 
> 
> 
> 
> 
> On 24 May 2012, at 09:04, Fabian Christ wrote:
> 
>> Hi Alessandra,
>> 
>> the CELI engines were not part of the 0.9.0-incubating release. To get
>> these new engines, you have to check out the latest sources (trunk) of
>> Apache Stanbol and compile them yourself.
>> 
>> http://incubator.apache.org/stanbol/docs/trunk/tutorial.html
>> 
>> Best,
>> - Fabian
>> 
>> 2012/5/24 Alessandra Donnini <[email protected]>:
>>> Are the new CELI enhancement engines included in the latest release, 
>>> apache-stanbol-0.9.0-incubating (2012/05/08), available at 
>>> http://incubator.apache.org/stanbol/downloads/releases.html?
>>> Do I need to download the files from 
>>> https://issues.apache.org/jira/browse/STANBOL-583 and install them? If so, 
>>> how should I do that?
>>> thanks
>>> Alessandra Donnini
>>> 
>>> 
>>> 
>>> 
>>> On 24 May 2012, at 08:18, Rupert Westenthaler wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> In the last two weeks I considerably improved the validation of the
>>>> Enhancements created by the different Stanbol Enhancement Engines.
>>>> Here is the list of related issues:
>>>> 
>>>> * STANBOL-613: Define how to retrieve the language of the parsed content
>>>> * STANBOL-617: Define how to encode fise:TopicEnhancements
>>>> * STANBOL-625: Add link to the entityhub:site if suggested Entity is
>>>> available via the Entityhub
>>>> 
>>>> Note also STANBOL-612, which provides a utility class that makes it easy
>>>> to validate created enhancements in unit tests of EnhancementEngines.
>>>> All existing engines now use this utility to validate Enhancements.
>>>> This is also true for the contributed CELI engine (STANBOL-583), which
>>>> already conforms to those tests.
>>>> 
>>>> The next thing I would like to make clearer (and easier to
>>>> use/understand) is how confidence is represented for Stanbol
>>>> Enhancements. Related to this I would like to discuss the following
>>>> two suggestions:
>>>> 
>>>> ### Suggestion 1: Require confidence values to be in the range [0..1]
>>>> 
>>>> This is a long-running discussion, but I would really like to add a
>>>> check that enforces confidence values to be in the range [0..1].
>>>> 
>>>> I think this change is necessary, because it moves the responsibility
>>>> for interpreting confidence values from the Stanbol users to the
>>>> implementors of the Engines. I know that providing confidence values
>>>> is a hard thing to do, but while it may be hard for Engine developers,
>>>> it is nearly impossible for Stanbol users to do so.
>>>> 
>>>> Note that EnhancementEngine would still be free to create Enhancements
>>>> with no "fise:confidence" value.
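
A minimal sketch of such a check, under the assumption that a missing value 
is represented as `null`. `ConfidenceCheck` and `isValid` are hypothetical 
names, not the validation utility from STANBOL-612:

```java
/** Sketch of the proposed range check; class and method names are hypothetical. */
final class ConfidenceCheck {

    private ConfidenceCheck() {}

    /**
     * Accepts a missing value (null), because engines may still omit
     * fise:confidence, and rejects NaN and anything outside [0..1].
     */
    static boolean isValid(Double confidence) {
        if (confidence == null) {
            return true; // no fise:confidence value is allowed
        }
        double c = confidence;
        return !Double.isNaN(c) && c >= 0.0 && c <= 1.0;
    }
}
```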
>>>> 
>>>> Surprisingly, a lot of the existing Engines already conform to this
>>>> rule. The most prominent exception is the Named Entity Tagging Engine
>>>> (o.a.s.enhancer.engine.entitytagging). Because of this I have already
>>>> implemented an algorithm that normalizes confidence values by a
>>>> combination of the Levenshtein distance (selected-text <-> entity
>>>> label) and the Solr result score for the Entity (see STANBOL-624 for
>>>> details).
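
To illustrate the general idea only: the sketch below combines a 
Levenshtein-based label similarity with a result score scaled by the best 
score of the result list. The multiplication and the scaling by `maxScore` 
are assumptions made for this example; the actual algorithm is the one 
described in STANBOL-624 and may differ. All names are hypothetical.

```java
import java.util.Locale;

/** Illustrative normalization of a confidence value to [0..1]; not the STANBOL-624 code. */
final class ConfidenceNormalizer {

    private ConfidenceNormalizer() {}

    /** Classic dynamic-programming Levenshtein distance using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Case-insensitive label similarity in [0..1]; 1.0 means identical strings. */
    static double labelSimilarity(String selectedText, String label) {
        String s = selectedText.toLowerCase(Locale.ROOT);
        String l = label.toLowerCase(Locale.ROOT);
        int maxLen = Math.max(s.length(), l.length());
        if (maxLen == 0) {
            return 1.0;
        }
        return 1.0 - (double) levenshtein(s, l) / maxLen;
    }

    /** Assumed combination: label similarity times the score relative to the best score. */
    static double normalize(String selectedText, String label, double score, double maxScore) {
        double relative = maxScore > 0 ? Math.min(1.0, Math.max(0.0, score / maxScore)) : 0.0;
        return labelSimilarity(selectedText, label) * relative;
    }
}
```

Both factors lie in [0..1], so their product does as well, which is the 
property Suggestion 1 asks for.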
>>>> 
>>>> If we can agree on this rule I would use a similar approach for
>>>> other Engines that do not yet normalize confidence values to
>>>> [0..1].
>>>> 
>>>> ### Suggestion 2: Add fise:confidence-level property
>>>> 
>>>> The "confidence-level" is intended to make it easier for clients to
>>>> decide how to process Enhancements. It would not use a numerical range
>>>> but four distinct values:
>>>> 
>>>> * confident: Meaning that a match is very likely - indicating that
>>>> those annotations can typically be accepted automatically (e.g. if the
>>>> EntityLinking engine finds a single Entity that exactly matches the
>>>> text selected by a text annotation)
>>>> * ambiguous: Meaning that there are several possibilities but it is
>>>> still likely that one of them matches (e.g. Paris, Paris (Texas))
>>>> * suggestion: Meaning that the match is not completely certain, but
>>>> there are not several options (e.g. Germans -> Germany)
>>>> * uncertain: Meaning that Entities do match, but the probability of a
>>>> match is rather speculative (e.g. John -> Elton John)
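
A small sketch of how a client might branch on these four values. Only 
"accept automatically" for `confident` comes from the description above; 
the other three actions, and the names `ClientPolicy` and `actionFor`, are 
assumptions made for illustration:

```java
/** Hypothetical client-side policy keyed on the proposed confidence levels. */
final class ClientPolicy {

    /** The four values proposed for fise:confidence-level. */
    enum Level { CONFIDENT, AMBIGUOUS, SUGGESTION, UNCERTAIN }

    private ClientPolicy() {}

    /** Maps a level to an illustrative client action. */
    static String actionFor(Level level) {
        switch (level) {
            case CONFIDENT:
                return "accept";       // can typically be accepted automatically
            case AMBIGUOUS:
                return "disambiguate"; // assumed: let the user pick one candidate
            case SUGGESTION:
                return "propose";      // assumed: show as a proposal to confirm
            case UNCERTAIN:
                return "ignore";       // assumed: hide unless explicitly requested
            default:
                throw new IllegalArgumentException("unknown level: " + level);
        }
    }
}
```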
>>>> 
>>>> IMHO this classification would fit a lot of engines much better
>>>> than the numeric "fise:confidence" property, as it does not raise the
>>>> expectation in users that confidence values are on a ratio scale
>>>> (e.g. an Enhancement with a confidence of "0.8" is not twice as
>>>> likely as one with "0.4").
>>>> 
>>>> Engines would have the possibility to add this information to
>>>> enhancements explicitly. For enhancements that do not define it we could
>>>> implement a post-processing engine that adds it based on generic
>>>> rules.
>>>> 
>>>> e.g.
>>>> 
>>>> * ignore Enhancements with an existing "confidence-level" assignment
>>>> * TextAnnotations with a confidence value > 0.8 => confident
>>>> * TextAnnotations with a confidence value between 0.5 and 0.8 =>
>>>> suggestion
>>>> * TextAnnotations with a confidence value < 0.5 => uncertain
>>>> * TextAnnotations with a single linked EntityAnnotation with a
>>>> confidence > 0.8 => confident
>>>> * TextAnnotations with several linked EntityAnnotations with a
>>>> confidence > 0.8 => ambiguous *)
>>>> * TextAnnotations with several linked EntityAnnotations with a
>>>> confidence > 0.5 but none > 0.8 => ambiguous *)
>>>> * TextAnnotations with a single linked EntityAnnotation with a
>>>> confidence between 0.5 and 0.8 => suggestion
>>>> * TextAnnotations with EntityAnnotations with confidence values < 0.5
>>>> => uncertain
>>>> * TopicAnnotations with a confidence value > 0.8 => confident
>>>> * TopicAnnotations with a confidence value between 0.5 and 0.8 =>
>>>> suggestion
>>>> * TopicAnnotations with a confidence value < 0.5 => uncertain
>>>> 
>>>> *) NOTE that in those cases only EntityAnnotations with a confidence
>>>> value > 0.5 would be marked as "ambiguous". Additional
>>>> EntityAnnotations with confidence values < 0.5 would be marked as
>>>> "uncertain"
>>>> 
>>>> The values '0.8' and '0.5' should be configurable.
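
The rules above could be sketched roughly as follows, with both thresholds 
passed in so they stay configurable. Several points are assumptions, since 
the list leaves them open: values exactly equal to a threshold fall into the 
lower bucket for 0.8 and the upper bucket for 0.5, and one candidate above 
0.8 together with another one between 0.5 and 0.8 is treated as 
"ambiguous". `ConfidenceLevelRules` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the generic post-processing rules with configurable thresholds. */
final class ConfidenceLevelRules {

    enum Level { CONFIDENT, AMBIGUOUS, SUGGESTION, UNCERTAIN }

    private final double high; // 0.8 in the proposal
    private final double low;  // 0.5 in the proposal

    ConfidenceLevelRules(double high, double low) {
        if (!(0.0 <= low && low <= high && high <= 1.0)) {
            throw new IllegalArgumentException("expected 0 <= low <= high <= 1");
        }
        this.high = high;
        this.low = low;
    }

    /** Rule for a single value, as used for TextAnnotations and TopicAnnotations. */
    Level forValue(double confidence) {
        if (confidence > high) {
            return Level.CONFIDENT;
        }
        if (confidence >= low) {
            return Level.SUGGESTION;
        }
        return Level.UNCERTAIN;
    }

    /** An existing confidence-level assignment (non-null) is never overwritten. */
    Level levelFor(Level existing, double confidence) {
        return existing != null ? existing : forValue(confidence);
    }

    /** Levels for the EntityAnnotations linked to one TextAnnotation, in input order. */
    List<Level> forLinkedEntities(List<Double> confidences) {
        // Candidates at or above the lower threshold compete with each other.
        long candidates = confidences.stream().filter(c -> c >= low).count();
        List<Level> levels = new ArrayList<>();
        for (double c : confidences) {
            if (c < low) {
                levels.add(Level.UNCERTAIN);  // see the *) note above
            } else if (candidates > 1) {
                levels.add(Level.AMBIGUOUS);  // several plausible candidates
            } else {
                levels.add(forValue(c));      // single candidate: confident or suggestion
            }
        }
        return levels;
    }
}
```

With the proposed defaults, `new ConfidenceLevelRules(0.8, 0.5)`, linked 
entities with confidences 0.9, 0.85 and 0.2 come out as ambiguous, ambiguous 
and uncertain.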
>>>> 
>>>> Note that "fise:confidence-level" could also be used by Engines that
>>>> cannot provide fise:confidence values (e.g. the langid engine could
>>>> mark detected languages as "uncertain" if the parsed text was too
>>>> short).
>>>> 
>>>> WDYT
>>>> Rupert
>>>> 
>>>> 
>>>> --
>>>> | Rupert Westenthaler             [email protected]
>>>> | Bodenlehenstraße 11                             ++43-699-11108907
>>>> | A-5500 Bischofshofen
>>> 
>> 
>> 
>> 
>> -- 
>> Fabian
>> http://twitter.com/fctwitt
>
