Author: rwesten
Date: Tue Feb 22 14:16:41 2011
New Revision: 1073335

URL: http://svn.apache.org/viewvc?rev=1073335&view=rev
Log:
resolves STANBOL-37: The LocationEnhancement Engine can now be used together with geonames.org user accounts
Added a readme for this engine (based on the documentation found on the IKS wiki)
Added a description of how to set up free user accounts
Corrected several bugs related to cases where the default values were used even if others were passed by the configuration.
The unit tests still use ws.geonames.org (which does not require user authentication) because sharing a user account for testing did not seem the right thing to do.
Also kept the default for the geonames server (http://ws.geonames.org) because changing it to http://api.geonames.org would require creating a user account.
Updated the metatype properties to point users to the possibility of using http://api.geonames.org

Added:
    incubator/stanbol/trunk/enhancer/engines/geonames/README.txt (with props)
Modified:
    incubator/stanbol/trunk/enhancer/engines/geonames/src/main/java/org/apache/stanbol/enhancer/engines/geonames/impl/GeonamesAPIWrapper.java
    incubator/stanbol/trunk/enhancer/engines/geonames/src/main/java/org/apache/stanbol/enhancer/engines/geonames/impl/LocationEnhancementEngine.java
    incubator/stanbol/trunk/enhancer/engines/geonames/src/main/resources/OSGI-INF/metatype/metatype.properties

Added: incubator/stanbol/trunk/enhancer/engines/geonames/README.txt
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/geonames/README.txt?rev=1073335&view=auto
==============================================================================
--- incubator/stanbol/trunk/enhancer/engines/geonames/README.txt (added)
+++ incubator/stanbol/trunk/enhancer/engines/geonames/README.txt Tue Feb 22 14:16:41 2011
@@ -0,0 +1,263 @@
+Geonames.org based Location Enhancement Engine for Apache Stanbol Enhancer
+
+This engine creates fise:EntityAnnotations based on the http://geonames.org
+dataset. It does not work directly on the parsed content, but processes named
+entities extracted by some NLP (natural language processing) engine.
+This engine creates EntityAnnotations for Features found for named entities in
+the geonames.org data set. In addition it adds EntityAnnotations for the
+continent, country and administrative regions for entities with a high
+confidence level.
+
+Processed Annotations (Input)
+
+This engine consumes fise:TextAnnotations of type dbpedia:Place. More concretely,
+it filters for enhancements that conform to the following requirements and
+consumes the text selected by the TextAnnotations:
+
+    ?textAnnotation rdf:type fise:TextAnnotation .
+    ?textAnnotation dc:type dbpedia:Place .
+    ?textAnnotation fise:selected-text ?text .
+
+Here is an example of such a TextAnnotation, selecting the text "Vienna" from
+the content "The community Workshop will take place in Vienna".
+
+    urn:enhancement:text-enhancement:id1
+        a fise:TextAnnotation , fise:Enhancement ;
+        dc:type
+            dbpedia:Place ;
+        fise:selected-text
+            "Vienna"^^xsd:string ;
+        fise:selection-context
+            "The community Workshop will take place in Vienna"^^xsd:string ;
+        fise:start
+            "46"^^xsd:int ;
+        fise:end
+            "52"^^xsd:int ;
+        fise:confidence
+            "0.9773640902587215"^^xsd:double ;
+        fise:extracted-from
+            urn:content-item:id1 .
+
+Typically such enhancements are created by engines that provide named entity
+extraction based on some natural language processing framework.
+
+
+Created Enhancements (Output)
+
+The LocationEnhancementEngine creates two types of EntityAnnotations. First, it
+suggests Entities for processed TextAnnotations. Second, it creates
+EntityAnnotations for the hierarchy of regions the suggested Entities are
+located in. Suggested Entities are connected with the "dc:relation" attribute
+to the TextAnnotation they enhance. EntityAnnotations representing the hierarchy
+define a dc:requires attribute pointing to the EntityAnnotation of the suggested
+Entity.
+
+
+Entity Suggestions
+
+Entity suggestions are EntityAnnotations that suggest Features of the
+geonames.org dataset for a processed TextAnnotation.
+These suggestions are currently calculated only from the fise:selected-text of
+the TextAnnotation. The following example shows three EntityAnnotations for the
+TextAnnotation used in the above example. See the dc:relation statements at the
+end of each of the three EntityAnnotations.
+
+The first Entity found in the geonames.org dataset is the capital city of
+Austria with a confidence level of 1.0:
+
+    urn:enhancement:entity-enhancement:id1
+        a fise:EntityAnnotation , fise:Enhancement ;
+        fise:confidence
+            "1.0"^^xsd:double ;
+        fise:entity-label
+            "Vienna"^^xsd:string ;
+        fise:entity-reference
+            http://sws.geonames.org/2761369/ ;
+        fise:entity-type
+            geonames:Feature , dbpedia:Place , dbpedia:Settlement , dbpedia:PopulatedPlace , geonames:P.PPLC ;
+        fise:extracted-from
+            urn:content-item:id1 ;
+        dc:relation
+            urn:enhancement:text-enhancement:id1 .
+
+With lower confidence levels, many other populated places named "Vienna" are
+found in the geonames.org dataset.
+
+    urn:enhancement:entity-enhancement:id2
+        a fise:EntityAnnotation , fise:Enhancement ;
+        fise:confidence
+            "0.42163702845573425"^^xsd:double ;
+        fise:entity-label
+            "Vienna"^^xsd:string ;
+        fise:entity-reference
+            http://sws.geonames.org/4496671/ ;
+        fise:entity-type
+            geonames:Feature , dbpedia:Place , dbpedia:Settlement , dbpedia:PopulatedPlace , geonames:P.PPL ;
+        fise:extracted-from
+            urn:content-item:id1 ;
+        dc:relation
+            urn:enhancement:text-enhancement:id1 .
+
+    urn:enhancement:entity-enhancement:id3
+        a fise:EntityAnnotation , fise:Enhancement ;
+        fise:confidence
+            "0.42163702845573425"^^xsd:double ;
+        fise:entity-label
+            "Vienna"^^xsd:string ;
+        fise:entity-reference
+            http://sws.geonames.org/4825976/ ;
+        fise:entity-type
+            geonames:Feature , dbpedia:Place , dbpedia:Settlement , dbpedia:PopulatedPlace , geonames:P.PPL ;
+        fise:extracted-from
+            urn:content-item:id1 ;
+        dc:relation
+            urn:enhancement:text-enhancement:id1 .
+
+
+Entity Hierarchy Enhancements
+
+Entity Hierarchy Enhancements describe the regions that contain suggested
+Features based on the geonames.org dataset. Enhancements describing this
+hierarchy are added for all suggested entities with a confidence level above
+the value of "org.apache.stanbol.enhancer.engines.geonames.locationEnhancementEngine.min-hierarchy-score".
+The default value for this property is 0.7. The hierarchy web service provided
+by geonames.org is used to calculate the regions.
+The following example shows the entity hierarchy enhancements for the suggested
+entity for Vienna (Austria). Please note the dc:requires relation to this
+EntityAnnotation at the end of each of the following enhancements.
+First the enhancement for the continent Europe:
+
+    urn:enhancement:entity-hierarchy-enhancement:id1
+        a fise:EntityAnnotation , fise:Enhancement ;
+        fise:confidence
+            "0.42163702845573425"^^xsd:double ;
+        fise:entity-label
+            "Europe"^^xsd:string ;
+        fise:entity-reference
+            http://sws.geonames.org/6255148/ ;
+        fise:entity-type
+            geonames:Feature , dbpedia:Place , geonames:L.CONT ;
+        fise:extracted-from
+            urn:content-item:id1 ;
+        dc:requires
+            urn:enhancement:entity-enhancement:id1 .
+
+Next the enhancement for the country "Austria", classified as an independent
+political entity within geonames.org:
+
+    urn:enhancement:entity-hierarchy-enhancement:id2
+        a fise:EntityAnnotation , fise:Enhancement ;
+        fise:confidence
+            "0.42163702845573425"^^xsd:double ;
+        fise:entity-label
+            "Austria"^^xsd:string ;
+        fise:entity-reference
+            http://sws.geonames.org/2782113/ ;
+        fise:entity-type
+            geonames:Feature , dbpedia:Place , dbpedia:AdministrativeRegion , geonames:A.PCLI ;
+        fise:extracted-from
+            urn:content-item:id1 ;
+        dc:requires
+            urn:enhancement:entity-enhancement:id1 .
+
+Now three enhancements describing the different levels of administrative
+regions within Austria. First the "Bundesland", next the "Stadtteil" and last
+the "Gemeindebezirk".
+
+    urn:enhancement:entity-hierarchy-enhancement:id3
+        a fise:EntityAnnotation , fise:Enhancement ;
+        fise:confidence
+            "0.42163702845573425"^^xsd:double ;
+        fise:entity-label
+            "Vienna"^^xsd:string ;
+        fise:entity-reference
+            http://sws.geonames.org/2761367/ ;
+        fise:entity-type
+            geonames:Feature , dbpedia:Place , dbpedia:AdministrativeRegion , geonames:A.ADM1 ;
+        fise:extracted-from
+            urn:content-item:id1 ;
+        dc:requires
+            urn:enhancement:entity-enhancement:id1 .
+    urn:enhancement:entity-hierarchy-enhancement:id4
+        a fise:EntityAnnotation , fise:Enhancement ;
+        fise:confidence
+            "0.42163702845573425"^^xsd:double ;
+        fise:entity-label
+            "Politischer Bezirk Wien (Stadt)"^^xsd:string ;
+        fise:entity-reference
+            http://sws.geonames.org/2761333/ ;
+        fise:entity-type
+            geonames:Feature , dbpedia:Place , dbpedia:AdministrativeRegion , geonames:A.ADM2 ;
+        fise:extracted-from
+            urn:content-item:id1 ;
+        dc:requires
+            urn:enhancement:entity-enhancement:id1 .
+    urn:enhancement:entity-hierarchy-enhancement:id5
+        a fise:EntityAnnotation , fise:Enhancement ;
+        fise:confidence
+            "0.42163702845573425"^^xsd:double ;
+        fise:entity-label
+            "Gemeindebezirk Innere Stadt"^^xsd:string ;
+        fise:entity-reference
+            http://sws.geonames.org/2775259/ ;
+        fise:entity-type
+            geonames:Feature , dbpedia:Place , dbpedia:AdministrativeRegion , geonames:A.ADM3 ;
+        fise:extracted-from
+            urn:content-item:id1 ;
+        dc:requires
+            urn:enhancement:entity-enhancement:id1 .
+
+The last two hierarchy levels are no longer valid for the meaning of "Vienna" as
+selected by the TextAnnotation. They are added because the geonames.org dataset
+locates the Feature of a city exactly at its center. However, if the
+TextAnnotation described a precise address, these hierarchy levels would
+make complete sense.
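The threshold rule for hierarchy enhancements described in this README can be sketched in Java as follows. This is a minimal illustration, not the engine's actual code: the class and method names are hypothetical, and treating a score equal to the threshold as sufficient (`>=`) is an assumption. The geonames IDs and confidence values are the ones from the Vienna example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; names are illustrative and not part of the engine.
final class HierarchyThresholdSketch {

    // Default value as stated in the README (min-hierarchy-score = 0.7).
    static final double DEFAULT_MIN_HIERARCHY_SCORE = 0.7;

    /**
     * Returns the geonames IDs of those suggestions whose confidence reaches
     * the threshold. Only for these would the hierarchy service be queried.
     * Assumption: a score equal to the threshold is accepted.
     */
    static List<Integer> selectForHierarchy(Map<Integer, Double> suggestions,
                                            double minHierarchyScore) {
        List<Integer> selected = new ArrayList<>();
        for (Map.Entry<Integer, Double> entry : suggestions.entrySet()) {
            if (entry.getValue() >= minHierarchyScore) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // geonameId -> confidence, taken from the three "Vienna" suggestions
        Map<Integer, Double> suggestions = new LinkedHashMap<>();
        suggestions.put(2761369, 1.0);
        suggestions.put(4496671, 0.42163702845573425);
        suggestions.put(4825976, 0.42163702845573425);
        // Only the Austrian capital passes the default threshold.
        System.out.println(selectForHierarchy(suggestions, DEFAULT_MIN_HIERARCHY_SCORE));
    }
}
```

With the default threshold only the first suggestion qualifies, which matches the example: all hierarchy enhancements shown reference urn:enhancement:entity-enhancement:id1.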
+Configuration
+
+The LocationEnhancementEngine currently provides six configuration options.
+
+The first three can be used to optimise the behaviour of the Engine:
+ - Minimum score (default = 0.33): The minimum score (confidence) that is
+   required for entity suggestions.
+ - Maximum Locations (default = 3): The maximum number of entity
+   suggestions added (regardless of whether there are more results with a
+   score > min-score).
+ - Minimum hierarchy score (default = 0.7): The minimum score (confidence)
+   that is required for hierarchy enhancements to be added for a suggested
+   entity. To add hierarchy enhancements for all suggested entities,
+   min-hierarchy-score needs to be set to a value smaller than or equal to
+   min-score.
+
+The other three configure the geonames.org server to use:
+ - geonames.org Server: The URL of the geonames.org service. The default is the
+   free geonames.org webserver that works without user authentication. There
+   is a second free server at http://api.geonames.org/ that requires setting
+   up a free user account. Users with a premium account need to enter their
+   own URL here.
+ - User Name: The name of the account (can be empty if the configured
+   server does not require user authentication).
+ - Token: The token is usually the password of the user account.
+
+
+ HOWTO set up a free user account:
+
+ Such an account is required to be able to use the http://api.geonames.org/ server
+ that should support better performance and higher uptime than the default
+ free server available at http://ws.geonames.org/.
+
+To set up the free account:
+(1) go to www.geonames.org. In the top right corner you will find a "login" link
+    that is also used to create new accounts
+(2) choose a username and password. You will get a confirmation mail at the
+    provided email address. When choosing the password, consider that it will
+    be sent unencrypted (as token) with every web service request.
+    Therefore it is strongly suggested not to use a password that is used for
+    any other account!
+(3) confirm the account
+(4) IMPORTANT: You need to activate the free web service for the account via
+    http://www.geonames.org/manageaccount. Log in first, then go back to this
+    page. At the bottom you should find the text "the account is not yet
+    enabled to use the free web services. Click here to enable"
+
+If you do not complete step (4), requests with your account will result in
+IOExceptions with the message
+    "user account not enabled to use the free webservice. Please enable it on your account page: http://www.geonames.org/manageaccount"
+
\ No newline at end of file

Propchange: incubator/stanbol/trunk/enhancer/engines/geonames/README.txt
------------------------------------------------------------------------------
    svn:mime-type = text/plain

Modified: incubator/stanbol/trunk/enhancer/engines/geonames/src/main/java/org/apache/stanbol/enhancer/engines/geonames/impl/GeonamesAPIWrapper.java
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/geonames/src/main/java/org/apache/stanbol/enhancer/engines/geonames/impl/GeonamesAPIWrapper.java?rev=1073335&r1=1073334&r2=1073335&view=diff
==============================================================================
--- incubator/stanbol/trunk/enhancer/engines/geonames/src/main/java/org/apache/stanbol/enhancer/engines/geonames/impl/GeonamesAPIWrapper.java (original)
+++ incubator/stanbol/trunk/enhancer/engines/geonames/src/main/java/org/apache/stanbol/enhancer/engines/geonames/impl/GeonamesAPIWrapper.java Tue Feb 22 14:16:41 2011
@@ -317,8 +317,8 @@ public class GeonamesAPIWrapper {
      * If no valid user name is parsed the token will be ignored.
*/ public GeonamesAPIWrapper(String searchService,String hierarchyService,String userName, String token){ - this.searchServiceUrl = GEONAMES_ORG_WEBSERVICE_URL+SEARCH_SERVICE_PATH; - this.hierarchyServiceUrl = GEONAMES_ORG_WEBSERVICE_URL+HIERARCHY_SERVICE_PATH; + this.searchServiceUrl = searchService != null?searchService:(GEONAMES_ORG_WEBSERVICE_URL+SEARCH_SERVICE_PATH); + this.hierarchyServiceUrl = hierarchyService != null ?hierarchyService:(GEONAMES_ORG_WEBSERVICE_URL+HIERARCHY_SERVICE_PATH); this.userName = userName == null || userName.isEmpty()?null:userName; this.token = this.userName == null ? null:token; } @@ -352,10 +352,18 @@ public class GeonamesAPIWrapper { URL requestUrl; try { requestUrl = new URL(requestString.toString()); + log.info(" > search request: "+requestUrl); } catch (MalformedURLException e) { throw new IllegalStateException("Unable to build valid request URL for " + requestString); } + long start = System.currentTimeMillis(); String result = IOUtils.toString(requestUrl.openConnection().getInputStream()); + long responseTime = System.currentTimeMillis()-start; + if(responseTime > 1000){ + log.info(" - responseTime: "+responseTime+"ms"); + } else { + log.debug(" - responseTime: "+responseTime+"ms"); + } try { JSONObject root = new JSONObject(result); if (root.has("totalResultsCount")) { @@ -398,36 +406,38 @@ public class GeonamesAPIWrapper { public List<Toponym> getHierarchy(int geonameId) throws IOException { StringBuilder requestString = new StringBuilder(); requestString.append(hierarchyServiceUrl); + Map<HierarchyRequestPorpertyEnum,Collection<String>> requestProperties = + new EnumMap<HierarchyRequestPorpertyEnum,Collection<String>>(HierarchyRequestPorpertyEnum.class); + requestProperties.put(HierarchyRequestPorpertyEnum.geonameId, Collections.singleton(Integer.toString(geonameId))); + if(userName != null){ + requestProperties.put(HierarchyRequestPorpertyEnum.username, Collections.singleton(userName)); + //add the token only if also the 
user name was added + // ... we would not like to use the token of an other user name + if(token != null){ + requestProperties.put(HierarchyRequestPorpertyEnum.token, Collections.singleton(token)); + } + } boolean first = true; for (HierarchyRequestPorpertyEnum entry : HierarchyRequestPorpertyEnum.values()) { - Collection<String> values; - switch (entry) { //add values for geonameId, username and token - case geonameId: - values = Collections.singleton(Integer.toString(geonameId)); - break; - case username: - if(userName != null){ - values = Collections.singleton(userName); - } - case token: - if(token != null){ - values = Collections.singleton(token); - } - default: - values = null; - break; - } - if (entry.getProperty().encode(requestString, first, values) && first) { + if (entry.getProperty().encode(requestString, first, requestProperties.get(entry)) && first) { first = false; // if the first parameter is added set first to false } } URL requestUrl; try { requestUrl = new URL(requestString.toString()); + log.info(" > hierarchy request: "+requestUrl); } catch (MalformedURLException e) { throw new IllegalStateException("Unable to build valid request URL for " + requestString); } + long start = System.currentTimeMillis(); String result = IOUtils.toString(requestUrl.openConnection().getInputStream()); + long responseTime = System.currentTimeMillis()-start; + if(responseTime > 1000){ + log.info(" - responseTime: "+responseTime+"ms"); + } else { + log.debug(" - responseTime: "+responseTime+"ms"); + } try { JSONObject root = new JSONObject(result); if (root.has("geonames")) { Modified: incubator/stanbol/trunk/enhancer/engines/geonames/src/main/java/org/apache/stanbol/enhancer/engines/geonames/impl/LocationEnhancementEngine.java URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/geonames/src/main/java/org/apache/stanbol/enhancer/engines/geonames/impl/LocationEnhancementEngine.java?rev=1073335&r1=1073334&r2=1073335&view=diff 
============================================================================== --- incubator/stanbol/trunk/enhancer/engines/geonames/src/main/java/org/apache/stanbol/enhancer/engines/geonames/impl/LocationEnhancementEngine.java (original) +++ incubator/stanbol/trunk/enhancer/engines/geonames/src/main/java/org/apache/stanbol/enhancer/engines/geonames/impl/LocationEnhancementEngine.java Tue Feb 22 14:16:41 2011 @@ -99,7 +99,7 @@ public class LocationEnhancementEngine i * Default values for the number of results returned by search requests * to the geonames.org web service */ - private static final int DEFAULT_MAX_LOCATION_ENHANCEMENTS = 5; + private static final int DEFAULT_MAX_LOCATION_ENHANCEMENTS = 3; @Property(intValue=DEFAULT_MAX_LOCATION_ENHANCEMENTS) public static final String MAX_LOCATION_ENHANCEMENTS = "org.apache.stanbol.enhancer.engines.geonames.locationEnhancementEngine.max-location-enhancements"; @@ -241,6 +241,8 @@ public class LocationEnhancementEngine i } String userName = (String)properties.get(GEONAMES_USERNAME); String token = (String)properties.get(GEONAMES_TOKEN); + log.info(String.format("create Geonames Client for server: %s and user: %s (token not logged)", + serverUrl,userName)); geonamesService = new GeonamesAPIWrapper(serverUrl, userName, token); } Modified: incubator/stanbol/trunk/enhancer/engines/geonames/src/main/resources/OSGI-INF/metatype/metatype.properties URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/geonames/src/main/resources/OSGI-INF/metatype/metatype.properties?rev=1073335&r1=1073334&r2=1073335&view=diff ============================================================================== --- incubator/stanbol/trunk/enhancer/engines/geonames/src/main/resources/OSGI-INF/metatype/metatype.properties (original) +++ incubator/stanbol/trunk/enhancer/engines/geonames/src/main/resources/OSGI-INF/metatype/metatype.properties Tue Feb 22 14:16:41 2011 @@ -12,10 +12,10 @@ org.apache.stanbol.enhancer.engines.geon 
org.apache.stanbol.enhancer.engines.geonames.locationEnhancementEngine.min-hierarchy-score.description=The minimum score of a location so that also the hierarchy (administrative region, country, continent) are requested and - if found - added as Entity Enhancements. org.apache.stanbol.enhancer.engines.geonames.locationEnhancementEngine.serverURL.name= geonames.org Server -org.apache.stanbol.enhancer.engines.geonames.locationEnhancementEngine.serverURL.description=The URL of the geonames.org server to use. An empty configuration will use the free geonames.org server. Users with a premium account that includes an own sub domain need to change this configuration. +org.apache.stanbol.enhancer.engines.geonames.locationEnhancementEngine.serverURL.description=The URL of the geonames.org server to use. Defaults to "http://ws.geonames.org". This server does not require user authentication. There is an other free server at "http://api.geonames.org" that requires to set up AND ACTIVATE a free user account. Users with a premium account may also need to change the value of this field. org.apache.stanbol.enhancer.engines.geonames.locationEnhancementEngine.username.name=User Name -org.apache.stanbol.enhancer.engines.geonames.locationEnhancementEngine.username.description=To use the free service one does not need an user name. Users that own a Premium Account need to configure this property. +org.apache.stanbol.enhancer.engines.geonames.locationEnhancementEngine.username.description=Required for all servers other than "http://ws.geonames.org". Typically this is identical to the user name of the geonames.org account name. org.apache.stanbol.enhancer.engines.geonames.locationEnhancementEngine.token.name=Token -org.apache.stanbol.enhancer.engines.geonames.locationEnhancementEngine.token.description=To use the free service one does not need a token. Users that own a Premium Account need to configure this property. This property will be ignored if no user name is configured. 
+org.apache.stanbol.enhancer.engines.geonames.locationEnhancementEngine.token.description=Required for all servers other than "http://ws.geonames.org". Typically this is the password of the geonames.org account.
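
The two behavioural fixes in GeonamesAPIWrapper (use the default service URL only when none is configured, and send the token only together with a user name) can be sketched in isolation as below. This is a simplified, self-contained illustration and not the committed code: the class and method names are hypothetical, the "/hierarchyJSON" path is an assumption, and a plain map stands in for the EnumMap over HierarchyRequestPorpertyEnum. The parameter names geonameId, username and token follow the enum constants visible in the diff.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the actual GeonamesAPIWrapper implementation.
final class GeonamesRequestSketch {

    // Default server from the commit message; the service path is an assumption.
    static final String DEFAULT_HIERARCHY_URL = "http://ws.geonames.org/hierarchyJSON";

    /** Falls back to the default only if no service URL was configured. */
    static String resolveServiceUrl(String configured) {
        return configured != null ? configured : DEFAULT_HIERARCHY_URL;
    }

    /** Collects request parameters; the token is added only with a user name. */
    static Map<String, String> hierarchyParams(int geonameId, String userName, String token) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("geonameId", Integer.toString(geonameId));
        if (userName != null && !userName.isEmpty()) {
            params.put("username", userName);
            // never send a token without the user name it belongs to
            if (token != null) {
                params.put("token", token);
            }
        }
        return params;
    }

    /** Appends the URL-encoded parameters as a query string. */
    static String buildRequest(String serviceUrl, Map<String, String> params) {
        StringBuilder request = new StringBuilder(resolveServiceUrl(serviceUrl));
        char separator = '?';
        for (Map.Entry<String, String> entry : params.entrySet()) {
            request.append(separator).append(entry.getKey()).append('=')
                   .append(encode(entry.getValue()));
            separator = '&';
        }
        return request.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is not supported", e);
        }
    }

    public static void main(String[] args) {
        // No user name configured: default server, token is dropped.
        System.out.println(buildRequest(null, hierarchyParams(2761369, null, "secret")));
        // Account configured: custom server, user name and token are sent.
        System.out.println(buildRequest("http://api.geonames.org/hierarchyJSON",
                hierarchyParams(2761369, "demo", "secret")));
    }
}
```

Collecting the parameters first and encoding them in a single loop avoids the fall-through problem of the removed switch statement, where a missing break let every case end in the default branch.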
