Author: wkasper
Date: Tue Jul 31 08:22:51 2012
New Revision: 1367455

URL: http://svn.apache.org/viewvc?rev=1367455&view=rev
Log:
Stanbol-707: New Language Identification Engine

Added:
    incubator/stanbol/trunk/enhancer/engines/langdetect/
    incubator/stanbol/trunk/enhancer/engines/langdetect/README.md
    incubator/stanbol/trunk/enhancer/engines/langdetect/pom.xml
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/license/
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/license/THIRD-PARTY.properties
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/java/
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/java/org/
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/java/org/apache/
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/java/org/apache/stanbol/
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/java/org/apache/stanbol/enhancer/
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/java/org/apache/stanbol/enhancer/engines/
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/java/org/apache/stanbol/enhancer/engines/langdetect/
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/java/org/apache/stanbol/enhancer/engines/langdetect/LanguageDetectionEnhancementEngine.java
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/java/org/apache/stanbol/enhancer/engines/langdetect/LanguageIdentifier.java
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/resources/
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/resources/OSGI-INF/
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/resources/OSGI-INF/metatype/
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/resources/OSGI-INF/metatype/metatype.properties
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/resources/profiles.cfg
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/java/
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/java/org/
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/java/org/apache/
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/java/org/apache/stanbol/
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/java/org/apache/stanbol/enhancer/
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/java/org/apache/stanbol/enhancer/engines/
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/java/org/apache/stanbol/enhancer/engines/langdetect/
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/java/org/apache/stanbol/enhancer/engines/langdetect/LanguageDetectionEngineTest.java
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/java/org/apache/stanbol/enhancer/engines/langdetect/MockComponentContext.java
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/resources/
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/resources/README
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/resources/en.txt
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/resources/ja.txt   (with props)
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/resources/ko.txt
    incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/resources/zh.txt   (with props)

Added: incubator/stanbol/trunk/enhancer/engines/langdetect/README.md
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/langdetect/README.md?rev=1367455&view=auto
==============================================================================
--- incubator/stanbol/trunk/enhancer/engines/langdetect/README.md (added)
+++ incubator/stanbol/trunk/enhancer/engines/langdetect/README.md Tue Jul 31 08:22:51 2012
@@ -0,0 +1,125 @@
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+# LangDetect: Language Identification Enhancement Engine
+
+The **LanguageDetection** engine determines the language of text. 
+
+## Technical Description
+
+The provided engine is based on the [language detection library](http://code.google.com/p/language-detection/).
+The text to be checked must be provided in plain text format by the content item.
+
+The result of language identification is added as a TextAnnotation to the content item's metadata, as the string value of the property
+
+    http://purl.org/dc/terms/language
+
+This RDF snippet illustrates the output:
+
+    <fise:TextAnnotation rdf:about="urn:enhancement-a147957b-41f9-58f7-bbf1-b880b3aa4b49">
+        <dc:language>en</dc:language>
+        <dc:creator>org.apache.stanbol.enhancer.engines.langdetect.LanguageDetectionEnhancementEngine</dc:creator>
+    </fise:TextAnnotation>
+
+
+By default the language identifier distinguishes the [53 languages](http://code.google.com/p/language-detection/wiki/LanguageList) listed here:
+
+* af: Afrikaans
+* ar: Arabic
+* bg: Bulgarian
+* bn: Bengali
+* cs: Czech
+* da: Danish
+* de: German
+* el: Greek
+* en: English
+* es: Spanish
+* et: Estonian
+* fa: Persian
+* fi: Finnish
+* fr: French
+* gu: Gujarati
+* he: Hebrew
+* hi: Hindi
+* hr: Croatian
+* hu: Hungarian
+* id: Indonesian
+* it: Italian
+* ja: Japanese
+* kn: Kannada
+* ko: Korean
+* lt: Lithuanian
+* lv: Latvian
+* mk: Macedonian
+* ml: Malayalam
+* mr: Marathi
+* ne: Nepali
+* nl: Dutch
+* no: Norwegian
+* pa: Punjabi
+* pl: Polish
+* pt: Portuguese
+* ro: Romanian
+* ru: Russian
+* sk: Slovak
+* sl: Slovene
+* so: Somali
+* sq: Albanian
+* sv: Swedish
+* sw: Swahili
+* ta: Tamil
+* te: Telugu
+* th: Thai
+* tl: Tagalog
+* tr: Turkish
+* uk: Ukrainian
+* ur: Urdu
+* vi: Vietnamese
+* zh-cn: Simplified Chinese
+* zh-tw: Traditional Chinese
+
+Additional language models can be created with the [tools](http://code.google.com/p/language-detection/wiki/Tools) of the language detection library.
+
+## Configuration options
+
+* <pre><code>org.apache.stanbol.enhancer.engines.langdetect.probe-length</code></pre>
+
+    An integer specifying how many characters will be used for
+    identification. A value of 0 or below means to use the complete
+    text. Otherwise only a substring of the specified length taken from the
+    middle of the text will be used. The default value is 1000 characters.
+
+## Usage
+
+Assuming that the Stanbol endpoint with the full launcher is running at
+
+    http://localhost:8080
+
+and the engine is activated, commands like the following can be used from the
+command line to submit a text file as a content item:
+
+* stateless interface
+
+    curl -i -X POST -H "Content-Type:text/plain" -T testfile.txt http://localhost:8080/engines
+
+* stateful interface
+
+    curl -i -X PUT -H "Content-Type:text/plain" -T testfile.txt http://localhost:8080/contenthub/content/someFileId
+
+Alternatively, the Stanbol web interface can be used for submitting documents
+and viewing the metadata at
+
+    http://localhost:8080/contenthub
+
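
For reference, the stateless curl call from the README's Usage section can also be expressed in Java. This is only a sketch under stated assumptions: it relies on `java.net.http` from Java 11 or later (newer than the platform this engine targets), and the class and method names are hypothetical and not part of this commit. It builds the request without sending it, so it runs without a Stanbol instance.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper for illustration; not part of this commit. */
public class EnhancerRequestSketch {

    /**
     * Builds a request equivalent to:
     * curl -X POST -H "Content-Type:text/plain" --data ... ENDPOINT
     */
    public static HttpRequest buildStatelessRequest(String endpoint, String text) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "text/plain")
                .POST(HttpRequest.BodyPublishers.ofString(text))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildStatelessRequest(
                "http://localhost:8080/engines", "This is an English sentence.");
        // Sending the request needs a running Stanbol launcher, for example:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(request.method() + " " + request.uri());
    }
}
```

If the engine is active, the response should contain the enhancement metadata, including the `dc:language` value shown in the README.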

Added: incubator/stanbol/trunk/enhancer/engines/langdetect/pom.xml
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/langdetect/pom.xml?rev=1367455&view=auto
==============================================================================
--- incubator/stanbol/trunk/enhancer/engines/langdetect/pom.xml (added)
+++ incubator/stanbol/trunk/enhancer/engines/langdetect/pom.xml Tue Jul 31 08:22:51 2012
@@ -0,0 +1,140 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+        http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+
+  <modelVersion>4.0.0</modelVersion>
+
+  <parent>
+    <artifactId>org.apache.stanbol.enhancer.parent</artifactId>
+    <groupId>org.apache.stanbol</groupId>
+    <version>0.10.0-incubating-SNAPSHOT</version>
+    <relativePath>../../parent</relativePath>
+  </parent>
+
+  <groupId>org.apache.stanbol</groupId>
+  <artifactId>org.apache.stanbol.enhancer.engines.langdetect</artifactId>
+  <version>0.10.0-incubating-SNAPSHOT</version>
+  <packaging>bundle</packaging>
+
+  <name>Apache Stanbol Enhancer Enhancement Engine : Language Identifier</name>
+  <description>Language detection for 53 languages based on http://code.google.com/p/language-detection
+  </description>
+
+  <inceptionYear>2012</inceptionYear>
+
+  <scm>
+    <connection>
+      scm:svn:http://svn.apache.org/repos/asf/incubator/stanbol/trunk/enhancer/engines/langdetect/
+    </connection>
+    <developerConnection>
+      scm:svn:https://svn.apache.org/repos/asf/incubator/stanbol/trunk/enhancer/engines/langdetect/
+    </developerConnection>
+    <url>http://incubator.apache.org/stanbol/</url>
+  </scm>
+
+       <build>
+               <plugins>
+                       <plugin>
+                               <groupId>org.apache.felix</groupId>
+                               <artifactId>maven-bundle-plugin</artifactId>
+                               <extensions>true</extensions>
+                               <configuration>
+                                       <instructions>
+                                               <Export-Package>
+                                                       org.apache.stanbol.enhancer.engines.langdetect;version=${project.version}
+                                               </Export-Package>
+                                               <Embed-Dependency>
+                                                       langdetect;scope=compile
+                                               </Embed-Dependency>
+                                               <Embed-Transitive>true</Embed-Transitive>
+                                       </instructions>
+                               </configuration>
+                       </plugin>
+                       <plugin>
+                               <groupId>org.apache.felix</groupId>
+                               <artifactId>maven-scr-plugin</artifactId>
+                       </plugin>
+                       <plugin>
+                               <groupId>org.apache.rat</groupId>
+                               <artifactId>apache-rat-plugin</artifactId>
+                               <configuration>
+                                       <excludes>
+                                               <!-- AL20 licensed files: See src/test/resources/README -->
+                                               <exclude>src/test/resources/*.txt</exclude>
+                                       </excludes>
+                               </configuration>
+                       </plugin>
+               </plugins>
+       </build>
+
+  <dependencies>
+    <dependency>
+      <groupId>org.apache.stanbol</groupId>
+      <artifactId>org.apache.stanbol.enhancer.servicesapi</artifactId>
+      <version>0.10.0-incubating-SNAPSHOT</version>
+    </dependency>
+
+    <dependency>
+      <groupId>com.cybozu.labs</groupId>
+      <artifactId>langdetect</artifactId>
+      <version>1.1-20120112</version>
+    </dependency>
+
+    <dependency>
+      <groupId>org.apache.felix</groupId>
+      <artifactId>org.apache.felix.scr.annotations</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.clerezza</groupId>
+      <artifactId>rdf.core</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>commons-io</groupId>
+      <artifactId>commons-io</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>org.slf4j</groupId>
+      <artifactId>slf4j-api</artifactId>
+    </dependency>
+
+    <dependency>
+      <groupId>org.apache.stanbol</groupId>
+      <artifactId>org.apache.stanbol.enhancer.test</artifactId>
+      <version>0.10.0-incubating-SNAPSHOT</version>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.stanbol</groupId>
+      <artifactId>org.apache.stanbol.enhancer.core</artifactId>
+      <version>0.10.0-incubating-SNAPSHOT</version>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.slf4j</groupId>
+      <artifactId>slf4j-simple</artifactId>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>junit</groupId>
+      <artifactId>junit</artifactId>
+      <scope>test</scope>
+    </dependency>
+  </dependencies>
+
+</project>

Added: incubator/stanbol/trunk/enhancer/engines/langdetect/src/license/THIRD-PARTY.properties
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/langdetect/src/license/THIRD-PARTY.properties?rev=1367455&view=auto
==============================================================================
--- incubator/stanbol/trunk/enhancer/engines/langdetect/src/license/THIRD-PARTY.properties (added)
+++ incubator/stanbol/trunk/enhancer/engines/langdetect/src/license/THIRD-PARTY.properties Tue Jul 31 08:22:51 2012
@@ -0,0 +1,24 @@
+# Generated by org.codehaus.mojo.license.AddThirdPartyMojo
+#-------------------------------------------------------------------------------
+# Already used licenses in project :
+# - Apache Software License
+# - Apache Software License, Version 2.0
+# - BSD License
+# - Common Development And Distribution License (CDDL), Version 1.0
+# - Common Development And Distribution License (CDDL), Version 1.1
+# - Common Public License, Version 1.0
+# - Eclipse Public License, Version 1.0
+# - GNU General Public License (GPL), Version 2 with classpath exception
+# - GNU Lesser General Public License (LGPL)
+# - GNU Lesser General Public License (LGPL), Version 2.1
+# - ICU License
+# - MIT License
+# - Public Domain License
+#-------------------------------------------------------------------------------
+# Please fill the missing licenses for dependencies :
+#
+#
+#Mon Jul 30 15:41:25 CEST 2012
+javax.servlet--servlet-api--2.5=Common Development And Distribution License (CDDL), Version 1.0
+org.osgi--org.osgi.compendium--4.1.0=The Apache Software License, Version 2.0
+org.osgi--org.osgi.core--4.1.0=The Apache Software License, Version 2.0

Added: incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/java/org/apache/stanbol/enhancer/engines/langdetect/LanguageDetectionEnhancementEngine.java
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/java/org/apache/stanbol/enhancer/engines/langdetect/LanguageDetectionEnhancementEngine.java?rev=1367455&view=auto
==============================================================================
--- incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/java/org/apache/stanbol/enhancer/engines/langdetect/LanguageDetectionEnhancementEngine.java (added)
+++ incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/java/org/apache/stanbol/enhancer/engines/langdetect/LanguageDetectionEnhancementEngine.java Tue Jul 31 08:22:51 2012
@@ -0,0 +1,232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.stanbol.enhancer.engines.langdetect;
+
+import static org.apache.stanbol.enhancer.servicesapi.rdf.Properties.DC_LANGUAGE;
+import static org.apache.stanbol.enhancer.servicesapi.rdf.Properties.DC_TYPE;
+import static org.apache.stanbol.enhancer.servicesapi.rdf.TechnicalClasses.DCTERMS_LINGUISTIC_SYSTEM;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.Dictionary;
+import java.util.List;
+import java.util.Map;
+import java.util.Map.Entry;
+import java.util.Set;
+
+import org.apache.clerezza.rdf.core.MGraph;
+import org.apache.clerezza.rdf.core.UriRef;
+import org.apache.clerezza.rdf.core.impl.PlainLiteralImpl;
+import org.apache.clerezza.rdf.core.impl.TripleImpl;
+import org.apache.commons.io.IOUtils;
+import org.apache.felix.scr.annotations.Component;
+import org.apache.felix.scr.annotations.Properties;
+import org.apache.felix.scr.annotations.Property;
+import org.apache.felix.scr.annotations.Service;
+import org.apache.stanbol.enhancer.servicesapi.Blob;
+import org.apache.stanbol.enhancer.servicesapi.Chain;
+import org.apache.stanbol.enhancer.servicesapi.ContentItem;
+import org.apache.stanbol.enhancer.servicesapi.EngineException;
+import org.apache.stanbol.enhancer.servicesapi.EnhancementEngine;
+import org.apache.stanbol.enhancer.servicesapi.InvalidContentException;
+import org.apache.stanbol.enhancer.servicesapi.ServiceProperties;
+import org.apache.stanbol.enhancer.servicesapi.helper.ContentItemHelper;
+import org.apache.stanbol.enhancer.servicesapi.helper.EnhancementEngineHelper;
+import org.apache.stanbol.enhancer.servicesapi.impl.AbstractEnhancementEngine;
+import org.osgi.service.cm.ConfigurationException;
+import org.osgi.service.component.ComponentContext;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.cybozu.labs.langdetect.LangDetectException;
+
+/**
+ * {@link LanguageDetectionEnhancementEngine} provides functionality to enhance documents
+ * with their language.
+ *
+ * @author Walter Kasper, DFKI
+ */
+@Component(immediate = true, metatype = true, inherit=true)
+@Service
+@Properties(value={
+    @Property(name=EnhancementEngine.PROPERTY_NAME,value="langdetect")
+})
+public class LanguageDetectionEnhancementEngine 
+        extends AbstractEnhancementEngine<LangDetectException,RuntimeException>
+        implements EnhancementEngine, ServiceProperties {
+
+    /**
+     * a configurable value of the text segment length to check
+     */
+    @Property
+    public static final String PROBE_LENGTH_PROP = "org.apache.stanbol.enhancer.engines.langdetect.probe-length";
+
+
+    /**
+     * The default value for the Execution of this Engine. Currently set to
+     * {@link ServiceProperties#ORDERING_PRE_PROCESSING} - 2<p>
+     * NOTE: this information is used by the default and weighted {@link Chain}
+     * implementation to determine the processing order of 
+     * {@link EnhancementEngine}s. Other {@link Chain} implementations do not
+     * use this information.
+     */
+    public static final Integer defaultOrder = ORDERING_PRE_PROCESSING - 2;
+
+    /**
+     * This contains the only MIME type directly supported by this enhancement engine.
+     */
+    private static final String TEXT_PLAIN_MIMETYPE = "text/plain";
+    /**
+     * Set containing the only supported mime type {@link #TEXT_PLAIN_MIMETYPE}
+     */
+    private static final Set<String> SUPPORTED_MIMTYPES = Collections.singleton(TEXT_PLAIN_MIMETYPE);
+
+    /**
+     * This contains the logger.
+     */
+    private static final Logger log = LoggerFactory.getLogger(LanguageDetectionEnhancementEngine.class);
+
+    private static final int PROBE_LENGTH_DEFAULT = 1000;
+
+    /**
+     * How much text should be used for testing: If the value is 0 or smaller,
+     * the complete text will be used. Otherwise a text probe of the given length
+     * is taken from the middle of the text. The default length is 1000.
+     */
+    private int probeLength = PROBE_LENGTH_DEFAULT;
+    
+    private LanguageIdentifier languageIdentifier;
+    
+    /**
+     * Initialize the language identifier model and load the probe length bound if
+     * provided as a property.
+     * 
+     * @param ce
+     *            the {@link ComponentContext}
+     */
+    protected void activate(ComponentContext ce) throws ConfigurationException, LangDetectException {
+        super.activate(ce);
+        if (ce != null) {
+            @SuppressWarnings("unchecked")
+            Dictionary<String, String> properties = ce.getProperties();
+            String lengthVal = properties.get(PROBE_LENGTH_PROP);
+            probeLength = lengthVal == null ? PROBE_LENGTH_DEFAULT : Integer.parseInt(lengthVal);
+        }
+        languageIdentifier = new LanguageIdentifier();
+    }
+    
+    protected void deactivate(ComponentContext ce) {
+        super.deactivate(ce);
+        this.languageIdentifier = null;
+    }
+
+    public int canEnhance(ContentItem ci) throws EngineException {
+        if(ContentItemHelper.getBlob(ci, SUPPORTED_MIMTYPES) != null){
+            return ENHANCE_ASYNC; //Langid now supports async processing
+        } else {
+            return CANNOT_ENHANCE;
+        }
+    }
+
+    public void computeEnhancements(ContentItem ci) throws EngineException {
+        Entry<UriRef,Blob> contentPart = ContentItemHelper.getBlob(ci, SUPPORTED_MIMTYPES);
+        if(contentPart == null){
+            throw new IllegalStateException("No ContentPart with Mimetype '"
+                    + TEXT_PLAIN_MIMETYPE+"' found for ContentItem "+ci.getUri()
+                    + ": This is also checked in the canEnhance method! -> This "
+                    + "indicates a bug in the implementation of the "
+                    + "EnhancementJobManager!");
+        }
+        String text = "";
+        try {
+            text = ContentItemHelper.getText(contentPart.getValue());
+        } catch (IOException e) {
+            throw new InvalidContentException(this, ci, e);
+        }
+        if (text.trim().length() == 0) {
+            log.info("No text contained in ContentPart {} of ContentItem {}",
+                contentPart.getKey(),ci.getUri());
+            return;
+        }
+
+        // truncate text to some piece from the middle if probeLength > 0
+        int checkLength = probeLength;
+        if (checkLength > 0 && text.length() > checkLength) {
+            text = text.substring(text.length() / 2 - checkLength / 2, text.length() / 2 + checkLength / 2);
+        }
+        String language = null;
+        try {
+            language = languageIdentifier.getLanguage(text);
+            log.info("language identified as {}", language);
+        }
+        catch (LangDetectException e) {
+            log.warn("Could not identify language", e);
+            return;
+        }
+        
+        // add language to metadata
+        MGraph g = ci.getMetadata();
+        ci.getLock().writeLock().lock();
+        try {
+            UriRef textEnhancement = EnhancementEngineHelper.createTextEnhancement(ci, this);
+            g.add(new TripleImpl(textEnhancement, DC_LANGUAGE, new PlainLiteralImpl(language)));
+            g.add(new TripleImpl(textEnhancement, DC_TYPE, DCTERMS_LINGUISTIC_SYSTEM));
+        } finally {
+            ci.getLock().writeLock().unlock();
+        }
+    }
+
+    public List<String> loadProfiles(String folder, String configFile) throws Exception {
+        List<String> profiles = new ArrayList<String>();
+        java.util.Properties props = new java.util.Properties();
+        props.load(getClass().getClassLoader().getResourceAsStream(configFile));
+        String languages = props.getProperty("languages");
+        if (languages == null) {
+            throw new IOException("No languages defined");
+        }
+        for (String lang: languages.split(",")) {
+            String profileFile = folder+"/"+lang;
+            InputStream is = getClass().getClassLoader().getResourceAsStream(profileFile);
+            String profile;
+            try {
+                profile = IOUtils.toString(is, "UTF-8");
+                if (profile != null && profile.length() > 0) {
+                    profiles.add(profile);
+                }
+                is.close();
+            } catch (IOException e) {
+                log.warn("Unable to read language profile " + profileFile, e);
+            }
+        }
+        return profiles;
+    }
+    
+    public int getProbeLength() {
+        return probeLength;
+    }
+
+    public void setProbeLength(int probeLength) {
+        this.probeLength = probeLength;
+    }
+
+    public Map<String, Object> getServiceProperties() {
+        return Collections.unmodifiableMap(Collections.singletonMap(ENHANCEMENT_ENGINE_ORDERING, (Object) defaultOrder));
+    }
+
+}
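
The probe-length handling in `activate` and `computeEnhancements` above can be sketched in isolation as follows. The class and method names are hypothetical and not part of this commit. Two deliberate differences from the committed code, both assumptions on my side: the parser also accepts `Number` values, since a configuration may not always deliver the property as a `String`, and the probe is cut to exactly `probeLength` characters, whereas the committed arithmetic (`length/2 - checkLength/2` to `length/2 + checkLength/2`) yields one character fewer for odd lengths.

```java
/** Hypothetical helper for illustration; not part of this commit. */
public class ProbeLengthSketch {

    /** Accepts a String or Number configuration value; falls back to the default otherwise. */
    public static int parseProbeLength(Object value, int defaultLength) {
        if (value == null) {
            return defaultLength;
        }
        if (value instanceof Number) {
            return ((Number) value).intValue();
        }
        try {
            return Integer.parseInt(value.toString().trim());
        } catch (NumberFormatException e) {
            return defaultLength;
        }
    }

    /**
     * Returns the complete text if probeLength is 0 or below, or if the text
     * is not longer than probeLength; otherwise the middle probeLength characters.
     */
    public static String middleProbe(String text, int probeLength) {
        if (probeLength <= 0 || text.length() <= probeLength) {
            return text;
        }
        int start = (text.length() - probeLength) / 2;
        return text.substring(start, start + probeLength);
    }

    public static void main(String[] args) {
        int probeLength = parseProbeLength("4", 1000);
        System.out.println(middleProbe("abcdefghij", probeLength)); // prints "defg"
    }
}
```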

Added: incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/java/org/apache/stanbol/enhancer/engines/langdetect/LanguageIdentifier.java
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/java/org/apache/stanbol/enhancer/engines/langdetect/LanguageIdentifier.java?rev=1367455&view=auto
==============================================================================
--- incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/java/org/apache/stanbol/enhancer/engines/langdetect/LanguageIdentifier.java (added)
+++ incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/java/org/apache/stanbol/enhancer/engines/langdetect/LanguageIdentifier.java Tue Jul 31 08:22:51 2012
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.stanbol.enhancer.engines.langdetect;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.commons.io.IOUtils;
+
+import com.cybozu.labs.langdetect.Detector;
+import com.cybozu.labs.langdetect.DetectorFactory;
+import com.cybozu.labs.langdetect.LangDetectException;
+
+/**
+ * Standalone version of the Language Identifier
+ * @author <a href="mailto:[email protected]">Walter Kasper</a>
+ * 
+ */
+
+public class LanguageIdentifier {
+    
+    public LanguageIdentifier() throws LangDetectException {
+        DetectorFactory.clear();
+        try {
+            DetectorFactory.loadProfile(loadProfiles("profiles","profiles.cfg"));
+        } catch (Exception e) {
+            throw new LangDetectException(null, "Error in Initialization: "+e.getMessage());
+        } 
+    }
+    /**
+     * Load the profiles from the classpath
+     * @param folder where the profiles are
+     * @param configFile specifies which language profiles should be used
+     * @return a list of profiles
+     * @throws Exception
+     */
+    public List<String> loadProfiles(String folder, String configFile) throws Exception {
+        List<String> profiles = new ArrayList<String>();
+        java.util.Properties props = new java.util.Properties();
+        props.load(getClass().getClassLoader().getResourceAsStream(configFile));
+        String languages = props.getProperty("languages");
+        if (languages == null) {
+            throw new IOException("No languages defined");
+        }
+        for (String lang: languages.split(",")) {
+            String profileFile = folder+"/"+lang;
+            InputStream is = getClass().getClassLoader().getResourceAsStream(profileFile);
+            try {
+                String profile = IOUtils.toString(is, "UTF-8");
+                if (profile != null && profile.length() > 0) {
+                    profiles.add(profile);
+                }
+                is.close();
+            } catch (IOException e) {
+                e.printStackTrace();
+            }
+        }
+        return profiles;
+    }
+    
+    public String getLanguage(String text) throws LangDetectException {
+        Detector detector = DetectorFactory.create();
+        detector.append(text);
+        return detector.detect();
+    }
+
+}
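
The way `loadProfiles` above turns the `languages` property into classpath resource names can be sketched on its own. The class and method names are hypothetical and not part of this commit. As an assumption beyond the committed code, the sketch trims whitespace around each language code and skips empty entries, so a value such as `en, de` would not produce a resource name with a leading space.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper for illustration; not part of this commit. */
public class ProfileListSketch {

    /** Maps the comma-separated "languages" property to resource paths below the given folder. */
    public static List<String> profilePaths(Properties props, String folder) throws IOException {
        String languages = props.getProperty("languages");
        if (languages == null) {
            throw new IOException("No languages defined");
        }
        List<String> paths = new ArrayList<String>();
        for (String lang : languages.split(",")) {
            String code = lang.trim();
            if (!code.isEmpty()) {
                paths.add(folder + "/" + code);
            }
        }
        return paths;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for profiles.cfg; the real file is read from the classpath.
        Properties props = new Properties();
        props.load(new StringReader("languages=en, de,zh-cn\n"));
        System.out.println(profilePaths(props, "profiles"));
    }
}
```

Each resulting path would then be read as UTF-8 text and handed to `DetectorFactory.loadProfile`, as the constructor above does.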

Added: incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/resources/OSGI-INF/metatype/metatype.properties
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/resources/OSGI-INF/metatype/metatype.properties?rev=1367455&view=auto
==============================================================================
--- incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/resources/OSGI-INF/metatype/metatype.properties (added)
+++ incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/resources/OSGI-INF/metatype/metatype.properties Tue Jul 31 08:22:51 2012
@@ -0,0 +1,32 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+stanbol.enhancer.engine.name.name=Name
+stanbol.enhancer.engine.name.description=The name of the enhancement engine as \
+used in the RESTful interface '/engine/<name>'
+
+service.ranking.name=Ranking
+service.ranking.description=If two enhancement engines with the same name are active the \
+one with the higher ranking will be used to process parsed content items.
+
+#===============================================================================
+#Properties and Options used to configure LanguageDetectionEnhancementEngine
+#===============================================================================
+
+org.apache.stanbol.enhancer.engines.langdetect.LanguageDetectionEnhancementEngine.name=Apache
 Stanbol \
+Enhancer Engine: Language Identification
+org.apache.stanbol.enhancer.engines.langdetect.LanguageDetectionEnhancementEngine.description=Detects
 \
+the Language for parsed Text.

Added: 
incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/resources/profiles.cfg
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/resources/profiles.cfg?rev=1367455&view=auto
==============================================================================
--- 
incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/resources/profiles.cfg
 (added)
+++ 
incubator/stanbol/trunk/enhancer/engines/langdetect/src/main/resources/profiles.cfg
 Tue Jul 31 08:22:51 2012
@@ -0,0 +1,25 @@
+#
+#  Licensed to the Apache Software Foundation (ASF) under one or more
+#  contributor license agreements.  See the NOTICE file distributed with
+#  this work for additional information regarding copyright ownership.
+#  The ASF licenses this file to You under the Apache License, Version 2.0
+#  (the "License"); you may not use this file except in compliance with
+#  the License.  You may obtain a copy of the License at
+# 
+#      http://www.apache.org/licenses/LICENSE-2.0
+# 
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+#
+# This is a tika LanguageIdentifier properties file.
+# Its name is org/apache/tika/language/tika.language.properties
+# You can override it by placing a copy on the classpath in a file called
+# org/apache/tika/language/tika.language.override.properties
+
+# List of languages for which there are <language>.ngp profiles
+# If there exists an ISO 639-1 2-letter code it should be used
+# If not, you can choose an ISO 639-2 3-letter code
+languages=af,ar,bg,bn,cs,da,de,el,en,es,et,fa,fi,fr,gu,he,hi,hr,hu,id,it,ja,kn,ko,lt,lv,mk,ml,mr,ne,nl,no,pa,pl,pt,ro,ru,sk,sl,so,sq,sv,sw,ta,te,th,tl,tr,uk,ur,vi,zh-cn,zh-tw

Added: 
incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/java/org/apache/stanbol/enhancer/engines/langdetect/LanguageDetectionEngineTest.java
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/java/org/apache/stanbol/enhancer/engines/langdetect/LanguageDetectionEngineTest.java?rev=1367455&view=auto
==============================================================================
--- 
incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/java/org/apache/stanbol/enhancer/engines/langdetect/LanguageDetectionEngineTest.java
 (added)
+++ 
incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/java/org/apache/stanbol/enhancer/engines/langdetect/LanguageDetectionEngineTest.java
 Tue Jul 31 08:22:51 2012
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.stanbol.enhancer.engines.langdetect;
+
+import static 
org.apache.stanbol.enhancer.test.helper.EnhancementStructureHelper.validateAllEntityAnnotations;
+import static 
org.apache.stanbol.enhancer.test.helper.EnhancementStructureHelper.validateAllTextAnnotations;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotNull;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.Arrays;
+import java.util.HashMap;
+
+
+import org.apache.clerezza.rdf.core.LiteralFactory;
+import org.apache.clerezza.rdf.core.Resource;
+import org.apache.clerezza.rdf.core.UriRef;
+import org.apache.commons.io.IOUtils;
+import 
org.apache.stanbol.enhancer.contentitem.inmemory.InMemoryContentItemFactory;
+import org.apache.stanbol.enhancer.servicesapi.ContentItem;
+import org.apache.stanbol.enhancer.servicesapi.ContentItemFactory;
+import org.apache.stanbol.enhancer.servicesapi.EngineException;
+import org.apache.stanbol.enhancer.servicesapi.EnhancementEngine;
+import org.apache.stanbol.enhancer.servicesapi.helper.EnhancementEngineHelper;
+import org.apache.stanbol.enhancer.servicesapi.impl.StringSource;
+import org.apache.stanbol.enhancer.servicesapi.rdf.Properties;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.osgi.service.cm.ConfigurationException;
+import org.osgi.service.component.ComponentContext;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.cybozu.labs.langdetect.Detector;
+import com.cybozu.labs.langdetect.DetectorFactory;
+import com.cybozu.labs.langdetect.LangDetectException;
+
+/**
+ * {@link LanguageDetectionEngineTest} is a test class for {@link LanguageIdentifier}
+ * and {@link LanguageDetectionEnhancementEngine}.
+ *
+ * @author Walter Kasper, DFKI
+ */
+public class LanguageDetectionEngineTest {
+    
+    private static final Logger LOG = 
LoggerFactory.getLogger(LanguageDetectionEngineTest.class);
+
+    private static final ContentItemFactory ciFactory = 
InMemoryContentItemFactory.getInstance();
+    
+    private static final String[] TEST_FILE_NAMES = 
{"en.txt","ja.txt","ko.txt","zh.txt"};
+    
+    private static LanguageIdentifier langId;
+    
+    /**
+     * Initializes the language identifier.
+     * @throws LangDetectException 
+     */
+    @BeforeClass
+    public static void oneTimeSetUp() throws IOException, LangDetectException {
+        langId = new LanguageIdentifier();
+    }
+
+    /**
+     * Tests the language identification.
+     *
+     * @throws IOException if there is an error when reading the text
+     */
+    @Test
+    public void testLangId() throws LangDetectException, IOException {
+        LOG.info("Testing: {}", Arrays.asList(TEST_FILE_NAMES));
+        for (String file: TEST_FILE_NAMES) {
+            String expectedLang = file.substring(0,2);
+            InputStream in = 
LanguageDetectionEngineTest.class.getClassLoader().getResourceAsStream(file);
+            assertNotNull("failed to load resource " + file, in);
+            String text = IOUtils.toString(in, "UTF-8");
+            in.close();
+            String language = langId.getLanguage(text);
+            if (!expectedLang.equals(language.substring(0,2))) {
+                LOG.info("Expected: {}; Found {}",expectedLang,language);
+            }
+            assertEquals(expectedLang, language.substring(0,2));            
+        }
+    }
+    
+    /**
+     * Tests the engine and validates the created enhancements
+     * @throws EngineException
+     * @throws IOException
+     * @throws ConfigurationException
+     * @throws LangDetectException 
+     */
+    @Test
+    public void testEngine() throws EngineException, ConfigurationException, 
LangDetectException, IOException {
+        LOG.info("Testing engine: {}", TEST_FILE_NAMES[0]);
+        InputStream in = 
LanguageDetectionEngineTest.class.getClassLoader().getResourceAsStream(TEST_FILE_NAMES[0]);
+        assertNotNull("failed to load resource " + TEST_FILE_NAMES[0], in);
+        String text = IOUtils.toString(in, "UTF-8");
+        in.close();
+        LanguageDetectionEnhancementEngine langIdEngine = new 
LanguageDetectionEnhancementEngine();
+        ComponentContext context =  new MockComponentContext();
+        context.getProperties().put(EnhancementEngine.PROPERTY_NAME, 
"langdetect");
+        langIdEngine.activate(context);
+        ContentItem ci = ciFactory.createContentItem(new StringSource(text));
+        langIdEngine.computeEnhancements(ci);
+        HashMap<UriRef,Resource> expectedValues = new 
HashMap<UriRef,Resource>();
+        expectedValues.put(Properties.ENHANCER_EXTRACTED_FROM, ci.getUri());
+        expectedValues.put(Properties.DC_CREATOR, 
LiteralFactory.getInstance().createTypedLiteral(
+            langIdEngine.getClass().getName()));
+        int textAnnotationCount = validateAllTextAnnotations(ci.getMetadata(), 
text, expectedValues);
+        assertEquals("A single TextAnnotation is expected", 
1,textAnnotationCount);
+        //even though these tests do not validate detection quality but rather
+        //the correct integration of the language detection library as an
+        //EnhancementEngine, we expect that "en" is detected for the parsed text
+        assertEquals("The detected language for text '"+text+"' MUST BE 'en'",
+            "en",EnhancementEngineHelper.getLanguage(ci));
+
+        int entityAnnoNum = validateAllEntityAnnotations(ci.getMetadata(), 
expectedValues);
+        assertEquals("No EntityAnnotations are expected",0, entityAnnoNum);
+
+    }
+}
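
Note on the comparison in testLangId: the test derives the expected language from the first two characters of the resource name and compares it with the first two characters of the detected code, because the profile list in profiles.cfg includes region-qualified codes such as zh-cn and zh-tw. A minimal, self-contained sketch of that normalization follows. The class name LanguageCodes and both helper methods are hypothetical and not part of this commit, and the "detected" values are made-up sample inputs, not real detector output.

```java
import java.util.Locale;

// Hypothetical helper, not part of the commit: illustrates how a
// region-qualified code such as "zh-cn" can be reduced to "zh" before
// it is compared with the language implied by a test resource name.
public class LanguageCodes {

    /** Returns the part of the code before the first '-', lower-cased. */
    static String primarySubtag(String code) {
        int dash = code.indexOf('-');
        String primary = dash < 0 ? code : code.substring(0, dash);
        return primary.toLowerCase(Locale.ROOT);
    }

    /** Derives the expected language from a resource name such as "zh.txt". */
    static String expectedLanguage(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }

    public static void main(String[] args) {
        String[] files = {"en.txt", "ja.txt", "ko.txt", "zh.txt"};
        // Assumed sample values standing in for what a detector might return.
        String[] detected = {"en", "ja", "ko", "zh-cn"};
        for (int i = 0; i < files.length; i++) {
            boolean match = expectedLanguage(files[i]).equals(primarySubtag(detected[i]));
            System.out.println(files[i] + " -> " + detected[i] + " match=" + match);
        }
    }
}
```

Splitting on '-' rather than taking substring(0,2) also copes with three-letter ISO 639-2 codes, which the comments in profiles.cfg allow for.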

Added: 
incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/java/org/apache/stanbol/enhancer/engines/langdetect/MockComponentContext.java
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/java/org/apache/stanbol/enhancer/engines/langdetect/MockComponentContext.java?rev=1367455&view=auto
==============================================================================
--- 
incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/java/org/apache/stanbol/enhancer/engines/langdetect/MockComponentContext.java
 (added)
+++ 
incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/java/org/apache/stanbol/enhancer/engines/langdetect/MockComponentContext.java
 Tue Jul 31 08:22:51 2012
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.stanbol.enhancer.engines.langdetect;
+
+import java.util.Dictionary;
+import java.util.Hashtable;
+
+import org.osgi.framework.Bundle;
+import org.osgi.framework.BundleContext;
+import org.osgi.framework.ServiceReference;
+import org.osgi.service.component.ComponentContext;
+import org.osgi.service.component.ComponentInstance;
+
+public class MockComponentContext implements ComponentContext {
+
+    private final Dictionary properties = new Hashtable();
+    
+    @Override
+    public Dictionary getProperties() {
+        return properties;
+    }
+
+    @Override
+    public Object locateService(String name) {
+        return null;
+    }
+
+    @Override
+    public Object locateService(String name, ServiceReference reference) {
+        return null;
+    }
+
+    @Override
+    public Object[] locateServices(String name) {
+        return null;
+    }
+
+    @Override
+    public BundleContext getBundleContext() {
+        return null;
+    }
+
+    @Override
+    public Bundle getUsingBundle() {
+        return null;
+    }
+
+    @Override
+    public ComponentInstance getComponentInstance() {
+        return null;
+    }
+
+    @Override
+    public void enableComponent(String name) {
+    }
+
+    @Override
+    public void disableComponent(String name) {
+    }
+
+    @Override
+    public ServiceReference getServiceReference() {
+        return null;
+    }
+
+}

Added: 
incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/resources/README
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/resources/README?rev=1367455&view=auto
==============================================================================
--- 
incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/resources/README 
(added)
+++ 
incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/resources/README 
Tue Jul 31 08:22:51 2012
@@ -0,0 +1,23 @@
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+The following files are provided under the Apache License, Version 2.0:
+
+en.txt
+zh.txt
+ja.txt
+ko.txt
+
+

Added: 
incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/resources/en.txt
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/resources/en.txt?rev=1367455&view=auto
==============================================================================
--- 
incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/resources/en.txt 
(added)
+++ 
incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/resources/en.txt 
Tue Jul 31 08:22:51 2012
@@ -0,0 +1,9 @@
+The Java platform and language began as an internal project at Sun 
Microsystems in December 1990, providing an alternative to the C++/C 
programming languages. Engineer Patrick Naughton had become increasingly 
frustrated with the state of Sun's C++ and C APIs (application programming 
interfaces) and tools. While considering moving to NeXT, Naughton was offered a 
chance to work on new technology and thus the Stealth Project was started.
+
+The Stealth Project was soon renamed to the Green Project with James Gosling 
and Mike Sheridan joining Naughton. Together with other engineers, they began 
work in a small office on Sand Hill Road in Menlo Park, California. They were 
attempting to develop a new technology for programming next generation smart 
appliances, which Sun expected to be a major new opportunity[4].
+
+The team originally considered using C++, but it was rejected for several 
reasons. Because they were developing an embedded system with limited 
resources, they decided that C++ demanded too large a footprint and that its 
complexity led to developer errors. The language's lack of garbage collection 
meant that programmers had to manually manage system memory, a challenging and 
error-prone task. The team was also troubled by the language's lack of portable 
facilities for security, distributed programming, and threading. Finally, they 
wanted a platform that could be easily ported to all types of devices.
+
+Bill Joy had envisioned a new language combining Mesa and C. In a paper called 
Further, he proposed to Sun that its engineers should produce an 
object-oriented environment based on C++. Initially, Gosling attempted to 
modify and extend C++ (which he referred to as "C++ ++ --") but soon abandoned 
that in favor of creating an entirely new language, which he called Oak, after 
the tree that stood just outside his office.
+
+By the summer of 1992, they were able to demonstrate portions of the new 
platform including the Green OS, the Oak language, the libraries, and the 
hardware. Their first attempt, demonstrated on September 3, 1992, focused on 
building a PDA device named Star7[2] which had a graphical interface and a 
smart agent called "Duke" to assist the user. In November of that year, the 
Green Project was spun off to become firstperson, a wholly owned subsidiary of 
Sun Microsystems, and the team relocated to Palo Alto, California[5]. The 
firstperson team was interested in building highly interactive devices, and 
when Time Warner issued an RFP for a set-top box, firstperson changed their 
target and responded with a proposal for a set-top box platform. However, the 
cable industry felt that their platform gave too much control to the user and 
firstperson lost their bid to SGI. An additional deal with The 3DO Company for 
a set-top box also failed to materialize. Unable to generate interest within
 the TV industry, the company was rolled back into Sun.
\ No newline at end of file

Added: 
incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/resources/ja.txt
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/resources/ja.txt?rev=1367455&view=auto
==============================================================================
Binary file - no diff available.

Propchange: 
incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/resources/ja.txt
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: 
incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/resources/ko.txt
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/resources/ko.txt?rev=1367455&view=auto
==============================================================================
--- 
incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/resources/ko.txt 
(added)
+++ 
incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/resources/ko.txt 
Tue Jul 31 08:22:51 2012
@@ -0,0 +1,5 @@
+구혜선(具惠善[1], 1984년 11월 9일 ~ )은 인천광역시에서 
태어난 대한민국의 인터넷 얼짱 출신 배우이다.[2] 2002년에 
광고로 데뷔했으며, 시트콤 《논스톱5》에 출연했고 
그외에도 《꽃보다 남자》를 비롯한 여러 편의 드라마에 
출연하였다.
+
+2009년 무렵부터 그녀는 책을 내고 그림 전시회를 열며 
단편영화를 제작함으로써 소설가, 일러스트레이터, 
영화감독으로 자신의 활동영역을 넓혀가고 있다.[3][4][2] 
그녀가 출간한 소설 〈탱고〉는 발매 일주일 만에 삼만 
부가 팔렸고,[5] 영화감독 데뷔작인 〈유쾌한 도우미〉는 
부산아시아단편 영화제에서 관객상을 수상했다.[6] 그녀의 
첫 번째 장편 영화 "요술"은 YG 엔터테인먼트가 제작사를 
맡아 2010년
  6월 24일에 개봉되었다.[7]
+
+2003년 서울예술대학 방송연예과에 입학하였으나 방송 활동 
등으로 중퇴하였고,[8] 2010년에 성균관대학교 수시 1차에 
합격하였다.

Added: 
incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/resources/zh.txt
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/resources/zh.txt?rev=1367455&view=auto
==============================================================================
Binary file - no diff available.

Propchange: 
incubator/stanbol/trunk/enhancer/engines/langdetect/src/test/resources/zh.txt
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

