Hello everyone,

I'm new to this list. My name is Mihály Héder, and I am the lead developer of the Sztakipedia project: http://www.youtube.com/watch?v=8VW0TrvXpl4

Most of Sztakipedia's suggestions are based on UIMA annotation chains, which are composed of UIMA annotation engines. These are similar to Enhancer Chains and Enhancement Engines, respectively. If you are curious, you can play around with one of Sztakipedia's chains: http://pedia.sztaki.hu:8080/tfidfengsb/?mode=form
This is a tokenizer + sentence boundary detector + lemmatizer + tf-idf calculator chain (tf-idf is calculated on enwiki in this case).

If you are unfamiliar with UIMA, its main features are:
1) You can find a good number of annotation engines and chains already made, packaged in PEAR files.
2) The type system and the chain building are quite sophisticated and flexible.
3) You can annotate not only text but also binaries, images, etc.
4) We have had very good experiences with its performance.
5) You can always say "This stuff is behind IBM Watson" ;)
One could also mention the Asynchronous Scaleout functionality, but our experiences with that have not been so good.

So right now I'm investigating how to integrate UIMA into Stanbol. After reading some of the Stanbol docs and writing a Hello World enhancement engine to get a grip on Stanbol, I think this is how it should be done:
-An adapter-like interface is needed that glues the two components together. If you use UIMA, most of the time you just have a PEAR file from a third party that you can't or don't want to modify. It will have its own type system, chain definition, etc. Also, hopefully there will be many more Stanbol users than developers in the long run.
-This means that the real use case is that a future user downloads a UIMA chain from somewhere, downloads Stanbol, and wants to glue the two together without coding in either project.
-However, most of the time it will be non-trivial to turn UIMA Feature Structures into Stanbol Enhancements. In some cases I can imagine that you could just turn every FS into a triple by a simple rule or something, but making this flexible enough through configuration files alone seems rather unrealistic to me.

So what I have in mind now for the UIMA->Enhancement conversion is:
-Defining a simple Java interface with one function, e.g.: Triple convertFStoTriple(org.apache.uima.cas.FeatureStructure fs). By implementing this one function the user could easily define how feature structures are turned into Triples. Most of the time this function would return null, as there are usually many more UIMA FeatureStructures generated (e.g. about two for every word) than the user wants to deal with.
-Creating an Enhancement Engine called UIMAAdapter. It would have a converterClass service property that could be configured to contain the name of the class the user just created. The engine would instantiate the user-written class, provided that it's on the classpath, and use it to create enhancements.
-For more advanced cases we could provide an interface to map a List<FeatureStructure> to a List<Triple>. For even more advanced cases we could provide a convert(List<FeatureStructure>, ContentItem ci) function with full access to the Stanbol ContentItem.
-Naturally, we could write a default converter that converts every FeatureStructure that comes out of UIMA to triples in some way, for testing purposes and as a basis for extension.

The other question is how to communicate with the UIMA engine. I think the ability to access a remotely deployed UIMA engine is a must, and the REST interface you can try out at the link above (provided by UIMASimpleServlet) is good for starters. I'm much less sure that embedding everything needed to run a UIMA engine into a Stanbol Enhancement Engine is such a good idea, but I think it can be done.

What do you think of all the above?

p.s. Do you have a "How to write and deploy a Hello World Enhancement Engine" tutorial? I have found the description of the functions to implement, but it still took me a while to figure out how to deploy it to Felix, etc. If not, I can write one for you based on my notes.

Best,
Mihály
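
p.p.s. To make the converter idea a bit more concrete, here is a rough, self-contained sketch of the one-function interface and of how a UIMAAdapter-like engine might load a user class by name. Everything in it is an assumption for illustration: SimpleFs and SimpleTriple are hypothetical stand-ins for org.apache.uima.cas.FeatureStructure and for whatever Triple type Stanbol uses, and LemmaConverter, UimaAdapterSketch, the "Lemma" type name and the urn: identifiers are made up. A real version would work against the actual UIMA and Stanbol APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for org.apache.uima.cas.FeatureStructure:
// just a type name plus feature name/value pairs.
final class SimpleFs {
    final String typeName;
    final Map<String, String> features;

    SimpleFs(String typeName, Map<String, String> features) {
        this.typeName = typeName;
        this.features = features;
    }
}

// Hypothetical stand-in for an RDF triple.
final class SimpleTriple {
    final String subject;
    final String predicate;
    final String object;

    SimpleTriple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

// The one-function interface. Returning null means
// "the user does not care about this feature structure".
interface FsToTripleConverter {
    SimpleTriple convertFsToTriple(SimpleFs fs);
}

// Example of a user-written converter: keeps lemmas, skips everything else.
final class LemmaConverter implements FsToTripleConverter {
    @Override
    public SimpleTriple convertFsToTriple(SimpleFs fs) {
        if (!"Lemma".equals(fs.typeName)) {
            return null;
        }
        String value = fs.features.get("value");
        if (value == null) {
            return null;
        }
        return new SimpleTriple("urn:example:content-item", "urn:example:hasLemma", value);
    }
}

// Sketch of what the adapter engine would do with the converterClass property.
final class UimaAdapterSketch {
    // Instantiates the user-written class by name; it must be on the classpath
    // and have a no-argument constructor.
    static FsToTripleConverter load(String converterClass) {
        try {
            return Class.forName(converterClass)
                    .asSubclass(FsToTripleConverter.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load converter " + converterClass, e);
        }
    }

    // Runs the converter over all feature structures and drops the nulls.
    static List<SimpleTriple> convertAll(FsToTripleConverter converter, List<SimpleFs> all) {
        List<SimpleTriple> triples = new ArrayList<>();
        for (SimpleFs fs : all) {
            SimpleTriple triple = converter.convertFsToTriple(fs);
            if (triple != null) {
                triples.add(triple);
            }
        }
        return triples;
    }
}
```

With something like this, the user would only write a class such as LemmaConverter and put its name into the converterClass property. The List-to-List and ContentItem variants would be separate interfaces in the same spirit.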
