Hi,

I successfully combined Solr and UIMA with the help of https://wiki.apache.org/solr/SolrUIMA and other pages (and am happy to help anyone who wants to get to the same point).
Right now I can run an analysis engine and have Solr automatically pick up the "primitive" features that I map to fields in schema.xml. But if a feature is itself an object, I do not know how to capture it in Solr.

For the following small example, I have included the relevant part of solrconfig.xml in [1] and the schema.xml addition in [2]. Both use the AE that ships with the UIMA examples.

With the input "This is a sentence with an email at u...@host.com", Solr correctly adds the field:

    "UIMAname": [ "36" ]

since 36 is the index where the email token starts. I could also successfully capture the feature <str name="feature">end</str>, which indicates where the email token ends.

However, example.EmailAddress has the features "begin, end, sofa". sofa is not a primitive feature but an "object" that itself has the features "sofaNum, sofaID, sofaString, ...".

How can I access, from Solr, features of an annotation like example.EmailAddress that are not simple strings but are themselves objects?

I made a screenshot of the CAS Visual Debugger with this AE and the sentence to show which fields I mean. I hope this makes it clearer:
http://tinypic.com/view.php?pic=34rud1s&s=8#.VN5bF7s2cWN

Does anyone know how to access such fields with Solr and UIMA?

Thanks a lot for any help,
Tom

[1]
<updateRequestProcessorChain name="uima" default="true">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters">
      </lst>
      <str name="analysisEngine">/home/toliwa/javalibs/uimaj-2.6.0-bin/apache-uima/examples/descriptors/analysis_engine/UIMA_Analysis_Example.xml</str>
      <!-- Set to true if you want to continue indexing even if text processing fails.
           Default is false. That is, Solr throws RuntimeException and
           never indexed documents entirely in your session. -->
      <bool name="ignoreErrors">false</bool>
      <!-- This is optional. It is used for logging when text processing fails.
           If logField is not specified, uniqueKey will be used as logField.
      <str name="logField">id</str>
      -->
      <str name="logField">id</str>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>text</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">example.EmailAddress</str>
          <lst name="mapping">
            <str name="feature">begin</str>
            <str name="field">UIMAname</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

[2]
<field name="UIMAname" type="string" indexed="true" stored="true" multiValued="true" required="false"/>
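
For completeness, here is a sketch of how the `end` feature mentioned above could be mapped alongside `begin`. It assumes the processor accepts more than one "mapping" block per type, which I have not verified, and the field name UIMAend is a hypothetical one chosen for illustration; it would need its own <field> entry in schema.xml, analogous to [2].

```xml
<lst name="fieldMappings">
  <lst name="type">
    <str name="name">example.EmailAddress</str>
    <!-- existing mapping from [1]: start offset of the email token -->
    <lst name="mapping">
      <str name="feature">begin</str>
      <str name="field">UIMAname</str>
    </lst>
    <!-- assumed second mapping: end offset of the email token;
         UIMAend is a hypothetical field name, not part of the original config -->
    <lst name="mapping">
      <str name="feature">end</str>
      <str name="field">UIMAend</str>
    </lst>
  </lst>
</lst>
```

This only covers primitive features; it does not address the open question of how to reach the features nested inside sofa.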