Solr and UIMA, capturing fields

Tom Devel Fri, 13 Feb 2015 12:21:15 -0800

Hi,

I have successfully combined Solr and UIMA with the help of
https://wiki.apache.org/solr/SolrUIMA and other pages (and am happy to
help anyone who wants to reach this step).


Right now I can run an analysis engine, and Solr automatically picks up the
"primitive" features that I map to fields specified in schema.xml. But if
the features themselves are objects, I do not know how to capture them in
Solr.

For the following small example, the relevant part of solrconfig.xml is in
[1] and the schema.xml addition is in [2]. Both use the analysis engine (AE)
shipped with the UIMA examples.

With the input "This is a sentence with an email at u...@host.com", Solr
correctly adds the field:

        "UIMAname": [
          "36"
        ]

since this is the character offset where the email token starts. I could
also successfully capture the feature
<str name="feature">end</str> to get the offset where the email token ends.
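For illustration, a fieldMappings entry that stores both offsets might look
like the following sketch. It assumes the solr-uima processor accepts
several mapping lists inside one type entry; UIMAend is a hypothetical field
name, declared in schema.xml the same way as UIMAname in [2].

```xml
<!-- solrconfig.xml, inside <lst name="fieldMappings"> (sketch) -->
<lst name="type">
  <str name="name">example.EmailAddress</str>
  <!-- start offset of the email token, as in [1] -->
  <lst name="mapping">
    <str name="feature">begin</str>
    <str name="field">UIMAname</str>
  </lst>
  <!-- end offset of the email token; UIMAend is a hypothetical field -->
  <lst name="mapping">
    <str name="feature">end</str>
    <str name="field">UIMAend</str>
  </lst>
</lst>

<!-- schema.xml: matching declaration for the hypothetical UIMAend field -->
<field name="UIMAend" type="string" indexed="true" stored="true" multiValued="true" required="false"/>
```

This only covers primitive features such as begin and end; it does not
address object-valued features like sofa.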

However, example.EmailAddress has the features begin, end, and sofa. sofa is
not a primitive feature but an object (a feature structure) which itself has
the features "sofaNum, sofaID, sofaString, ..."

How can I access fields in Solr from an annotation like
example.EmailAddress that are not simple strings but objects themselves?

I made a screenshot of the CAS Visual Debugger with this AE and the sentence
to show which fields I mean; I hope this makes it clearer:
http://tinypic.com/view.php?pic=34rud1s&s=8#.VN5bF7s2cWN

Does anyone know how to access such fields with Solr and UIMA?

Thanks a lot for any help,
Tom


[1]
  <updateRequestProcessorChain name="uima" default="true">
    <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
      <lst name="uimaConfig">
        <lst name="runtimeParameters">
        </lst>
        <str name="analysisEngine">/home/toliwa/javalibs/uimaj-2.6.0-bin/apache-uima/examples/descriptors/analysis_engine/UIMA_Analysis_Example.xml</str>
        <!-- Set to true if you want to continue indexing even if text processing fails.
             Default is false. That is, Solr throws RuntimeException and
             never indexed documents entirely in your session. -->
        <bool name="ignoreErrors">false</bool>
        <!-- This is optional. It is used for logging when text processing fails.
             If logField is not specified, uniqueKey will be used as logField.
        <str name="logField">id</str>
        -->
        <str name="logField">id</str>
        <lst name="analyzeFields">
          <bool name="merge">false</bool>
          <arr name="fields">
            <str>text</str>
          </arr>
        </lst>
        <lst name="fieldMappings">
          <lst name="type">
            <str name="name">example.EmailAddress</str>
            <lst name="mapping">
              <str name="feature">begin</str>
              <str name="field">UIMAname</str>
            </lst>
          </lst>
        </lst>
      </lst>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

[2]
<field name="UIMAname" type="string" indexed="true" stored="true" multiValued="true" required="false"/>
