Hi Tommaso,

Thanks a lot. I am now able to index the content and extract the entities as you described. I made the XML content like this:

    <add>
      <doc>
        <field name="reference">Entity.xml</field>
        <field name="text">Senator Dick Durbin (D-IL) Chicago , March 3,2007.</field>
        <field name="title">Entity Extraction</field>
      </doc>
    </add>

and it worked.

For the benefit of others, the procedure I followed is:

Step 1: Get these dependency jars:

    AlchemyAPIAnnotator.jar
    commons-beanutils-1.7.0.jar
    commons-digester-2.0.jar
    commons-lang-2.4.jar
    OpenCalaisAnnotator.jar
    slf4j-api-1.5.5.jar
    slf4j-jdk14-1.5.5.jar
    solr-uima.jar
    Tagger.jar
    uima-core.jar
    WhitespaceTokenizer.jar

The sources for the UIMA and solr-uima jars are:

    AlchemyAPIAnnotator: http://svn.apache.org/repos/asf/uima/sandbox/trunk/AlchemyAPIAnnotator
    OpenCalaisAnnotator: http://svn.apache.org/repos/asf/uima/sandbox/trunk/OpenCalaisAnnotator
    Tagger: http://svn.apache.org/repos/asf/uima/sandbox/trunk/Tagger
    WhitespaceTokenizer: http://svn.apache.org/repos/asf/uima/sandbox/trunk/WhitespaceTokenizer
    solr-uima: http://solr-uima.googlecode.com/svn/trunk/solr-uima

Step 2: Register at http://www.opencalais.com/apikey and http://www.alchemyapi.com/api/register.html to get the API keys.

Step 3: As described by Tommaso in http://code.google.com/p/solr-uima/wiki/5MinutesTutorial, modify your schema.xml by adding the following fields:

    <field name="language" type="string" indexed="true" stored="true" required="false"/>
    <field name="concept" type="string" indexed="true" stored="true" multiValued="true" required="false"/>
    <field name="keyword" type="string" indexed="true" stored="true" multiValued="true" required="false"/>
    <field name="suggested_category" type="string" indexed="true" stored="true" multiValued="false" required="false"/>
    <field name="sentence" type="text" indexed="true" stored="true" multiValued="true" required="false" />
    <dynamicField name="entity*" type="text" indexed="true" stored="true" />
    <field name="text" type="text" indexed="true" stored="true"/>
    <field name="reference" type="string" indexed="true" stored="true" required="true" />
    <field name="title" type="text" indexed="true" stored="true" multiValued="false"/>

Then modify your solrconfig.xml by adding the UIMA config and an update processor chain:

    <uimaConfig>
      <runtimeParameters>
        <keyword_apikey>VALID_ALCHEMYAPI_KEY</keyword_apikey>
        <concept_apikey>VALID_ALCHEMYAPI_KEY</concept_apikey>
        <lang_apikey>VALID_ALCHEMYAPI_KEY</lang_apikey>
        <cat_apikey>VALID_ALCHEMYAPI_KEY</cat_apikey>
        <entities_apikey>VALID_ALCHEMYAPI_KEY</entities_apikey>
        <oc_licenseID>VALID_OPENCALAIS_KEY</oc_licenseID>
      </runtimeParameters>
    </uimaConfig>

    <updateRequestProcessorChain name="uima">
      <processor class="org.apache.solr.uima.processor.UIMAProcessorFactory"/>
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>

Replace your existing default update request handler (<requestHandler name="/update"...) with the following, so that updates go through the "uima" chain:

    <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
      <lst name="defaults">
        <str name="update.processor">uima</str>
      </lst>
    </requestHandler>

Step 4: Increase the Tomcat heap size.

    On Windows:  set JAVA_OPTS=%JAVA_OPTS% -Xmx256m
    On Linux:    export JAVA_OPTS="$JAVA_OPTS -Xmx256m"

Step 5: Index some sample data. Sample file contents:

    <add>
      <doc>
        <field name="reference">Entity.xml</field>
        <field name="text">Senator Dick Durbin (D-IL) Chicago , March 3,2007.</field>
        <field name="title">Entity Extraction</field>
      </doc>
    </add>

Use curl to index it:

    curl http://127.0.0.1:8080/solr/update -F solr.bo...@entity.xml

followed by a commit:

    http://127.0.0.1:8080/solr/update?stream.body=<commit/>

and you are done.

Tommaso, thanks a lot once again for all your support. Please add any steps I may have missed.

Thanks,
Mahesh

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-UIMA-integration-tp1528253p1646609.html
Sent from the Solr - User mailing list archive at Nabble.com.