Cool! Thanks for the update, this will help if I ever go all the way with UIMA and Solr.
-sujit

On Feb 11, 2013, at 12:13 PM, jazz wrote:

> Hi Sujit,
>
> Thanks for your help! I moved the RoomNumberAnnotator.xml to the top level of
> the jar and used the same solrconfig.xml (with the /). Now it works perfectly.
>
> Best regards, Bart
>
>
> On 11 Feb 2013, at 20:13, SUJIT PAL wrote:
>
>> Hi Bart,
>>
>> Like I said, I didn't actually hook my UIMA stuff into Solr; content and
>> queries are annotated before they reach Solr. What you describe sounds like
>> a classpath problem (but of course you already knew that :-)). Since I
>> haven't actually done what you are trying to do, here are some suggestions,
>> they may or may not work...
>>
>> 1) package up the XML files into your custom JAR at the top level, that way
>> you don't need to specify it as /RoomNumberAnnotator.xml.
>> 2) if you are using solr4, then you should drop your custom JAR into
>> $SOLR_HOME/collection1/lib, not $SOLR_HOME/lib.
>>
>> -sujit
>>
>> On Feb 11, 2013, at 9:40 AM, jazz wrote:
>>
>>> Hi Sujit and others who answered my question,
>>>
>>> I have been working on the UIMA path which seems great with the available
>>> Eclipse tooling and this:
>>>
>>> http://sujitpal.blogspot.nl/2011/03/smart-query-parsing-with-uima.html
>>>
>>> Now I worked through the UIMA tutorial of the RoomNumberAnnotator:
>>> http://uima.apache.org/doc-uima-annotator.html
>>> And I am able to test it using the UIMA CAS Visual Debugger. So far so
>>> good.
>>>
>>> But now I want to use the new RoomNumberAnnotator with Solr, and it cannot
>>> find the xml file and the Java class (they are in the correct lib
>>> directories, because the WhitespaceTokenizer works fine).
>>>
>>> <updateRequestProcessorChain name="uima">
>>>   <processor
>>>     class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
>>>     <lst name="uimaConfig">
>>>       <lst name="runtimeParameters">
>>>       </lst>
>>>       <str name="analysisEngine">/RoomNumberAnnotator.xml</str>
>>>       <bool name="ignoreErrors">false</bool>
>>>       <lst name="analyzeFields">
>>>         <bool name="merge">false</bool>
>>>         <arr name="fields">
>>>           <str>content</str>
>>>         </arr>
>>>       </lst>
>>>       <lst name="fieldMappings">
>>>         <lst name="type">
>>>           <str name="name">org.apache.uima.tutorial.RoomNumber</str>
>>>           <lst name="mapping">
>>>             <str name="feature">building</str>
>>>             <str name="field">UIMAname</str>
>>>           </lst>
>>>         </lst>
>>>       </lst>
>>>     </lst>
>>>   </processor>
>>>   <processor class="solr.LogUpdateProcessorFactory" />
>>>   <processor class="solr.RunUpdateProcessorFactory" />
>>> </updateRequestProcessorChain>
>>>
>>> On the Wiki (http://wiki.apache.org/solr/SolrUIMA) this is mentioned but it
>>> fails:
>>> Deploy new jars inside one of the lib directories
>>>
>>> Run 'ant clean dist' (or 'mvn clean package') from the solr/contrib/uima
>>> path.
>>>
>>> Do I need to deploy the new jar (RoomAnnotator.jar)? If yes, which
>>> branch can I check out? This is the Stable release I am running:
>>>
>>> Solr 4.1.0 1434440 - sarowe - 2013-01-16 17:21:36
>>>
>>> Regards, Bart
>>>
>>>
>>> On 8 Feb 2013, at 22:11, SUJIT PAL wrote:
>>>
>>>> Hi Bart,
>>>>
>>>> I did some work with UIMA but this was to annotate the data before it goes
>>>> to Lucene/Solr, i.e. not built as an UpdateRequestProcessor. I just looked
>>>> through the SolrUima wiki page [http://wiki.apache.org/solr/SolrUIMA] and
>>>> I believe you will have to set up your own aggregate analysis chain in
>>>> place of the one currently configured.
>>>>
>>>> Writing UIMA annotators is very simple (there is a tutorial here:
>>>> [http://uima.apache.org/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html]).
>>>> You provide the XML description for the annotation and let UIMA generate
>>>> the annotation bean. You write Java code for the annotator and also the
>>>> annotator XML descriptor. UIMA uses the annotator XML descriptor to
>>>> instantiate and run your annotator. Overall, it sounds really complicated
>>>> but it's actually quite simple.
>>>>
>>>> The tutorial has quite a few examples that you will find useful, but in
>>>> case you need more, I have some on this github repository:
>>>> [https://github.com/sujitpal/tgni/tree/master/src/main/java/com/mycompany/tgni/analysis/uima]
>>>>
>>>> The dictionary and pattern annotators may be similar to what you are
>>>> looking for (date and city annotators).
>>>>
>>>> Best regards,
>>>> Sujit
>>>>
>>>> On Feb 8, 2013, at 8:50 AM, Bart Rijpers wrote:
>>>>
>>>>> Hi Alex,
>>>>>
>>>>> Indeed that is exactly what I am trying to achieve using worldcities. Date
>>>>> will be simple: 16-Jan becomes 16-Jan-2013 in a new dynamic field. But
>>>>> how do I integrate the Java library as UIMA? The documentation about
>>>>> changing schema.xml and solr.xml is not very detailed.
>>>>>
>>>>> Regards, Bart
>>>>>
>>>>> On 8 Feb 2013, at 16:57, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>>>>>
>>>>>> Hi Bart,
>>>>>>
>>>>>> I haven't done any UIMA work (I used other stuff for my NLP phase), so
>>>>>> not sure I can help much further. But in general, you are venturing into
>>>>>> pure research territory here.
>>>>>>
>>>>>> Even for dates, what do you actually mean? Just fixed expressions?
>>>>>> Relative dates (e.g. last Tuesday)? What about times (7pm)?
>>>>>>
>>>>>> Same with cities. If you want it offline, you need the gazetteer and
>>>>>> disambiguation modules. Gazetteer for cities (worldwide) is huge and has
>>>>>> a lot of duplicate names (Paris, Ontario is apparently a short drive from
>>>>>> London, Ontario eh?). Something like
>>>>>> http://www.maxmind.com/en/worldcities?
>>>>>> And disambiguation usually requires a training corpus that is similar to
>>>>>> what your text will look like.
>>>>>>
>>>>>> Online services like OpenCalais are backed by gigantic databases and some
>>>>>> serious corpus-trained Machine Learning disambiguation algorithms.
>>>>>>
>>>>>> So, no plug-and-play solution here. If you really need to get this done,
>>>>>> I would recommend narrowing down the specification of exactly what you
>>>>>> will settle for and looking for software that can do it. Once you have
>>>>>> that, integration with Solr is your next - and smaller - concern.
>>>>>>
>>>>>> Regards,
>>>>>> Alex.
>>>>>>
>>>>>> Personal blog: http://blog.outerthoughts.com/
>>>>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>>>>> - Time is the quality of nature that keeps events from happening all at
>>>>>> once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
>>>>>>
>>>>>>
>>>>>> On Fri, Feb 8, 2013 at 10:41 AM, jazz <jazzsa...@me.com> wrote:
>>>>>>
>>>>>>> Thanks Alex,
>>>>>>>
>>>>>>> I checked the documentation but it seems there is only a webservice
>>>>>>> (OpenCalais) available to extract dates and places.
>>>>>>>
>>>>>>> http://uima.apache.org/sandbox.html
>>>>>>>
>>>>>>> Do you know if there is a Solr-compatible UIMA add-on which detects
>>>>>>> dates and places (cities) without a webservice? If not, how do you
>>>>>>> write one?
>>>>>>>
>>>>>>> Regards, Bart
>>>>>>>
>>>>>>> On 8 Feb 2013, at 15:29, Alexandre Rafalovitch wrote:
>>>>>>>
>>>>>>>> Yes, it is possible. You are looking at UIMA or OpenNLP integration,
>>>>>>>> most probably in the Update Request Processor pipeline.
>>>>>>>>
>>>>>>>> Have a look here as a start: https://wiki.apache.org/solr/SolrUIMA
>>>>>>>>
>>>>>>>> You will have to put some serious work into this, it is not all tied
>>>>>>>> together and packaged. Mostly because Natural Language Processing (the
>>>>>>>> field you are getting into) is kind of messy all on its own.
>>>>>>>>
>>>>>>>> Good luck,
>>>>>>>> Alex.
>>>>>>>>
>>>>>>>> Personal blog: http://blog.outerthoughts.com/
>>>>>>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>>>>>>> - Time is the quality of nature that keeps events from happening all at
>>>>>>>> once. Lately, it doesn't seem to be working. (Anonymous - via GTD
>>>>>>>> book)
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Feb 8, 2013 at 9:24 AM, jazz <jazzsa...@me.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I want to know if Solr can analyze text and recognize dates and
>>>>>>>>> places. If yes, is it then possible to create new dynamic fields
>>>>>>>>> with these dates and places (e.g. city)?
>>>>>>>>>
>>>>>>>>> Thanks, Bart
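
For the simple date case raised in this thread (16-Jan becoming 16-Jan-2013), the matching logic can be tried out in plain Java before any UIMA or Solr wiring. The sketch below is only an illustration under stated assumptions: the class DateYearFiller, the method fillYear, and the day-month pattern are hypothetical names and choices, not part of UIMA, Solr, or anyone's code in the thread. It covers only the fixed "day-Mon" form with English month abbreviations; relative dates, times, and other formats are out of scope.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of UIMA or Solr. */
class DateYearFiller {

    // Assumed input form: 1-2 digit day, hyphen, three-letter English month,
    // and NOT already followed by "-<digit>" (i.e. no year present yet).
    private static final Pattern DAY_MONTH = Pattern.compile(
        "\\b(\\d{1,2})-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\\b(?!-\\d)");

    /** Appends defaultYear to every year-less "day-Mon" date found in text. */
    static String fillYear(String text, int defaultYear) {
        // $1 is the day and $2 is the month captured by DAY_MONTH.
        return DAY_MONTH.matcher(text).replaceAll("$1-$2-" + defaultYear);
    }

    public static void main(String[] args) {
        // prints "Meeting on 16-Jan-2013 in Paris"
        System.out.println(fillYear("Meeting on 16-Jan in Paris", 2013));
    }
}
```

In a UIMA setup the same pattern would sit inside an annotator's process method, with the normalized value stored as an annotation feature and mapped to a Solr field through fieldMappings; that wiring is not shown here.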