Hi Bart,

I did some work with UIMA, but that was to annotate the data before it goes to 
Lucene/Solr, i.e. not built as an UpdateRequestProcessor. I just looked through 
the SolrUIMA wiki page [http://wiki.apache.org/solr/SolrUIMA] and I believe you 
will have to set up your own aggregate analysis chain in place of the one 
currently configured.

Writing UIMA annotators is fairly simple (there is a tutorial here: 
[http://uima.apache.org/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html]).
You provide the XML description for the annotation type and let UIMA generate 
the annotation bean. You write the Java code for the annotator, plus the 
annotator's XML descriptor. UIMA uses that descriptor to instantiate and run 
your annotator. Overall, it sounds complicated, but it's actually quite 
simple.

The tutorial has quite a few examples that you will find useful, but in case 
you need more, I have some on this github repository:
[https://github.com/sujitpal/tgni/tree/master/src/main/java/com/mycompany/tgni/analysis/uima]

The dictionary and pattern annotators may be similar to what you are looking 
for (date and city annotators).
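
As a rough illustration (this is not code from the repository above, just a 
minimal sketch), the matching logic inside a pattern annotator for dates and a 
dictionary annotator for cities could look something like the plain-Java class 
below. The class name SimpleAnnotators, the Span holder and both method names 
are made up for this example, and it assumes single-word city names and the 
"16-Jan" date style from your mail. In a real UIMA annotator you would run 
this kind of logic inside the annotator's process method and turn each span 
into an annotation with the same begin/end offsets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of UIMA or Solr: plain-Java matching logic
// that an annotator could delegate to.
class SimpleAnnotators {

    /** A matched span: character offsets into the text plus a normalized value. */
    static final class Span {
        final int begin;
        final int end;
        final String value;

        Span(int begin, int end, String value) {
            this.begin = begin;
            this.end = end;
            this.value = value;
        }
    }

    // Matches "16-Jan" or "16-Jan-2013"; the year group is optional.
    private static final Pattern DATE = Pattern.compile(
        "\\b(\\d{1,2})-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(?:-(\\d{4}))?\\b");

    // Candidate city tokens: one capitalized word (assumption: no multi-word names).
    private static final Pattern CAPITALIZED = Pattern.compile("\\b\\p{Lu}\\p{L}+\\b");

    /** Pattern annotator core: finds dates and fills in defaultYear when the year is missing. */
    static List<Span> findDates(String text, int defaultYear) {
        List<Span> spans = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            String year = m.group(3) != null ? m.group(3) : String.valueOf(defaultYear);
            String normalized = m.group(1) + "-" + m.group(2) + "-" + year;
            spans.add(new Span(m.start(), m.end(), normalized));
        }
        return spans;
    }

    /** Dictionary annotator core: keeps capitalized tokens that appear in the city set. */
    static List<Span> findCities(String text, Set<String> cities) {
        List<Span> spans = new ArrayList<>();
        Matcher m = CAPITALIZED.matcher(text);
        while (m.find()) {
            if (cities.contains(m.group())) {
                spans.add(new Span(m.start(), m.end(), m.group()));
            }
        }
        return spans;
    }
}
```

For the text "Meet on 16-Jan in Paris" with a default year of 2013, findDates 
returns one span with the value 16-Jan-2013, and findCities returns one span 
for Paris if "Paris" is in the set. Note that a plain set lookup does nothing 
about the ambiguity Alex mentions (Paris, Ontario vs. Paris, France); that 
would need a separate disambiguation step.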

Best regards,
Sujit

On Feb 8, 2013, at 8:50 AM, Bart Rijpers wrote:

> Hi Alex,
> 
> Indeed that is exactly what I am trying to achieve using worldcities. Date 
> will be simple: 16-Jan becomes 16-Jan-2013 in a new dynamic field. But how do 
> I integrate the Java library as UIMA? The documentation about changing 
> schema.xml and solr.xml is not very detailed. 
> 
> Regards, Bart
> 
> On 8 Feb 2013, at 16:57, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> 
>> Hi Bart,
>> 
>> I haven't done any UIMA work (I used other stuff for my NLP phase), so not
>> sure I can help much further. But in general, you are venturing into pure
>> research territory here.
>> 
>> Even for dates, what do you actually mean? Just fixed expressions? Relative
>> dates (e.g. last Tuesday)? What about times (7pm)?
>> 
>> Same with cities. If you want it offline, you need the gazetteer and
>> disambiguation modules. Gazetteer for cities (worldwide) is huge and has a
>> lot of duplicate names (Paris, Ontario is apparently a short drive from
>> London, Ontario eh?). Something like
>> http://www.maxmind.com/en/worldcities? And disambiguation usually
>> requires a training corpus that is similar to what your text will look
>> like.
>> 
>> Online services like OpenCalais are backed by gigantic databases and some
>> serious corpus-trained machine-learning disambiguation algorithms.
>> 
>> So, no plug-and-play solution here. If you really need to get this done, I
>> would recommend narrowing down the specification of exactly what you will
>> settle for and looking for software that can do it. Once you have that,
>> integration with Solr is your next - and smaller - concern.
>> 
>> Regards,
>>  Alex.
>> 
>> Personal blog: http://blog.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all at
>> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>> 
>> 
>> On Fri, Feb 8, 2013 at 10:41 AM, jazz <jazzsa...@me.com> wrote:
>> 
>>> Thanks Alex,
>>> 
>>> I checked the documentation but it seems there is only a webservice
>>> (OpenCalais) available to extract dates and places.
>>> 
>>> http://uima.apache.org/sandbox.html
>>> 
>>> Do you know if there is a Solr-compatible UIMA add-on that detects dates
>>> and places (cities) without a webservice? If not, how do you write one?
>>> 
>>> Regards, Bart
>>> 
>>> On 8 Feb 2013, at 15:29, Alexandre Rafalovitch wrote:
>>> 
>>>> Yes, it is possible. You are looking at UIMA or OpenNLP integration, most
>>>> probably in an Update Request Processor pipeline.
>>>> 
>>>> Have a look here as a start: https://wiki.apache.org/solr/SolrUIMA
>>>> 
>>>> You will have to put some serious work into this; it is not all tied
>>>> together and packaged. Mostly because Natural Language Processing (the
>>>> field you are getting into) is kind of messy all on its own.
>>>> 
>>>> Good luck,
>>>>  Alex.
>>>> 
>>>> Personal blog: http://blog.outerthoughts.com/
>>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>>> - Time is the quality of nature that keeps events from happening all at
>>>> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>>>> 
>>>> 
>>>> On Fri, Feb 8, 2013 at 9:24 AM, jazz <jazzsa...@me.com> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I want to know if Solr can analyze text and recognize dates and places.
>>>>> If yes, is it then possible to create new dynamic fields with these dates
>>>>> and places (e.g. city)?
>>>>> 
>>>>> Thanks, Bart
>>>>> 
>>> 
>>> 
