Hi Alex,

Indeed, that is exactly what I am trying to achieve using worldcities. Dates will 
be simple: 16-Jan becomes 16-Jan-2013 in a new dynamic field. But how do I 
integrate the Java library as a UIMA component? The documentation about changing 
schema.xml and solr.xml is not very detailed. 
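
For illustration, the date step described above could be sketched as a small pre-indexing pass, independent of UIMA. This is only a sketch under assumptions: the function names, the `text` source field, the `event_date_dts` field name and the default year of 2013 are all hypothetical, and the field name would have to match a multi-valued date dynamicField rule in your own schema.xml.

```python
import re
from datetime import datetime

# Matches day-month pairs such as "16-Jan". The negative lookahead skips
# dates that already carry a four-digit year, e.g. "16-Jan-2013".
PARTIAL_DATE = re.compile(r"\b(\d{1,2})-([A-Za-z]{3})\b(?!-\d{4})")


def extract_dates(text, default_year):
    """Return the partial dates found in text, completed with default_year
    and formatted as ISO 8601 UTC timestamps (the form Solr date fields use)."""
    found = []
    for day, month in PARTIAL_DATE.findall(text):
        try:
            # %b expects an abbreviated month name in the current locale;
            # this sketch assumes an English/C locale.
            parsed = datetime.strptime(f"{day}-{month}-{default_year}", "%d-%b-%Y")
        except ValueError:
            continue  # not a real date, e.g. "31-Feb" or "12-Xyz"
        found.append(parsed.strftime("%Y-%m-%dT%H:%M:%SZ"))
    return found


def add_date_field(doc, text_field="text", date_field="event_date_dts",
                   default_year=2013):
    """Return a copy of doc with the extracted dates stored under date_field.
    Both field names are hypothetical and depend on your schema."""
    enriched = dict(doc)
    dates = extract_dates(doc.get(text_field, ""), default_year)
    if dates:
        enriched[date_field] = dates
    return enriched


if __name__ == "__main__":
    print(add_date_field({"id": "1", "text": "Meeting on 16-Jan in Paris"}))
```

This handles only fixed day-month expressions; relative dates and times would need a different approach.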

Regards, Bart

On 8 Feb 2013, at 16:57, Alexandre Rafalovitch <arafa...@gmail.com> wrote:

> Hi Bart,
> 
> I haven't done any UIMA work (I used other stuff for my NLP phase), so not
> sure I can help much further. But in general, you are venturing into pure
> research territory here.
> 
> Even for dates, what do you actually mean? Just fixed expressions? Relative
> dates (e.g. last Tuesday)? What about times (e.g. 7pm)?
> 
> Same with cities. If you want it offline, you need gazetteer and
> disambiguation modules. A worldwide gazetteer for cities is huge and has a
> lot of duplicate names (Paris, Ontario is apparently a short drive from
> London, Ontario, eh?). Something like
> http://www.maxmind.com/en/worldcities? And disambiguation usually
> requires a training corpus that is similar to what your text will look like.
> 
> Online services like OpenCalais are backed by gigantic databases and some
> serious corpus-trained machine learning disambiguation algorithms.
> 
> So, no plug-and-play solution here. If you really need to get this done, I
> would recommend narrowing down the specification of exactly what you will
> settle for and looking for software that can do it. Once you have that,
> integration with Solr is your next - and smaller - concern.
> 
> Regards,
>   Alex.
> 
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
> 
> 
> On Fri, Feb 8, 2013 at 10:41 AM, jazz <jazzsa...@me.com> wrote:
> 
>> Thanks Alex,
>> 
>> I checked the documentation but it seems there is only a webservice
>> (OpenCalais) available to extract dates and places.
>> 
>> http://uima.apache.org/sandbox.html
>> 
>> Do you know if there is a Solr-compatible UIMA add-on which detects dates
>> and places (cities) without a webservice? If not, how do you write one?
>> 
>> Regards, Bart
>> 
>> On 8 Feb 2013, at 15:29, Alexandre Rafalovitch wrote:
>> 
>>> Yes, it is possible. You are looking at UIMA or OpenNLP integration, most
>>> probably in an Update Request Processor pipeline.
>>> 
>>> Have a look here as a start: https://wiki.apache.org/solr/SolrUIMA
>>> 
>>> You will have to put some serious work into this; it is not all tied
>>> together and packaged, mostly because Natural Language Processing (the
>>> field you are getting into) is kind of messy all on its own.
>>> 
>>> Good luck,
>>>   Alex.
>>> 
>>> 
>>> 
>>> On Fri, Feb 8, 2013 at 9:24 AM, jazz <jazzsa...@me.com> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I want to know if Solr can analyze text and recognize dates and places. If
>>>> yes, is it then possible to create new dynamic fields with these dates and
>>>> places (e.g. city)?
>>>> 
>>>> Thanks, Bart
>>>> 
>> 
>> 
