Hi Alex,

Indeed, that is exactly what I am trying to achieve using worldcities. Dates will be simple: 16-Jan becomes 16-Jan-2013 in a new dynamic field. But how do I integrate the Java library as a UIMA component? The documentation about changing schema.xml and solr.xml is not very detailed.
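Bart's date rule (16-Jan becomes 16-Jan-2013) is simple enough to prototype in plain Java before any UIMA or Solr wiring. This is a minimal sketch, assuming every "day-Mon" token refers to 2013 and uses English month abbreviations; the class and method names are hypothetical. The strings it returns would still have to be written into the dynamic field by whatever UIMA annotator or update processor ends up hosting this logic.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoField;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns "16-Jan" style tokens into "16-Jan-2013".
final class DateNormalizer {
    // Assumption: a token without a year always means this year.
    private static final int DEFAULT_YEAR = 2013;

    // Parses "16-Jan" (case-insensitively); the missing year is filled in from DEFAULT_YEAR.
    private static final DateTimeFormatter INPUT = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("d-MMM")
            .parseDefaulting(ChronoField.YEAR, DEFAULT_YEAR)
            .toFormatter(Locale.ENGLISH);

    // Output format for the new field value, e.g. "16-Jan-2013".
    private static final DateTimeFormatter OUTPUT =
            DateTimeFormatter.ofPattern("d-MMM-yyyy", Locale.ENGLISH);

    // Candidate tokens: one or two digits, a hyphen, three letters, as a whole word.
    private static final Pattern TOKEN = Pattern.compile("\\b(\\d{1,2}-[A-Za-z]{3})\\b");

    static String normalize(String token) {
        LocalDate date = LocalDate.parse(token, INPUT);
        return date.format(OUTPUT);
    }

    // Scans free text and returns every token that parses as a date, normalized.
    static List<String> extract(String text) {
        List<String> dates = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            try {
                dates.add(normalize(m.group(1)));
            } catch (DateTimeParseException e) {
                // Looks like a date but is not one (e.g. "12-abc"); skip it.
            }
        }
        return dates;
    }
}
```

For example, `DateNormalizer.extract("Meeting on 16-Jan and 5-feb")` returns `16-Jan-2013` and `5-Feb-2013`. Relative dates and times, which Alex raises below, are deliberately out of scope here.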
Regards, Bart

On 8 Feb 2013, at 16:57, Alexandre Rafalovitch <arafa...@gmail.com> wrote:

> Hi Bart,
>
> I haven't done any UIMA work (I used other tools for my NLP phase), so I'm
> not sure I can help much further. But in general, you are venturing into
> pure research territory here.
>
> Even for dates, what do you actually mean? Just fixed expressions? Relative
> dates (e.g. "last Tuesday")? What about times ("7pm")?
>
> Same with cities. If you want it offline, you need gazetteer and
> disambiguation modules. A worldwide gazetteer of cities is huge and has a
> lot of duplicate names (Paris, Ontario is apparently a short drive from
> London, Ontario, eh?). Something like
> http://www.maxmind.com/en/worldcities? And disambiguation usually requires
> a training corpus similar to what your text will look like.
>
> Online services like OpenCalais are backed by gigantic databases and some
> serious corpus-trained machine-learning disambiguation algorithms.
>
> So, there is no plug-and-play solution here. If you really need to get this
> done, I would recommend narrowing down the specification of exactly what you
> will settle for and looking for software that can do it. Once you have that,
> integration with Solr is your next (and smaller) concern.
>
> Regards,
> Alex.
>
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
>
>
> On Fri, Feb 8, 2013 at 10:41 AM, jazz <jazzsa...@me.com> wrote:
>
>> Thanks Alex,
>>
>> I checked the documentation, but it seems there is only a web service
>> (OpenCalais) available to extract dates and places.
>>
>> http://uima.apache.org/sandbox.html
>>
>> Do you know if there is a Solr-compatible UIMA add-on that detects dates
>> and places (cities) without a web service? If not, how do you write one?
>>
>> Regards, Bart
>>
>> On 8 Feb 2013, at 15:29, Alexandre Rafalovitch wrote:
>>
>>> Yes, it is possible. You are looking at UIMA or OpenNLP integration, most
>>> probably in the Update Request Processor pipeline.
>>>
>>> Have a look here as a start: https://wiki.apache.org/solr/SolrUIMA
>>>
>>> You will have to put some serious work into this; it is not all tied
>>> together and packaged, mostly because Natural Language Processing (the
>>> field you are getting into) is kind of messy in its own right.
>>>
>>> Good luck,
>>> Alex.
>>>
>>>
>>> On Fri, Feb 8, 2013 at 9:24 AM, jazz <jazzsa...@me.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I want to know if Solr can analyze text and recognize dates and places.
>>>> If yes, is it then possible to create new dynamic fields with these dates
>>>> and places (e.g. city)?
>>>>
>>>> Thanks, Bart
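Alex's warning about duplicate names in a worldwide gazetteer shows up even in a toy lookup. The sketch below is a hypothetical in-memory stand-in for a list such as MaxMind's worldcities file (its real format is not assumed here). It only matches single-word names and returns every candidate place per name, leaving the actual disambiguation, the hard part Alex describes, unsolved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy gazetteer: lower-cased city name -> every place carrying that name.
final class Gazetteer {
    private final Map<String, List<String>> entries = new HashMap<>();

    // Registers one place under a (possibly shared) city name.
    void add(String city, String place) {
        entries.computeIfAbsent(city.toLowerCase(Locale.ROOT), k -> new ArrayList<>()).add(place);
    }

    // Splits text on non-letters and returns, in order of first appearance,
    // each known single-word city name with all of its candidate places.
    // Multi-word names such as "New York" are not handled by this sketch.
    Map<String, List<String>> lookup(String text) {
        Map<String, List<String>> hits = new LinkedHashMap<>();
        for (String token : text.split("[^\\p{L}]+")) {
            List<String> places = entries.get(token.toLowerCase(Locale.ROOT));
            if (places != null) {
                hits.put(token, places);
            }
        }
        return hits;
    }
}
```

After adding "Paris, France" and "Paris, Ontario, Canada" under "Paris", looking up "Flights to Paris." yields both candidates; choosing between them needs the kind of context or training data Alex mentions.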