Try this, http://viewer.opencalais.com/
They have an open API for that data. With your text message of:

"John Mayer Mumbai 411004 Juhu, car driver, also capable of body guard"

It gives back:

People: John Mayer Mumbai
Positions: body guard, car driver

It's not perfect, but it's not bad either.

Regards,
Kallin Nagelberg

-----Original Message-----
From: Michael Griffiths [mailto:mgriffi...@am-ind.com]
Sent: Thursday, August 12, 2010 3:28 PM
To: solr-user@lucene.apache.org
Subject: RE: Require some advice

Solr is a search engine, not an entity extraction tool.

While there are some decent open source entity extraction tools, they are focused on processing sentences and paragraphs. The structural differences in text messages mean you'd need to do a fair amount of work to get decent entity extraction.

That said, you may want to look into simple word/phrase matching if your domain is sufficiently small. Use RegEx to extract the ZIP, and use dictionaries to extract city/area, skills, and names. Much simpler and cheaper.

-----Original Message-----
From: Pavan Gupta [mailto:pavan....@gmail.com]
Sent: Thursday, August 12, 2010 2:58 PM
To: solr-user@lucene.apache.org
Subject: Require some advice

Hi,

I am new to text search and mining and have been researching the different products available. My application requires reading an SMS message (unstructured) and finding entities such as person name, area, zip, city, and the skills associated with the person. The SMS would be in the form of free text. The parsed data would be stored in a database and used by Solr to display results.

An SMS message could be in the following form:

"John Mayer Mumbai 411004 Juhu, car driver, also capable of body guard"

We need to interpret it in the following manner:

first name -> John
last name -> Mayer
city -> Mumbai
zip -> 411004
area -> Juhu
skills -> car driver, body guard

1. Is Solr capable enough to handle this application, considering that the SMS message would be unstructured?
2. How does Solr/Lucene compare to other tools such as UIMA, GATE, CER (Stanford University), and LingPipe?
3. Is Solr only for text search, or can it be used for information extraction?
4. Is it recommended to use Solr with other products such as UIMA and GATE?

There are companies that specialize in making meaning out of unstructured SMS messages. Is there something similar in the open source world? Can we extend Solr for the same purpose?

Your reply would be appreciated. Thanking you.

Regards,
Pavan
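
For reference, the regex-plus-dictionary approach suggested in Michael's reply could be sketched roughly as follows. This is only an illustration and not something posted in the thread: the lookup tables, the six-digit ZIP pattern, the `parse_sms` function name, and the rule that the name is whatever comes before the first number or known place are all assumptions made for the sample message.

```python
import re

# Hypothetical lookup tables for illustration; real lists would be far larger.
CITIES = {"mumbai", "pune", "delhi"}
AREAS = {"juhu", "andheri", "bandra"}
SKILLS = ["car driver", "body guard", "cook"]
PLACES = CITIES | AREAS

# Assumes a six-digit ZIP (PIN) code, as in the sample message.
ZIP_RE = re.compile(r"\b\d{6}\b")
TOKEN_RE = re.compile(r"[A-Za-z]+|\d+")


def parse_sms(text):
    """Extract fields from a free-text SMS using a regex and dictionaries."""
    lowered = text.lower()
    tokens = TOKEN_RE.findall(text)

    zip_match = ZIP_RE.search(text)
    city = next((t for t in tokens if t.lower() in CITIES), None)
    area = next((t for t in tokens if t.lower() in AREAS), None)
    # Plain substring matching; order follows the SKILLS list, not the message.
    skills = [s for s in SKILLS if s in lowered]

    # Assumption: the name is the run of words before the first number
    # or known city/area. This breaks if the message starts differently.
    name_tokens = []
    for token in tokens:
        if token.isdigit() or token.lower() in PLACES:
            break
        name_tokens.append(token)

    return {
        "first_name": name_tokens[0] if name_tokens else None,
        "last_name": name_tokens[-1] if len(name_tokens) > 1 else None,
        "city": city,
        "zip": zip_match.group() if zip_match else None,
        "area": area,
        "skills": skills,
    }
```

Calling `parse_sms` on the sample message from the original question yields the first name, last name, city, zip, area, and both skills listed there. Anything outside the dictionaries is simply missed, which is the trade-off of this approach: it is cheap and predictable for a small domain, but the lists have to be maintained by hand.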