Try this,

http://viewer.opencalais.com/

They have an open API for this kind of extraction. Given your text message:

"John Mayer Mumbai 411004 Juhu, car driver, also capable of body guard"

it gives back:

People: John Mayer Mumbai
Positions: body guard, car driver.

It's not perfect, but it's not bad either.

Regards,
Kallin Nagelberg
-----Original Message-----
From: Michael Griffiths [mailto:mgriffi...@am-ind.com] 
Sent: Thursday, August 12, 2010 3:28 PM
To: solr-user@lucene.apache.org
Subject: RE: Require some advice

Solr is a search engine, not an entity extraction tool. 

While there are some decent open source entity extraction tools, they are 
focused on processing full sentences and paragraphs. The structural differences 
in text messages mean you'd need to do a fair amount of work to get decent entity 
extraction.

That said, you may want to look into simple word/phrase matching if your domain 
is sufficiently small. Use regular expressions to extract the ZIP code, and use 
dictionaries to extract the city/area, skills, and names. It's much simpler and 
cheaper. 
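A minimal sketch of the regex-plus-dictionary approach, assuming six-digit Indian PIN codes and tiny hypothetical word lists (CITIES, AREAS, SKILLS and FIRST_NAMES are made up for illustration; a real system would load much larger lists). The rule that the capitalized word following a known first name is the surname is also an assumption, not a general solution:

```python
import re

# Hypothetical dictionaries for illustration only; replace with real,
# much larger lists of cities, areas, skills and first names.
CITIES = {"mumbai", "pune", "delhi"}
AREAS = {"juhu", "andheri", "bandra"}
SKILLS = ["car driver", "body guard", "cook", "electrician"]
FIRST_NAMES = {"john", "pavan", "michael"}

# Assumption: Indian PIN codes are six digits, so any standalone
# six-digit run is treated as the ZIP.
ZIP_RE = re.compile(r"\b\d{6}\b")


def extract(sms: str) -> dict:
    """Pull name, city, ZIP, area and skills out of a free-text SMS."""
    result = {"first_name": None, "last_name": None, "city": None,
              "zip": None, "area": None, "skills": []}
    lower = sms.lower()

    # ZIP: regular expression.
    m = ZIP_RE.search(sms)
    if m:
        result["zip"] = m.group()

    # Skills: multi-word phrases matched against the whole message.
    for skill in SKILLS:
        if re.search(r"\b" + re.escape(skill) + r"\b", lower):
            result["skills"].append(skill)

    # City, area and name: single-word dictionary lookups.
    tokens = re.findall(r"[A-Za-z]+", sms)
    places = CITIES | AREAS
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if low in CITIES and result["city"] is None:
            result["city"] = tok
        elif low in AREAS and result["area"] is None:
            result["area"] = tok
        elif low in FIRST_NAMES and result["first_name"] is None:
            result["first_name"] = tok
            # Heuristic: a capitalized, non-place word right after a
            # known first name is taken as the last name.
            if (i + 1 < len(tokens) and tokens[i + 1][0].isupper()
                    and tokens[i + 1].lower() not in places):
                result["last_name"] = tokens[i + 1]
    return result


if __name__ == "__main__":
    print(extract("John Mayer Mumbai 411004 Juhu, car driver, "
                  "also capable of body guard"))
```

On the sample message this yields first name John, last name Mayer, city Mumbai, ZIP 411004, area Juhu and skills ["car driver", "body guard"]. The resulting fields could then be stored in the database and indexed by Solr. The trade-off is that anything missing from the dictionaries is simply not found, which is acceptable only when the domain is small.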

-----Original Message-----
From: Pavan Gupta [mailto:pavan....@gmail.com] 
Sent: Thursday, August 12, 2010 2:58 PM
To: solr-user@lucene.apache.org
Subject: Require some advice

Hi,
I am new to text search and mining and have been researching the available 
products. My application needs to read an unstructured SMS message and find 
entities such as person name, area, ZIP code, city and the skills associated 
with the person. The SMS will be free text. The parsed data would be stored in 
a database and used by Solr to display results.
An SMS message could take the following form:
"John Mayer Mumbai 411004 Juhu, car driver, also capable of body guard"
We need to interpret it as follows:
first name -> John
last name -> Mayer
city -> Mumbai
zip -> 411004
area -> Juhu
skills -> car driver, body guard


1. Is Solr capable of handling this application, given that the SMS messages 
are unstructured?
2. How does Solr/Lucene compare with other tools such as UIMA, GATE, CER 
(Stanford University) and LingPipe?
3. Is Solr only for text search, or can it also be used for information 
extraction?
4. Is it recommended to use Solr together with other products such as UIMA and 
GATE?

There are companies that specialize in extracting meaning from unstructured 
SMS messages. Is there something similar in the open source world? Can we 
extend Solr for the same purpose?

Your reply would be appreciated.
Thank you.
Regards,
Pavan
