[GENERAL] word/phrase extraction & ranking

Marius Andreiana Wed, 14 Nov 2012 10:35:16 -0800

Hello,

From selected rows in a table, how can one extract and rank words/phrases based 
on how often they occur?


Here's an example: 
http://developer.yahoo.com/search/content/V1/termExtraction.html


INPUT:
CREATE TABLE phrases (
    id BIGSERIAL,
    phrase VARCHAR(10000)
);

INSERT INTO phrases (phrase) VALUES
    ('Italian sculptors and painters of the renaissance favored the Virgin Mary for inspiration.');
INSERT INTO phrases (phrase) VALUES
    ('Andrea Bolgi was an italian sculptor');

OUTPUT:
phrase           | weight
italian sculptor | 5
virgin mary      | 2
painters         | 1
renaissance      | 1
inspiration      | 1
Andrea Bolgi     | 1

Some notes:
* phrases could contain “stop words”, e.g. “easy to answer”
* ideally, English language variations and synonyms would be automatically 
grouped.

I understand one might use PostgreSQL’s full text search support, and maybe 
pg_trgm, but how exactly?
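
For the single-word part of the question, a minimal sketch of the full text 
search route could look like the query below. It assumes the phrases table 
defined above and the built-in 'english' text search configuration. ts_stat 
counts stemmed lexemes with stop words removed, so it groups simple variations 
such as "sculptor" and "sculptors", but it does not produce multi-word phrases 
or the weights shown in the desired output.

```sql
-- Sketch only: ranks individual lexemes, not multi-word phrases.
-- ndoc   = number of rows the lexeme appears in
-- nentry = total number of occurrences across all rows
SELECT word, ndoc, nentry
FROM ts_stat($$SELECT to_tsvector('english', phrase) FROM phrases$$)
ORDER BY nentry DESC, ndoc DESC, word;
```

Restricting the ranking to selected rows would be a matter of adding a WHERE 
clause to the inner query passed to ts_stat.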


Thanks
