You want to look at how the documents are being analyzed. Look at the
Lucy::Analysis set of Perl modules (for hints as to how this is handled in C).
You can write your own analyzer to take special cases into account.  $0.02

Zachary Zebrowski
Forensic Database Engineer / Division Mentoring Liaison
The MITRE Corporation
(W) 202-406-6346
(C) 571-232-5643
(AR) KM4ZZE

-----Original Message-----
From: [email protected] [mailto:[email protected]] 
Sent: Tuesday, November 28, 2017 12:56 PM
To: [email protected]
Subject: [lucy-user] C library - Phrase Searches

Hi guys again :)

I have a question regarding phrase searches and their scoring. As far as I can
see, when we search for a phrase in quotation marks, e.g. "the united states",
only messages that contain "the united states" are returned (to be more exact,
messages containing "the unite state" are returned as well).

My question is how such queries are handled in the library. Is it by looking
at consecutive term positions in documents? What is the performance impact of
such queries?
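A minimal sketch of position-based phrase matching, the general technique asked about above. This is not Lucy's actual code; `phrase_freq` and `contains_pos` are hypothetical names. It assumes the engine has already narrowed the candidates to documents containing every term and has, for one such document, each term's sorted list of positions. The phrase occurs wherever the first term sits at some position `start` and term `i` sits at `start + i`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: binary search for `target` in a sorted position list. */
static bool contains_pos(const unsigned *positions, size_t n, unsigned target) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (positions[mid] == target) return true;
        if (positions[mid] < target) lo = mid + 1;
        else hi = mid;
    }
    return false;
}

/* Hypothetical: count how many times the phrase occurs in one document.
 * term_positions[i] holds the sorted positions of the i-th phrase term,
 * term_counts[i] is the length of that list. */
size_t phrase_freq(const unsigned *const *term_positions,
                   const size_t *term_counts, size_t num_terms) {
    if (num_terms == 0) return 0;
    size_t freq = 0;
    for (size_t k = 0; k < term_counts[0]; k++) {
        unsigned start = term_positions[0][k];
        bool ok = true;
        /* Term i must appear exactly i positions after the first term. */
        for (size_t i = 1; i < num_terms && ok; i++) {
            ok = contains_pos(term_positions[i], term_counts[i],
                              start + (unsigned)i);
        }
        if (ok) freq++;
    }
    return freq;
}
```

For a document tokenized as "the united states ... the ... the united states", with "the" at {0, 5, 9}, "united" at {1, 10} and "states" at {2, 11}, this counts two phrase occurrences. Under these assumptions the extra cost over a plain term query is that positions must be read and compared for every candidate document, not just document IDs; here that is roughly the first term's position count times a binary search into each other list.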

Second, how are they scored? Is it still tf/idf? If so, what are the
definitions of tf and idf for these queries?
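A hedged sketch of one common answer, assuming Lucy follows Lucene's classic tf/idf similarity (Lucy's design descends from Lucene, but verify this against Lucy's Similarity and PhraseQuery sources before relying on it). In that scheme the phrase is scored like a single pseudo-term: its idf is the sum of the idfs of its terms, and its tf is computed from the phrase frequency, i.e. how many times the exact phrase occurs in the document. Here $N$ is the number of documents in the index, $\mathrm{df}_i$ the document frequency of the $i$-th of the $k$ phrase terms, and $\mathrm{pf}(d)$ the phrase frequency in document $d$:

$$
\mathrm{idf}_{\text{phrase}} = \sum_{i=1}^{k} \left( 1 + \ln \frac{N}{\mathrm{df}_i + 1} \right),
\qquad
\mathrm{tf}_{\text{phrase}}(d) = \sqrt{\mathrm{pf}(d)}
$$

These two values then enter the score the same way a single term's tf and idf would, together with length norms, boosts, and query normalization.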

Thanks as always,
Serkan
