RE: Study Group (WAS Re: Normalized Scoring)

2005-02-07 Thread Joaquin Delgado
A very solid (and free) online course on "intelligent information retrieval", with a focus on practical issues, is available from Prof. Mooney (Univ. of Texas): http://www.cs.utexas.edu/users/mooney/ir-course/ I've also copied two interesting papers (from my own private library -- if you are interested

SPANQUERY for phrase proximity search

2005-01-31 Thread Joaquin Delgado
Is there any proposal to add a proper NEAR (proximity) operator to the default query language that can handle phrase proximity, implemented as SpanNearQuery? With all the conversations about density queries and searching for "concepts" that appear in different fields, it just seems logical to trea

RE: -> Grouping Search Results by Clustering Snippets:

2005-01-28 Thread Joaquin Delgado
-Original Message- From: Joaquin Delgado Sent: Friday, January 28, 2005 4:41 PM To: 'Lucene Developers List'; [EMAIL PROTECTED] Subject: RE: -> Grouping Search Results by Clustering Snippets: This is a very interesting thread. Below is a link to a paper I published many ye

RE: Passage Search

2005-01-27 Thread Joaquin Delgado
What is described here as "Passage Search" is nothing more than a PhraseQuery with a large slop. I think it's a UI problem rather than a ranking-algorithm problem. For example, you may want to translate simple multi-term queries into a PhraseQuery by default (instead of AND or OR). Let's say search you
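
To illustrate the idea of matching terms that lie close together, here is a minimal sketch in plain Python. It is not Lucene code and does not reproduce PhraseQuery's exact slop semantics (which are defined in terms of position moves); it only checks whether all query terms fall inside a window of consecutive token positions. The function name `within_window` and its parameters are hypothetical.

```python
from itertools import product


def within_window(tokens, terms, window):
    """Return True if every term occurs at least once inside some span of
    `window` consecutive token positions. Term order is ignored."""
    positions = []
    for term in terms:
        hits = [i for i, tok in enumerate(tokens) if tok == term]
        if not hits:
            return False  # a missing term can never satisfy the query
        positions.append(hits)
    # Try every combination of one position per term (fine for short texts).
    return any(max(combo) - min(combo) < window for combo in product(*positions))


tokens = "the quick brown fox jumps over the lazy dog".split()
print(within_window(tokens, ["quick", "fox"], 3))
print(within_window(tokens, ["quick", "dog"], 3))
```

A larger window behaves like a looser proximity match, which is the sense in which a multi-term query could be treated as a passage search by default.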

Sub-Scoring of BooleanQuery

2005-01-17 Thread Joaquin Delgado
I'm interested in obtaining the individual scores of the top-level sub-queries (query or Boolean clauses) when searching using a BooleanQuery. The main purpose is to visualize scores of individual queries representing concepts connected through OR/AND operators representing the union or intersect

RE: [PROPOSAL] Lucene to search.apache.org

2005-01-17 Thread Joaquin Delgado
I think Doug is right. Generic names such as OS (vs. Linux), Web Server (vs. Apache WS) or Servlet Engine (vs. Tomcat), although technically correct and broad enough to host related applications, have no branding power. -- J.D. -Original Message- From: Doug Cutting [mailto:[EMAIL PROTECTED] Sent

RE: DefaultSimilarity 2.0?

2004-12-20 Thread Joaquin Delgado
collection of TREC or other benchmark text corpus induces tailoring the algorithms to the corpus. To be fair we should run the benchmarks against multiple collections and average recall/precision. -- Joaquin Delgado -Original Message- From: Chuck Williams [mailto:[EMAIL PROTECTED] Sent
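
As a rough sketch of what averaging recall/precision over multiple collections could look like, the Python below computes set-based precision and recall per collection and then takes an unweighted (macro) average. The function names and the toy data are assumptions for illustration only; real benchmark runs would use ranked results and per-query relevance judgments.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one collection."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def macro_average(runs):
    """Average precision and recall over collections, each weighted equally.

    `runs` is a list of (retrieved, relevant) pairs, one per collection.
    """
    if not runs:
        raise ValueError("need at least one collection")
    scores = [precision_recall(retrieved, relevant) for retrieved, relevant in runs]
    n = len(scores)
    return (sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n)


# Two toy collections (made-up document ids).
runs = [(["a", "b"], ["a"]), (["x"], ["x", "y"])]
print(macro_average(runs))
```

Weighting every collection equally keeps one large corpus from dominating the result, which is the point of not tuning against a single benchmark.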

RE: About Hit Scoring

2004-10-31 Thread Joaquin Delgado
s also seems like a good start for calculations of merged relevance ranking when exact values of multiple ranking systems/algorithms cannot be obtained or are not compatible but ranking order is available. Sorry for this long email. Just my 2 cents. Joaquin Delgado, PhD CTO, TripleHop Technolog
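
One common way to merge rankings when only the order is available is a Borda count. The sketch below is offered as an illustration of that general idea, not as the method proposed in the message above (which is cut off here); the function name `borda_merge` and the sample lists are hypothetical.

```python
from collections import defaultdict


def borda_merge(rankings):
    """Merge ranked lists (best first) using rank positions only.

    Each list awards n points to its top item, n - 1 to the next, and so on,
    where n is the number of distinct items across all lists. Items a list
    does not mention get no points from that list.
    """
    universe = set().union(*rankings)
    n = len(universe)
    points = defaultdict(int)
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            points[doc] += n - rank
    # Highest total first; ties are broken by document id for determinism.
    return sorted(universe, key=lambda doc: (-points[doc], doc))


print(borda_merge([["x", "y"], ["y", "x"], ["y", "z"]]))
```

Because only positions are used, the raw scores of the underlying systems never need to be comparable.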