Re: Partial token matches

2006-04-27 Thread Paul . Illingworth
Another approach may be to use n-grams. Index each word as follows: 2-gram field: in nf fo or rm ma at; 3-gram field: inf nfo for orm rma mat; 4-gram field: info nfor form orma rmat; and so on. To search for the term "form", simply search the 4-gram field. The prefix query approach may suffer
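The n-gram fields listed above can be generated with a simple sliding window. The following is only a minimal sketch in plain Java, not code from the thread or from Lucene: the class name NGramSketch and the method ngrams are hypothetical, and "informat" is the string implied by the grams listed in the message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: splits a word into character n-grams.
final class NGramSketch {

    /** Returns every contiguous substring of length n, in left-to-right order. */
    static List<String> ngrams(String word, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        // Slide a window of width n across the word; words shorter than n yield no grams.
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Print the 2-, 3- and 4-gram "fields" for the example string.
        for (int n = 2; n <= 4; n++) {
            System.out.println(n + "-gram field: " + String.join(" ", ngrams("informat", n)));
        }
    }
}
```

Each list would then be indexed in its own field, so that a four-character query such as "form" is matched against the 4-gram field only.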

Re: Partial token matches

2006-04-27 Thread karl wettin
On 27 Apr 2006, at 10:05, [EMAIL PROTECTED] wrote: Another approach may be to use n-grams. The spell checker in contrib could probably be used as a code base for that.

fuzzy sentence search

2006-04-27 Thread Fisheye
Is it possible to search sentences, more than one word at a time, or phrases with fuzzy search? I have implemented fuzzy search, if I only search one single word it works fine, but if I start searching more than one word or a sentence it does not find anything...strange, when I set the relevance

Re: fuzzy sentence search

2006-04-27 Thread karl wettin
On 27 Apr 2006, at 10:16, Fisheye wrote: Is it possible to search sentences, more than one word at a time, or phrases with fuzzy search? I have implemented fuzzy search, if I only search one single word it works fine, but if I start searching more than one word or a sentence it does not fi

Re: fuzzy sentence search

2006-04-27 Thread Fisheye
ok, thanks for the link. I will have a look and see...but if this is really as slow as you describe it, I probably have to accept it as it is and leave it.

Lucene DB indexing and searching Question (1.9.1)

2006-04-27 Thread Audrius Peseckis
Hello, What I'm trying to do is to index a database with Lucene. Each row returned by an SQL query is represented as a document, and the document contains fields (values of columns). I'm adding those fields to the document by doing the following: Field fld = new Field("COLUMN_NAME", column.value()); Now when

TermFreqVector and performance, index size

2006-04-27 Thread Philippe Deslauriers (Beetext)
Hello, We are upgrading from 1.3 to 1.9. We planned to use the Highlight package for highlighting, replacing our in-house highlight classes. From what I can read, the Highlight package requires the use of the TermFreqVector to be added to the index. I will get into the Highlight package later, but

Re: How to search in sentence and display the whole sentence

2006-04-27 Thread Erik Hatcher
On Apr 26, 2006, at 6:20 PM, anton feldmann wrote: Are the names of a field in a document unique or can I make a field with the name "sentence" for each sentence in a text document? The names of a field in a document are unique, but you can add multiple instances of the same field name. Y

Occurrence (freq) and ordering

2006-04-27 Thread Philippe Deslauriers (Beetext)
Hi again, Upgrading from Lucene 1.3 to 1.9. We need to order the results by number of occurrences (score of a doc = sum of occurrences of all Query). In Lucene 1.3 we rewrote all the Query classes (BooleanQuery, PhraseQuery, etc.) to reach our goals, but is there an easier way to do it

re: multiple indexes

2006-04-27 Thread Michael Dodson
I know it is possible to query against multiple indexes, but is it possible to create a composite query in which part of the query is against one index and part is against another (similar to querying against a default and a second field)? for example index1:query1 AND index2:query2 I tho

Re: lucene search sentence

2006-04-27 Thread Grant Ingersoll
Anton, Please don't cross-post "How do I..." questions to the dev list; it doesn't get you anywhere and just annoys those most likely to help you. See below. -Grant Anton Feldmann wrote: Hi I wrote an Indexer which is indexing all the contents of a text and the sentences are separated in an o

Re: lucene search sentence

2006-04-27 Thread Steven Rowe
Anton Feldmann wrote: 3) How do I display the sentence before and after the sentence the hit is in? You could: 1. Make your Lucene Document be a set of three sentences (before, searchable, after), which you store, but write a custom Analyzer which only returns tokens for the "searchable" cen
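The first option above (one unit per sentence, carrying its neighbours for display) can be sketched without any Lucene classes. This is only an illustration under assumptions: the class SentenceWindowSketch and the method windows are hypothetical names, the sentences are assumed to be split already, and the custom Analyzer mentioned in the reply is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups each sentence with its neighbours for display.
final class SentenceWindowSketch {

    /**
     * For each sentence, returns a triple {before, searchable, after}.
     * The first sentence has an empty "before", the last an empty "after".
     */
    static List<String[]> windows(List<String> sentences) {
        List<String[]> result = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) {
            String before = i > 0 ? sentences.get(i - 1) : "";
            String after = i + 1 < sentences.size() ? sentences.get(i + 1) : "";
            result.add(new String[] { before, sentences.get(i), after });
        }
        return result;
    }
}
```

In an index built this way, only the middle element of each triple would be made searchable, while all three would be stored so that a hit can be shown with its context.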

lucene search sentence

2006-04-27 Thread Anton Feldmann
Hi, I wrote an Indexer which is indexing all the contents of a text, and the sentences are separated into another Document. "Document document = new Document(new Field ("contents", reader )); StringTokenizer token = new StringTokenizer(contents.replaceAll(". ", "\\.x\\") , "\\.x\

Re: Lucene DB indexing and searching Question (1.9.1)

2006-04-27 Thread Chris Lu
You need to use MultiFieldQueryParser http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/MultiFieldQueryParser.html Sincerely, Chris Lu - Full-text search on Any Databases/Applications http://www.dbsight.net On 4/27/06, Audrius Peseckis <[EMAIL PROTECTED]> wrote: > Hel

Re: fuzzy sentence search

2006-04-27 Thread karl wettin
On 27 Apr 2006, at 12:45, Fisheye wrote: ok, thanks for the link. I will have a look and see...but if this is really as slow as you describe it, I probably have to accept it like it is and let it. You might find this thread interesting: http://www.nabble.com/Contextual-suggestions-t1372611

Re: Lucene DB indexing and searching Question (1.9.1)

2006-04-27 Thread Chris Hostetter
: You need to use MultiFieldQueryParser : : http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/MultiFieldQueryParser.html or put the text from all of your fields into one uber catchall field and make that the default... foreach (column) { Field f = new Field(column.name

re: multiple indexes

2006-04-27 Thread Chris Hostetter
Can you provide a little more clarification as to what it is you are trying to achieve (not just the way you hope to achieve it)? Specifically: I can't make sense of what you would expect to get back with a query like this... index1:query1 AND index2:query2 ...traditionally, a query for "A and

Re: Occurrence (freq) and ordering

2006-04-27 Thread Chris Hostetter
: Upgrading from lucene 1.3 to 1.9. : We need to order the result in order of occurrences (score of a doc = sum of : occurrences of all Query). : I am just starting to read on Similarity, weights etc. You are definitely on the right track with Similarity. What you want is a Similarity implimen

Efficiently paginating results.

2006-04-27 Thread Jean Sini
Hi, Our application presents search results in a paginated form. We were unable to find Searcher methods that would return, say, 'n' (typically, 10) hits after a start offset 'k'. So we're currently using the Hits collection returned by Searcher.search, and using its Hits.doc(i) method to get th

Re: Efficiently paginating results.

2006-04-27 Thread karl wettin
On 27 Apr 2006, at 20:44, Jean Sini wrote: Our application presents search results in a paginated form. We were unable to find Searcher methods that would return, say, 'n' (typically, 10) hits after a start offset 'k'. So we're currently using the Hits collection returned by Searcher.search, and

Re: Efficiently paginating results.

2006-04-27 Thread Yonik Seeley
On 4/27/06, Jean Sini <[EMAIL PROTECTED]> wrote: > We were unable to find Searcher methods that would return, say, 'n' > (typically, 10) hits after a start offset 'k'. Yes, that's because to find results k through k+n, Lucene must first find results 0 through k+n. > So we're currently using the H
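The point that results k through k+n require first finding results 0 through k+n can be illustrated outside Lucene. The sketch below is not Lucene's implementation; PageSketch and page are hypothetical names, and plain score values stand in for scored documents. It keeps the best k+n scores in a bounded min-heap and only then discards the first k.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of offset-based paging over scored hits.
final class PageSketch {

    /** Returns the scores ranked k .. k+n-1 (0-based), best first. */
    static List<Double> page(double[] scores, int k, int n) {
        if (k < 0 || n < 0) {
            throw new IllegalArgumentException("k and n must be non-negative");
        }
        int limit = k + n;
        // Min-heap: the weakest hit currently kept is always on top.
        PriorityQueue<Double> heap = new PriorityQueue<>();
        for (double s : scores) {
            heap.add(s);
            if (heap.size() > limit) {
                heap.poll(); // drop the weakest so at most k+n hits are retained
            }
        }
        List<Double> top = new ArrayList<>(heap);
        top.sort(Collections.reverseOrder());
        // Skip the first k hits; what remains (at most n) is the requested page.
        int from = Math.min(k, top.size());
        return new ArrayList<>(top.subList(from, top.size()));
    }
}
```

The work grows with k+n rather than with n alone, which is why deep pages cost more than the first page even though each page shows the same number of hits.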

Re: TermFreqVector and performance, index size

2006-04-27 Thread Daniel Naber
On Thursday 27 April 2006 14:32, Philippe Deslauriers (Beetext) wrote: > What are the OFFSETS and POSITIONS used for? Do I need it for > Highlighting? No, you can provide an analyzer to Highlighter.getBestFragment() and it will re-analyze your text without the need for term vectors. Regards Da

RE: Efficiently paginating results.

2006-04-27 Thread Jean Sini
Thanks. One of the trade-offs we are considering, along the lines of what you mentioned, has to do with whether or not to cache the Hits. The benefit being that we'd avoid re-running the search if requests for hits past the first page do come in, the cost being that we'd have to keep around all the

Re: Lucene search benchmark/stress test tool

2006-04-27 Thread Doug Cutting
Sunil Kumar PK wrote: I want to know whether there is any possibility or method to merge the weight calculation of index 1 and its search into a single RPC instead of doing both functions in separate steps. To score correctly, weights from all indexes must be created before any can be searched. This

Re: Efficiently paginating results.

2006-04-27 Thread karl wettin
On 27 Apr 2006, at 23:39, Jean Sini wrote: On 27 Apr 2006, at 20:44, Jean Sini wrote: Our application presents search results in a paginated form. We were unable to find Searcher methods that would return, say, 'n' (typically, 10) hits after a start offset 'k'. So we're currently using the Hits colle

for the similarity measure

2006-04-27 Thread jason
Hi, After reading the code, I found the similarity measure in Lucene is not the same as the cosine coefficient measure commonly used. I don't know whether that is correct. And I wonder whether I can use the cosine coefficient measure in Lucene or maybe the Dice's coefficient, Jaccard's coefficient and overla
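For reference, the coefficients named in the question have simple textbook forms when documents are treated as sets of terms. The sketch below shows those set-based definitions only; it is not Lucene's scoring formula. The class CoefficientSketch is a hypothetical name, and the overlap coefficient is included on the assumption that it is the measure the truncated message was naming.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: set-based similarity coefficients over term sets.
final class CoefficientSketch {

    /** Number of terms shared by both sets. */
    static int shared(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    /** Cosine for binary term vectors: |A n B| / sqrt(|A| * |B|). */
    static double cosine(Set<String> a, Set<String> b) {
        if (a.isEmpty() || b.isEmpty()) return 0.0;
        return shared(a, b) / Math.sqrt((double) a.size() * b.size());
    }

    /** Dice: 2 * |A n B| / (|A| + |B|). */
    static double dice(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0;
        return 2.0 * shared(a, b) / (a.size() + b.size());
    }

    /** Jaccard: |A n B| / |A u B|. */
    static double jaccard(Set<String> a, Set<String> b) {
        int common = shared(a, b);
        int union = a.size() + b.size() - common;
        return union == 0 ? 0.0 : (double) common / union;
    }

    /** Overlap: |A n B| / min(|A|, |B|). */
    static double overlap(Set<String> a, Set<String> b) {
        if (a.isEmpty() || b.isEmpty()) return 0.0;
        return (double) shared(a, b) / Math.min(a.size(), b.size());
    }
}
```

All four ignore term frequency and document frequency, so they would rank documents differently from a weighted vector-space score.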