Re: Any way to ignore repeated terms in TF calculation?

2009-01-12 Thread Umesh Prasad
Hi Israel, I am trying to put the problem more concisely. 1. Fields where term frequency is very relevant, e.g. Body. Example: if TF of badger in Body of doc 1 > TF of badger in Body of doc 2, doc 1 scores higher. 2. Fields where term frequency is irrelevant Page_Titl
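
The two kinds of field described in this message can be sketched outside Lucene as a per-field choice of term-frequency function. This is a toy illustration under stated assumptions, not Lucene API: the function names and the field names `body` and `title` are hypothetical, and the square-root damping is chosen to resemble what Lucene's DefaultSimilarity does for tf.

```python
import math


def tf_weight(field, freq):
    """Return the term-frequency contribution of a term in one field."""
    if freq <= 0:
        return 0.0
    if field == "title":
        # Hypothetical TF-insensitive field: repeated terms count only once.
        return 1.0
    # TF-sensitive field: more occurrences score higher, with damping.
    return math.sqrt(freq)


def score(doc, term):
    """Sum the per-field contributions for a single query term."""
    return sum(tf_weight(field, tokens.count(term)) for field, tokens in doc.items())


doc1 = {"body": ["badger"] * 4, "title": ["badger", "badger"]}
doc2 = {"body": ["badger"], "title": ["badger"]}
```

Here `doc1` outscores `doc2` only because of the extra occurrences in `body`; repeating the term in `title` adds nothing.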

Using analyzer while constructing Lucene queries

2009-01-12 Thread Rajesh parab
Hi, For proper results during searches, the recommendation is to use the same analyzer for indexing and querying. We can achieve this by passing the same analyzer, which was used for indexing, to QueryParser to construct the Lucene query and use this query while searching the index. The question is -
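
Why the same analyzer matters on both sides can be shown with a toy inverted index. This is a minimal sketch, not Lucene code: the two analyzer functions, `build_index`, and `search` are all hypothetical stand-ins for an Analyzer, IndexWriter, and QueryParser.

```python
def lowercase_analyzer(text):
    """Toy analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def whitespace_analyzer(text):
    """Toy analyzer: split on whitespace, keep case as-is."""
    return text.split()


def build_index(docs, analyzer):
    """Map each analyzed token to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in analyzer(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index, query, analyzer):
    """Return ids of documents containing every analyzed query token."""
    postings = [index.get(token, set()) for token in analyzer(query)]
    return set.intersection(*postings) if postings else set()


docs = {1: "Hello World", 2: "hello lucene"}
index = build_index(docs, lowercase_analyzer)
```

Searching for `Hello` with `lowercase_analyzer` finds both documents, while the same query through `whitespace_analyzer` finds nothing, because the index only contains lowercased tokens.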

Using PerFieldAnalyzerWrapper with KeywordAnalyzer - MultiFieldQueryParser

2009-01-12 Thread Michael Nguyen
Hi all, I encountered the following problem with searching for the exact text. This is how I index: ... document.Add(new Field("keyword", "hello world", Field.Store.YES, Field.Index.UN_TOKENIZED)); This is how I try to search for "hello world" string[] fields = new string[] { "name", "keywo
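
The idea named in the subject line, a per-field wrapper that sends one field through a keyword-style analyzer, can be sketched as follows. This is a toy model and not the Lucene.NET API: the class `PerFieldAnalyzer` and both analyzer functions are hypothetical, and the field names are taken from the visible part of the snippet.

```python
def tokenizing_analyzer(text):
    """Toy default analyzer: lowercase and split into separate tokens."""
    return text.lower().split()


def keyword_analyzer(text):
    """Toy keyword analyzer: emit the whole input as a single token."""
    return [text]


class PerFieldAnalyzer:
    """Pick an analyzer by field name, falling back to a default."""

    def __init__(self, default, overrides):
        self.default = default
        self.overrides = dict(overrides)

    def analyze(self, field, text):
        analyzer = self.overrides.get(field, self.default)
        return analyzer(text)


wrapper = PerFieldAnalyzer(tokenizing_analyzer, {"keyword": keyword_analyzer})
```

With this wrapper, a query for `hello world` on the `keyword` field stays a single token, matching a value that was indexed untokenized, while the same text on `name` is split into two terms.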

RE: stuck with Encoded (possibly?) Database entries

2009-01-12 Thread Steven A Rowe
My guess is that '*' is a fixed prefix denoting this encoding method, and that the next two characters are likely an encoded representation of the number of (valid) characters on the line - from the examples you've given: - '8G' means 76 characters - '0m' means 28 characters - '0d' means 1

RE: stuck with Encoded (possibly?) Database entries

2009-01-12 Thread peter.aisher
Hi Steve That sounds possible: the problem is that I'm not sure what the plaintext version is - there is an online version of the dictionary so for example the entry for 'a' in garbled-text is this: *8G04)B0e00gTMqjEw2c3mU6rhoI(Ci4xSF4pG8bFPY2B26cuCtk4cgwPsJqRnPHxQjZBBY *8GXG4UA1QjjKK

RE: stuck with Encoded (possibly?) Database entries

2009-01-12 Thread Steven A Rowe
Hi Peter, On 01/12/2009 at 1:43 PM, peter.aisher wrote: > ... the contents of the FILE field is the definition. the problem > is that the contents of this field is just garbled text. is there > any obvious compression technique which might have been used to > store this? The text in the files

stuck with Encoded (possibly?) Database entries

2009-01-12 Thread peter.aisher
I am quite new to lucene, but am trying to learn quite quickly because: I am trying to convert a dictionary which has been stored in a lucene database (several in fact) into Stardict format so that I can ultimately import it into Dictionary.app in OS X The dictionary in question has a java front-e

Re: how to perfetch some fields

2009-01-12 Thread Koji Sekiguchi
There is an API for it: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/index/IndexReader.html#document(int,%20org.apache.lucene.document.FieldSelector) "Get the Document at the nth position. The FieldSelector may be used to determine what Fields to load and how t