date:20090112

Re: Any way to ignore repeated terms in TF calculation?

2009-01-12 Thread Umesh Prasad

Hi Israel, I am trying to put the problem more concisely. 1. Fields where term frequency is very relevant, e.g. Body. Example: if the TF of badger in Body of doc 1 > TF of badger in Body of doc 2, doc 1 scores higher. 2. Fields where term frequency is irrelevant Page_Titl
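
The distinction drawn in this post can be sketched in plain Java. This is only an illustration, not a Lucene Similarity implementation: the class name `PerFieldTf`, the field names "body" and "title", and the set of binary-TF fields are all hypothetical. The square-root damping in the default branch is the shape Lucene's DefaultSimilarity uses for its tf factor.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class PerFieldTf {
    // Hypothetical: fields where repeated terms should not raise the score.
    private static final Set<String> BINARY_TF_FIELDS =
            new HashSet<>(Arrays.asList("title"));

    // Returns the term-frequency factor for a term seen `freq` times in `field`.
    static float tf(String field, int freq) {
        if (freq <= 0) {
            return 0f;                      // term absent: no contribution
        }
        if (BINARY_TF_FIELDS.contains(field)) {
            return 1f;                      // repeats ignored: present or not
        }
        return (float) Math.sqrt(freq);     // repeats count, with damping
    }
}
```

With these assumptions, four occurrences in "body" give a factor of 2.0, while four occurrences in "title" give 1.0, the same as a single occurrence.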

Using analyzer while constructing Lucene queries

2009-01-12 Thread Rajesh parab

Hi, For proper results during searches, the recommendation is to use the same analyzer for indexing and querying. We can achieve this by passing the same analyzer that was used for indexing to QueryParser to construct the Lucene query, and then using this query to search the index. The question is -
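
Why the same analyzer is recommended on both sides can be shown with a toy model in plain Java. Nothing here is Lucene API: `AnalyzerMismatch`, `analyze`, `keyword`, `index` and `matchesAll` are hypothetical names, and the "index" is just a set of terms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class AnalyzerMismatch {
    // Toy analyzer: lower-cases, then splits on anything that is not a letter or digit.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Toy "keyword" analyzer: the whole input becomes one unmodified term.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    // Toy index: the set of terms produced at index time.
    static Set<String> index(String text) {
        return new HashSet<>(analyze(text));
    }

    // A query matches when every query term exists in the index.
    static boolean matchesAll(Set<String> indexedTerms, List<String> queryTerms) {
        return indexedTerms.containsAll(queryTerms);
    }
}
```

If "Lucene in Action" is indexed with `analyze`, the query text "LUCENE" matches when it is also run through `analyze`, but not when it is run through `keyword`, because the term "LUCENE" was never written to the index.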

Using PerFieldAnalyzerWrapper with KeywordAnalyzer - MultiFieldQueryParser

2009-01-12 Thread Michael Nguyen

Hi all, I encountered the following problem when searching for exact text. This is how I index: ... document.Add(new Field("keyword", "hello world", Field.Store.YES, Field.Index.UN_TOKENIZED)); This is how I try to search for "hello world": string[] fields = new string[] { "name", "keywo
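
The idea behind the thread title, a per-field wrapper that sends one field to a keyword-style analyzer, can be modelled in plain Java. This is a toy sketch and not the Lucene or Lucene.Net API: `PerFieldAnalysis`, `addAnalyzer`, `whitespaceLower` and `keyword` are hypothetical names. The field names "name" and "keyword" come from the post.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

final class PerFieldAnalysis {
    // Toy tokenizing analyzer: lower-cases and splits on whitespace.
    static List<String> whitespaceLower(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Toy keyword analyzer: the whole value is a single term.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();
    private final Function<String, List<String>> fallback;

    PerFieldAnalysis(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    // Registers a field-specific analyzer that overrides the fallback.
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Analyzes text with the analyzer registered for the field, else the fallback.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, fallback).apply(text);
    }
}
```

An untokenized field holds "hello world" as one term. In this model, a query on the "keyword" field only produces that same single term if the field is routed to `keyword`; the fallback would split it into "hello" and "world", neither of which equals the stored term.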

RE: stuck with Encoded (possibly?) Database entries

2009-01-12 Thread Steven A Rowe

My guess is that '*' is a fixed prefix denoting this encoding method, and that the next two characters are likely an encoded representation of the number of (valid) characters on the line - from the examples you've given: - '8G' means 76 characters - '0m' means 28 characters - '0d' means 1

RE: stuck with Encoded (possibly?) Database entries

2009-01-12 Thread peter.aisher

Hi Steve That sounds possible: the problem is that I'm not sure what the plaintext version is - there is an online version of the dictionary so for example the entry for 'a' in garbled-text is this: *8G04)B0e00gTMqjEw2c3mU6rhoI(Ci4xSF4pG8bFPY2B26cuCtk4cgwPsJqRnPHxQjZBBY *8GXG4UA1QjjKK

RE: stuck with Encoded (possibly?) Database entries

2009-01-12 Thread Steven A Rowe

Hi Peter, On 01/12/2009 at 1:43 PM, peter.aisher wrote: > ... the contents of the FILE field is the definition. the problem > is that the contents of this field is just garbled text. is there > any obvious compression technique which might have been used to > store this? The text in the files

stuck with Encoded (possibly?) Database entries

2009-01-12 Thread peter.aisher

I am quite new to lucene, but am trying to learn quite quickly because: I am trying to convert a dictionary which has been stored in a lucene database (several in fact) into Stardict format so that I can ultimately import it into Dictionary.app in OS X The dictionary in question has a java front-e

Re: how to perfetch some fields

2009-01-12 Thread Koji Sekiguchi

There is an API for it: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/index/IndexReader.html#document(int,%20org.apache.lucene.document.FieldSelector) "Get the Document at the nth position. The FieldSelector may be used to determine what Fields to load and how t

8 matches
