RE: Enumerating NumericField using TermEnum?

2009-09-12 Thread Uwe Schindler
Hi Phil, thanks for checking out NumericField. I have two comments about your problem: > I've used NumericField to store my "hour" field. > > Example... > > doc.add(new > NumericField("hour").setIntValue(Integer.parseInt("12"))); NumericField uses a special encoding of terms for fast Nume

Filter before tokenize ?

2009-09-12 Thread Paul Taylor
Is it possible to filter before tokenizing, or is that not a good idea? I want to convert '&' to 'and', so they are dealt with the same way, but the StandardTokenizer I am using removes the &. I could change the tokenizer, but because I'm not too clear on jflex syntax it would seem easier to jus
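
A minimal sketch of the "filter before tokenize" idea in plain Java, assuming the text can be rewritten as a String before it is handed to the analyzer. The class name AmpersandPreFilter and the method normalize are hypothetical and not part of Lucene; the sketch only shows the character-level substitution, not how to wire it into an analysis chain.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not a Lucene class): rewrites '&' to "and"
// before the text reaches the tokenizer.
public class AmpersandPreFilter {

    // An ampersand together with any whitespace around it.
    private static final Pattern AMPERSAND = Pattern.compile("\\s*&\\s*");

    // Replaces every ampersand with the word "and", padded by single spaces,
    // so "R&B" and "R & B" end up with the same form.
    static String normalize(String text) {
        return AMPERSAND.matcher(text).replaceAll(" and ");
    }

    public static void main(String[] args) {
        System.out.println(normalize("R&B"));
        System.out.println(normalize("Hennes & Mauritz"));
    }
}
```

The same rewrite has to be applied to query text as well as indexed text, otherwise the two sides will not produce matching terms.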

field with single quote being split

2009-09-12 Thread Ian Vink
My index has a field with the source of the document. In Luke I can see that religion has baha'i or islam or Tao etc. The problem is that when I construct a query in Luke with "religion:baha'i", Luke thinks it's 2 terms, "baha" and "i". Is there a way to construct a query to make it search with

Re: Filter before tokenize ?

2009-09-12 Thread AHMET ARSLAN
--- On Sat, 9/12/09, Paul Taylor wrote: > From: Paul Taylor > Subject: Filter before tokenize ? > To: java-user@lucene.apache.org > Date: Saturday, September 12, 2009, 9:39 PM > Is it possible to filter before > tokenize, or is that not a good idea. > I want to convert '&' to 'and' , so they are

Re: field with single quote being split

2009-09-12 Thread AHMET ARSLAN
> The problem is that when I construct a query in luke with > "religion:baha'i" > luke thinks it's 2 terms "baha" and "i" Which analyzer is used in query parsing? LetterTokenizer? > Is there a way to construct a query to make it search > with the > single term "baha'i" ? Using different analyze

Re: field with single quote being split

2009-09-12 Thread Ian Vink
I'm using Snowball as I have a dozen languages. ian On Sat, Sep 12, 2009 at 4:56 PM, AHMET ARSLAN wrote: > > The problem is that when I construct a query in luke with > > "religion:baha'i" > > luke thinks it's 2 terms "baha" and "i" > > Which analyzer is used in query parsing? LetterTokenizer

Re: Filter before tokenize ?

2009-09-12 Thread Paul Taylor
AHMET ARSLAN wrote: --- On Sat, 9/12/09, Paul Taylor wrote: From: Paul Taylor Subject: Filter before tokenize ? To: java-user@lucene.apache.org Date: Saturday, September 12, 2009, 9:39 PM Is it possible to filter before tokenize, or is that not a good idea. I want to convert '&' to 'and' ,

Re: field with single quote being split

2009-09-12 Thread AHMET ARSLAN
> I'm using Snowball as I have a dozen languages. You are using SnowballAnalyzer at both index and query time, right? SnowballAnalyzer uses StandardTokenizer, which keeps baha'i as one token. Could the apostrophe in your query be \u2019? It looks similar to ' but is a different character.
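
One way to check for a look-alike apostrophe, sketched in plain Java. The class ApostropheCheck and its methods are hypothetical helpers, not Lucene API; the sketch assumes the query string is available as a Java String and only covers the two curly single-quote characters, U+2018 and U+2019.

```java
// Hypothetical helper (not a Lucene class) for spotting and normalizing
// curly single quotes that look like the ASCII apostrophe.
public class ApostropheCheck {

    // True if the text contains U+2018 or U+2019.
    static boolean hasCurlyApostrophe(String text) {
        return text.indexOf('\u2018') >= 0 || text.indexOf('\u2019') >= 0;
    }

    // Replaces both curly single quotes with the ASCII apostrophe.
    static String normalize(String text) {
        return text.replace('\u2018', '\'').replace('\u2019', '\'');
    }

    // Lists every character with its code point, for eyeballing a query.
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            sb.append(String.format("%c=U+%04X ", c, (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String query = "baha\u2019i";
        System.out.println(describe(query));
        System.out.println(hasCurlyApostrophe(query));
        System.out.println(normalize(query));
    }
}
```

If the check reports a curly quote, normalizing the query text (and, if needed, the indexed text) to the ASCII apostrophe lets both sides go through the tokenizer in the same form.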

Re: applying cosine similarity directly

2009-09-12 Thread Anthony Urso
There is a MoreLikeThis similarity search class in Lucene; it should do what you're looking for. http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/similar/MoreLikeThis.html Cheers, Anthony On Fri, Sep 11, 2009 at 11:25 PM, Alexy Khrabrov wrote: > Given that I have a field for whi

Re: Filter before tokenize ?

2009-09-12 Thread Koji Sekiguchi
Hi Paul, CharFilter should work for this case. How about this? public class MappingAnd { static final String[] DOCS = { "R&B", "H&M", "Hennes & Mauritz", "cheeseburger and french fries" }; static final String F = "f"; static Directory dir = new RAMDirectory(); static Analyzer analyzer =