2.3.2 -> 2.4.0 StandardTokenizer issue

2009-02-16 Thread Philip Puffinburger
We have our own Analyzer which has the following: public final TokenStream tokenStream(String fieldname, Reader reader) { TokenStream result = new StandardTokenizer(reader); result = new StandardFilter(result); result = new MyAccentFilter(result); result = new LowerCaseFilter(result)
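The code in the snippet is cut off; below is a minimal sketch of how that analyzer chain reads when completed, against the Lucene 2.x-era API. The class name MyAnalyzer and the return statement are assumptions, and MyAccentFilter is the poster's own filter (not shown in the thread), so this will not compile without it.

    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardFilter;
    import org.apache.lucene.analysis.standard.StandardTokenizer;

    // Reconstruction of the analyzer described above (Lucene 2.x-era API).
    // MyAccentFilter is the poster's custom filter; its code is not in the thread.
    public final class MyAnalyzer extends Analyzer {
        public final TokenStream tokenStream(String fieldname, Reader reader) {
            TokenStream result = new StandardTokenizer(reader);
            result = new StandardFilter(result);
            result = new MyAccentFilter(result);  // custom accent-folding filter
            result = new LowerCaseFilter(result);
            return result;                        // assumed: the fragment ends before the return
        }
    }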

RE: 2.3.2 -> 2.4.0 StandardTokenizer issue

2009-02-19 Thread Philip Puffinburger
Actually, WhitespaceTokenizer won't work. We have too many person names, and it won't do anything with punctuation. Something had to have changed in StandardTokenizer, and we need some of the 2.4 fixes/features, so we are kind of stuck.
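For anyone hitting the same question, a rough way to see exactly where the two tokenizers split differently is to dump their tokens side by side. A sketch using the pre-2.9 Token API (the sample input string and class name are made up; check the 2.4 javadocs before relying on the deprecated accessors):

    import java.io.StringReader;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.WhitespaceTokenizer;
    import org.apache.lucene.analysis.standard.StandardTokenizer;

    // Diagnostic: print the tokens each tokenizer produces for a sample string,
    // to see where 2.4's StandardTokenizer splits differently on names/punctuation.
    public class TokenizerDiff {
        public static void main(String[] args) throws Exception {
            String text = "O'Brien-Smith, J. R. R. Tolkien";  // hypothetical sample input
            dump("standard  ", new StandardTokenizer(new StringReader(text)));
            dump("whitespace", new WhitespaceTokenizer(new StringReader(text)));
        }

        private static void dump(String label, TokenStream ts) throws Exception {
            for (Token t = ts.next(); t != null; t = ts.next()) {
                System.out.println(label + ": " + t.termText());
            }
            ts.close();
        }
    }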

RE: 2.3.2 -> 2.4.0 StandardTokenizer issue

2009-02-20 Thread Philip Puffinburger
> Some changes were made to the StandardTokenizer.jflex grammar (you can svn diff the two URLs fairly trivially) to better deal with correctly identifying word characters, but from what I can tell that should have reduced the number of splits, not increased them. It's hard to tell from you

RE: 2.3.2 -> 2.4.0 StandardTokenizer issue

2009-02-21 Thread Philip Puffinburger
That's something we can try. I don't know how much performance we'd lose doing that, as our custom filter has to decompose the tokens to do its operations. So instead of 0..1 conversions we'd be doing 1..2 conversions during indexing and searching.
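The "decompose the tokens" step presumably refers to Unicode NFD decomposition followed by stripping the combining marks. A standalone sketch of that operation in plain Java (this is an assumption about what MyAccentFilter does, not the poster's code; java.text.Normalizer needs Java 6):

    import java.text.Normalizer;

    // Accent stripping via decomposition: break each character into its base
    // letter plus combining marks (NFD), then drop the combining marks.
    public class AccentFolding {
        public static String fold(String term) {
            String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
            return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        }

        public static void main(String[] args) {
            System.out.println(fold("résumé"));  // prints "resume"
        }
    }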

RE: 2.3.2 -> 2.4.0 StandardTokenizer issue

2009-02-21 Thread Philip Puffinburger
Thanks for the suggestion. We're going to go over all of this information and these suggestions next week to see what we want to do.

Strange(?) behaviour using MultiFieldQueryParser

2009-07-21 Thread Philip Puffinburger
We have code (using Lucene 2.4.1) that will build a query that looks like: fielda:"ruz an"~2 OR fieldb:"ruz an"~2 OR fieldc:"ruz an"~2. When passed to a MultiFieldQueryParser and parsed, it comes back looking like: fielda:"ruz an"~2 fieldb:"ruz an"~2 fieldc:ruz. It seems that whenever
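One way to avoid the problem entirely is to build the multi-field phrase query programmatically instead of serializing it to a string and re-parsing it. A sketch with the Lucene 2.4-era query API (the field names come from the message; the helper class itself is made up):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.PhraseQuery;

    // Build the "ruz an"~2 phrase across several fields directly, rather than
    // round-tripping an already-built query through MultiFieldQueryParser.
    public class MultiFieldPhrase {
        public static BooleanQuery build(String[] fields, String[] words, int slop) {
            BooleanQuery query = new BooleanQuery();
            for (String field : fields) {
                PhraseQuery phrase = new PhraseQuery();
                for (String word : words) {
                    phrase.add(new Term(field, word));
                }
                phrase.setSlop(slop);
                query.add(phrase, BooleanClause.Occur.SHOULD);  // OR semantics
            }
            return query;
        }

        public static void main(String[] args) {
            BooleanQuery q = build(new String[] {"fielda", "fieldb", "fieldc"},
                                   new String[] {"ruz", "an"}, 2);
            System.out.println(q);  // prints roughly: fielda:"ruz an"~2 fieldb:"ruz an"~2 fieldc:"ruz an"~2
        }
    }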

Re: Single "A" parsing problem

2010-01-04 Thread Philip Puffinburger
I'm going to take a guess that you are using the StandardAnalyzer or another analyzer that removes stop words. 'a' is a stop word, so it is removed. On Jan 4, 2010, at 11:55 PM, sqzaman wrote: > Hi, I am using Java Lucene 2.9.1. My problem is when I parse the following query: name: zaman AND
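If the stop word needs to stay searchable, one option is to hand StandardAnalyzer an empty stop-word set, so 'a' is kept at both index and query time. A sketch assuming the Lucene 2.9 constructor that takes a Set of stop words (this is one possible fix, not necessarily the one the thread settled on):

    import java.util.Collections;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.util.Version;

    // StandardAnalyzer with no stop words: single-letter terms such as "a"
    // are no longer dropped during analysis.
    public class NoStopWords {
        public static StandardAnalyzer create() {
            return new StandardAnalyzer(Version.LUCENE_29, Collections.emptySet());
        }
    }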

Re: Single "A" parsing problem

2010-01-04 Thread Philip Puffinburger
zaman wrote: > Philip Puffinburger wrote: >> I'm going to take a guess that you are using the StandardAnalyzer or another analyzer that removes stop words. 'a' is a stop word, so it is removed. >> On Jan 4, 2010, at

Re: Use of PrefixQuery to create multi-word queries

2011-01-05 Thread Philip Puffinburger
We do something similar with a PrefixQuery. But the way we do it is to use a Keyword field to run the PrefixQuery against. So if we had a Book with a title like 'The Brown Dog', we would end up with fields in the document like this (the first used for the normal full text searching): title : the brown
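A sketch of that two-field setup with the Lucene 3.x-era field API. The "title" field name comes from the message; the keyword field name "title_exact" and the lowercasing are placeholders, not the poster's actual setup:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.PrefixQuery;

    // Two fields per document: one tokenized for normal full-text search,
    // one untokenized (keyword) that the PrefixQuery runs against.
    public class TitlePrefix {
        public static Document buildDoc(String title) {
            Document doc = new Document();
            // Tokenized field, used for the normal full-text searching.
            doc.add(new Field("title", title, Field.Store.YES, Field.Index.ANALYZED));
            // Untokenized keyword field, lowercased, used for prefix matching.
            doc.add(new Field("title_exact", title.toLowerCase(),
                              Field.Store.NO, Field.Index.NOT_ANALYZED));
            return doc;
        }

        public static PrefixQuery prefix(String userInput) {
            // e.g. "the brown do" matches the keyword value "the brown dog".
            return new PrefixQuery(new Term("title_exact", userInput.toLowerCase()));
        }
    }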

Re: Use of PrefixQuery to create multi-word queries

2011-01-05 Thread Philip Puffinburger
On Jan 5, 2011, at 1:00 PM, L Duperval wrote: > Philip, I also have two fields, one for indexing and another for display. How does the above affect searching? If you type "brown do" will it find the title correctly, or do you have to type "brown dog" in order to get a match? Would "bro

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Philip Puffinburger
> Where do you get your Lucene/Solr downloads from?
> [] ASF Mirrors (linked in our release announcements or via the Lucene website)
> [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
> [] I/we build them from source via an SVN/Git checkout.
> [] Other (someone in you