Wildcard query with untokenized punctuation

2007-03-09 Thread McGuigan, Colin
(Lucene 1.9.1) I have a "filename" field in Lucene that holds a value, like this: pagefile.sys If I run searches through QueryParser, and I do a search for: pagefile.sys pagefile pagefile. This all works because it goes through getFieldQuery, which tokenizes the string and generat

Re: Wildcard query with untokenized punctuation

2007-03-09 Thread Steffen Heinrich
On 9 Mar 2007 at 15:10, McGuigan, Colin wrote: > I have a "filename" field in Lucene that holds a value, like this: > pagefile.sys > Hi Colin, I'm still _very_ new to lucene, but isn't that what the un-tokenized indexing is for? Like in 1.9.1 doc.add(Field.Keyword("filename", "pagefile.sys"));

RE: Wildcard query with untokenized punctuation

2007-03-09 Thread McGuigan, Colin
-Original Message- From: Steffen Heinrich [mailto:[EMAIL PROTECTED] Sent: Fri 3/9/2007 4:31 PM To: java-user@lucene.apache.org Subject: Re: Wildcard query with untokenized punctuation On 9 Mar 2007 at 15:10, McGuigan, Colin wrote: >> I have a "filename" field in Luc

Re: Wildcard query with untokenized punctuation

2007-03-10 Thread Doron Cohen
Hi Colin, Is it possible that you are using an analyzer that breaks words on non-letters? For instance SimpleAnalyzer? If so, the doc text: pagefile.sys is indexed as two words: pagefile sys At search time, the query text: pagefile.sys is also parsed-tokenized into a two-word query: prof
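
The mismatch Doron describes can be illustrated without Lucene. The following is only a rough sketch under stated assumptions: `tokenize` is a hypothetical stand-in for an analyzer that lowercases and breaks on non-letters (ASCII only here), and `wildcardMatches` is a hypothetical stand-in for a wildcard query whose pattern is not analyzed and is compared against each indexed term as-is. Neither name is part of the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class WildcardSketch {
    // Hypothetical analyzer stand-in: lowercase, then split on anything
    // that is not an ASCII letter, dropping empty pieces.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Hypothetical unanalyzed wildcard match: the raw pattern is tested
    // against each indexed term. '*' matches any run, '?' one character,
    // and every other character (including '.') is taken literally.
    static boolean wildcardMatches(String pattern, List<String> indexedTerms) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        Pattern p = Pattern.compile(regex.toString());
        for (String term : indexedTerms) {
            if (p.matcher(term).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> indexed = tokenize("pagefile.sys");
        System.out.println(indexed);                                 // [pagefile, sys]
        System.out.println(wildcardMatches("pagefile.s*", indexed)); // false
        System.out.println(wildcardMatches("pagefile*", indexed));   // true
    }
}
```

Because the index in this sketch holds only the terms pagefile and sys, a pattern that still contains the dot has no single term to match, while a pattern without punctuation does.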

RE: Wildcard query with untokenized punctuation

2007-03-10 Thread McGuigan, Colin
arch 10, 2007 2:08 AM To: java-user@lucene.apache.org Subject: Re: Wildcard query with untokenized punctuation Hi Colin, Is it possible that you are using an analyzer that breaks words on non letters? For instance SimpleAnalyzer? if so, the doc text: pagefile.sys is indexed as two words: pagefile

RE: Wildcard query with untokenized punctuation

2007-03-10 Thread Doron Cohen
reasoning behind not analyzing wildcard queries is also explained in the FAQ: "Are Wildcard, Prefix, and Fuzzy queries case sensitive?" Regards, Doron > > --Colin McGuigan > > -Original Message- > From: Doron Cohen [mailto:[EMAIL PROTECTED] > Sent: Saturday, Mar

RE: Wildcard query with untokenized punctuation

2007-03-12 Thread Chris Hostetter
: You're entirely correct about the analyzer (I'm using one that breaks on : non-alphanumeric characters, so all punctuation is ignored). To be : honest, I hadn't thought about altering this, but I guess I could; just : reticent that there might be unforeseen consequences. this is where the PerF

Wildcard query with untokenized punctuation (again)

2007-06-13 Thread Renaud Waldura
"smith,annanicole". To find them, I enter the query <>. The stock Lucene 2.0 query parser produces a PrefixQuery for the single token "smith,ann". This token doesn't exist in my index, and I don't get a match. I have found some references to this: http://www.nabble

Re: Wildcard query with untokenized punctuation (again)

2007-06-13 Thread Mark Miller
that contained both "smith,anna" and "smith,annanicole". To find them, I enter the query <>. The stock Lucene 2.0 query parser produces a PrefixQuery for the single token "smith,ann". This token doesn't exist in my index, and I don't get a match. I

Re: Wildcard query with untokenized punctuation (again)

2007-06-14 Thread Erick Erickson
,anna" and "smith,annanicole". To find them, I enter the query <>. The stock Lucene 2.0 query parser produces a PrefixQuery for the single token "smith,ann". This token doesn't exist in my index, and I don't get a match. I have found some references to this:

Re: Wildcard query with untokenized punctuation (again)

2007-06-14 Thread Mathieu Lecarme
th,anna" and > "smith,annanicole". To find them, I enter the query <>. The > stock Lucene 2.0 query parser produces a PrefixQuery for the single token > "smith,ann". This token doesn't exist in my index, and I don't get a match. > > I have

Re: Wildcard query with untokenized punctuation (again)

2007-06-14 Thread Mark Miller
stock Lucene 2.0 query parser produces a PrefixQuery for the single token "smith,ann". This token doesn't exist in my index, and I don't get a match. I have found some references to this: http://www.nabble.com/Wildcard-query-with-untokenized-punctuation-tf3378386.html but

RE: Wildcard query with untokenized punctuation (again)

2007-06-14 Thread Renaud Waldura
r [mailto:[EMAIL PROTECTED] Sent: Thursday, June 14, 2007 6:43 AM To: java-user@lucene.apache.org Subject: Re: Wildcard query with untokenized punctuation (again) Gotta agree with Erick here...best idea is just to preprocess the query before sending it to the QueryParser. My first thought i
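
One way the preprocessing idea might look is sketched below. This is an illustration, not what anyone in the thread actually wrote: `preprocess` is a hypothetical helper, and the choice of which characters to replace is an assumption that would have to mirror whatever the index-time analyzer treats as a separator.

```java
public class QueryPreprocessSketch {
    // Hypothetical helper: every run of characters that are neither letters,
    // digits, nor wildcards ('*' or '?') becomes a single space. Note that
    // this also strips query syntax such as quotes and "field:" prefixes.
    static String preprocess(String rawQuery) {
        return rawQuery.replaceAll("[^\\p{L}\\p{N}*?]+", " ").trim();
    }

    public static void main(String[] args) {
        // The result would then be handed to the query parser.
        System.out.println(preprocess("smith,ann*")); // smith ann*
    }
}
```

With this rewrite the parser sees two separate clauses, smith and ann*, instead of one prefix on the unindexed token "smith,ann". Whether two independent clauses are acceptable depends on the match semantics wanted.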

Re: Wildcard query with untokenized punctuation (again)

2007-06-14 Thread Mark Miller
", "ann*"), not <<+smith +ann*>> as I said earlier. B. Getting hairy. Any hope? --Renaud -Original Message- From: Mark Miller [mailto:[EMAIL PROTECTED] Sent: Thursday, June 14, 2007 6:43 AM To: java-user@lucene.apache.org Subject: Re: Wildcard query with unt

RE: Wildcard query with untokenized punctuation (again)

2007-06-14 Thread Renaud Waldura
his issue: how to get QueryParser to generate MultiPhraseQueries. Got some good ideas from it, but unfortunately no complete solution. I'll keep on hacking. --Renaud -Original Message- From: Mark Miller [mailto:[EMAIL PROTECTED] Sent: Thursday, June 14, 2007 12:07 PM To: java-user@

Re: Wildcard query with untokenized punctuation (again)

2007-06-15 Thread Erick Erickson
e- From: Mark Miller [mailto:[EMAIL PROTECTED] Sent: Thursday, June 14, 2007 12:07 PM To: java-user@lucene.apache.org Subject: Re: Wildcard query with untokenized punctuation (again) All depends on what you are looking for. I'll try and give a hint as to what is going on now: When the QueryPa