Case Sensitivity

2008-08-13 Thread Dino Korah
Hi All, Once I have indexed a bunch of documents with a StandardAnalyzer (and if reindexing the documents is not worth the effort), is there a way to search the index without case sensitivity? I do not use any sophisticated Analyzer that makes use of LowerCaseTokenizer

Case sensitivity

2014-09-19 Thread John Cecere
Is there a way to set up Lucene so that both case-sensitive and case-insensitive searches can be done without having to generate two indexes? -- John Cecere Principal Engineer - Oracle Corporation 732-987-4317 / john.cec...@oracle.com

RE: Case Sensitivity

2008-08-13 Thread Dino Korah
Also would like to highlight the version of Lucene I am using; It is 2.0.0. _ From: Dino Korah [mailto:[EMAIL PROTECTED] Sent: 13 August 2008 17:10 To: 'java-user@lucene.apache.org' Subject: Case Sensitivity Hi All, Once I index a bunch of documents with a StandardAnalyz

RE: Case Sensitivity

2008-08-13 Thread Steven A Rowe
am > using; It is 2.0.0. > > _ > > From: Dino Korah [mailto:[EMAIL PROTECTED] > Sent: 13 August 2008 17:10 > To: 'java-user@lucene.apache.org' > Subject: Case Sensitivity > > > Hi All, > > Once I index a bunch of documents with a Stand

Re: Case Sensitivity

2008-08-13 Thread Erick Erickson
m > > using; It is 2.0.0. > > > > _ > > > > From: Dino Korah [mailto:[EMAIL PROTECTED] > > Sent: 13 August 2008 17:10 > > To: 'java-user@lucene.apache.org' > > Subject: Case Sensitivity > > > > > > Hi All, > > >

Re: Case Sensitivity

2008-08-14 Thread Sergey Kabashnyuk
in to reindex the documents is not worth the effort), is there a way to search on the index without case sensitivity. I do not use any sophisticated Analyzer that makes use of LowerCaseTokenizer. Please let me know if there is a solution to circumvent this case sensitivity problem. Many thanks

Re: Case Sensitivity

2008-08-14 Thread Erick Erickson
>> effort >> I need to put in to reindex the documents is not worth the effort), is >> there >> a way to search on the index without case sensitivity. >> I do not use any sophisticated Analyzer that makes use of >> LowerCaseTokenizer. >> Please l

Re: Case Sensitivity

2008-08-14 Thread Sergey Kabashnyuk
But does anyone have an idea how to implement the second and third types of search? Thanks Hi All, Once I index a bunch of documents with a StandardAnalyzer (and if the effort I need to put in to reindex the documents is not worth the effort), is there a way to search on the index withou

Re: Case Sensitivity

2008-08-14 Thread Doron Cohen
> > In example I want to show what I stored field as Field.Index.NO_NORMS > > As I understand it means what field contains original string > despite what analyzer I chose(StandardAnalyzer by default). > This would be achieved by UN_TOKENIZED. The NO_NORMS just guides Lucene to avoid normalizin

Re: Case Sensitivity

2008-08-14 Thread Erick Erickson
"PROPERTIES", >>> content, >>> Field.Store.NO, >>> Field.Index.NO_NORMS, >>> Field.TermVector.NO) >>> for original string and make case sensitive search. >>> >>>

Re: Case Sensitivity

2008-08-14 Thread Andre Rubin
>>>> 1. Case sensitive search. >>>> 2. Lower case search for concrete field. >>>> 3. Upper case search for concrete field. >>>> >>>> For now I use >>>> new Field("PROPERTIES", >>>> content, >

Re: Case Sensitivity

2008-08-15 Thread Sergey Kabashnyuk
search. But does anyone have an idea how to implement the second and third types of search? Thanks Hi All, Once I index a bunch of documents with a StandardAnalyzer (and if the effort I need to put in to reindex the documents is not worth the effort), is there a way to search on the index without ca

Re: Case Sensitivity

2008-08-16 Thread Doron Cohen
is your use-case for needing both upper and >>>> lower case comparisons? I have a hard time coming >>>> up with a reason to do both that wouldn't be satisfied >>>> by just a caseless search. >>>> >>>> Best >>>> Erick >

RE: Case Sensitivity

2008-08-19 Thread Dino Korah
Dino -Original Message- From: Doron Cohen [mailto:[EMAIL PROTECTED] Sent: 16 August 2008 21:01 To: java-user@lucene.apache.org Subject: Re: Case Sensitivity Hi Sergey, seems like case 4 and 5 are equivalent, both meaning case insensitive right. Otherwise please explain the difference.

RE: Case Sensitivity

2008-08-19 Thread Steven A Rowe
Hi Dino, I think you'd benefit from reading some FAQ answers, like: "Why is it important to use the same analyzer type during indexing and search?" Also, have a look at the AnalysisParalysis wiki page fo

RE: Case Sensitivity

2008-08-20 Thread Dino Korah
: RE: Case Sensitivity Hi Dino, I think you'd benefit from reading some FAQ answers, like: "Why is it important to use the same analyzer type during indexing and search?" <http://wiki.apache.org/lucene-java/LuceneFAQ#head-0f374b0fe1483c90fe7d6f2c44472d10961ba63c> Als

Re: Case Sensitivity

2008-08-21 Thread Andre Rubin
- > From: Steven A Rowe [mailto:[EMAIL PROTECTED] > Sent: 19 August 2008 17:43 > To: java-user@lucene.apache.org > Subject: RE: Case Sensitivity > > Hi Dino, > > I think you'd benefit from reading some FAQ answers, like: > > "Why is it important to use the same a

RE: Case Sensitivity

2008-08-22 Thread Dino Korah
: java-user@lucene.apache.org Subject: Re: Case Sensitivity Just to add to that, as I said before, in my case, I found it more useful not to use UN_TOKENIZED. Instead, I used TOKENIZED with a custom analyzer that uses the KeywordTokenizer (entire input as only one token) with the LowerCaseFilter: This
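
The effect described in this message (the whole field value kept as one token, then lower-cased) can be sketched in plain Java. This is only an illustration of the idea, not Lucene code: the class and method names below are hypothetical, and a real setup would wire KeywordTokenizer and LowerCaseFilter into a custom Analyzer instead.

```java
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Plain-Java stand-in (NOT the Lucene API): mimics what a keyword-style
// tokenizer followed by a lower-casing filter would emit for a field value.
final class KeywordLowerCaseSketch {

    // The entire input becomes exactly one token, lower-cased.
    static List<String> analyze(String value) {
        return Collections.singletonList(value.toLowerCase(Locale.ROOT));
    }

    // Index-time and query-time text go through the same analysis,
    // so matching is case-insensitive but still whole-value.
    static boolean matches(String indexedValue, String queryText) {
        return analyze(indexedValue).equals(analyze(queryText));
    }

    public static void main(String[] args) {
        System.out.println(analyze("Test_Type A"));
        System.out.println(matches("Test_Type A", "TEST_TYPE a"));
        System.out.println(matches("Test_Type A", "test_type"));
    }
}
```

Because the value is never split into words, a query for only part of the value does not match; that is the trade-off against a word-level analyzer.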

RE: Case Sensitivity

2008-08-26 Thread Dino Korah
A few more case sensitivity questions. Based on the discussion at http://markmail.org/message/q7dqr4r7o6t6dgo5 and on this thread, is it right to say that a field that is either UN_TOKENIZED or NO_NORMS-ized doesn't get analyzed while indexing? Which means we need to case-normalize (down

RE: Case Sensitivity

2008-08-26 Thread Dino Korah
later set a few with setOmitNorms(true) (the index writer is plain StandardAnalyzer based)? A per field analyzer at query time ?! Many thanks, Dino -Original Message- From: Dino Korah [mailto:[EMAIL PROTECTED] Sent: 26 August 2008 12:12 To: 'java-user@lucene.apache.org' Subject

Re: Case Sensitivity

2008-08-26 Thread Otis Gospodnetic
> > To: java-user@lucene.apache.org > Sent: Tuesday, August 26, 2008 9:17:49 AM > Subject: RE: Case Sensitivity > > I think I should rephrase my question. > > [ Context: Using out of the box StandardAnalyzer for indexing and searching. > ] > > Is it right to say

Re: Case Sensitivity

2008-08-26 Thread Otis Gospodnetic
matext.com/ -- Lucene - Solr - Nutch - Original Message > From: Dino Korah <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Tuesday, August 26, 2008 7:11:42 AM > Subject: RE: Case Sensitivity > > A little more case sensitivity questions. > > Based

Re: Case Sensitivity

2008-08-27 Thread Michael McCandless
ct: RE: Case Sensitivity I think I should rephrase my question. [ Context: Using out of the box StandardAnalyzer for indexing and searching. ] Is it right to say that a field, if either UN_TOKENIZED or NO_NORMS- ized ( field.setOmitNorms(true) ), it doesn't get analyzed while indexing?

RE: Case Sensitivity

2008-08-27 Thread Dino Korah
2008 10:37 To: java-user@lucene.apache.org Subject: Re: Case Sensitivity Actually, as confusing as it is, Field.Index.NO_NORMS means Field.Index.UN_TOKENIZED plus field.setOmitNorms(true). Probably we should rename it to Field.Index.UN_TOKENIZED_NO_NORMS? Mike Otis Gospodnetic wrote: >

Re: Case Sensitivity

2008-08-27 Thread Daniel Naber
On Wednesday, 27 August 2008, Michael McCandless wrote: > Probably we should rename it to Field.Index.UN_TOKENIZED_NO_NORMS? I think it's enough if the API doc explains it; no need to rename it. What's more confusing is that (UN_)TOKENIZED should actually be called (UN_)ANALYZED IMHO. Regards

Re: Case Sensitivity

2008-08-27 Thread Michael McCandless
Or ... split the two notions apart so that you have Field.Index. [UN_]ANALYZED and, separately, Field.Index.[NO_]NORMS which could then be combined together in all 4 combinations (we'd have to fix the Parameter class to let you build up a new Parameter by combining existing ones...). I t
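
The idea of treating "analyzed or not" and "norms or not" as two independent choices can be sketched as a small model. This is a hypothetical illustration, not the Lucene Field.Index API: the enum and class names are assumptions, and the only fact taken from the thread is that the existing NO_NORMS constant means UN_TOKENIZED plus omitted norms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (NOT the Lucene API): analysis and norms are kept as
// two independent flags that can be combined freely.
final class IndexModeSketch {
    enum Analysis { ANALYZED, UN_ANALYZED }
    enum Norms { NORMS, NO_NORMS }

    final Analysis analysis;
    final Norms norms;

    IndexModeSketch(Analysis analysis, Norms norms) {
        this.analysis = analysis;
        this.norms = norms;
    }

    // The existing NO_NORMS constant as described in the thread:
    // not tokenized/analyzed, and norms omitted.
    static IndexModeSketch legacyNoNorms() {
        return new IndexModeSketch(Analysis.UN_ANALYZED, Norms.NO_NORMS);
    }

    // Enumerates every combination of the two flags.
    static List<String> allCombinations() {
        List<String> out = new ArrayList<>();
        for (Analysis a : Analysis.values()) {
            for (Norms n : Norms.values()) {
                out.add(a + "+" + n);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(allCombinations());
    }
}
```

With two flags there are four combinations; each further independent option would double that count if every combination needed its own named constant.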

Re: Case Sensitivity

2008-08-27 Thread Otis Gospodnetic
Solr - Nutch - Original Message > From: Michael McCandless <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Wednesday, August 27, 2008 5:36:46 AM > Subject: Re: Case Sensitivity > > > Actually, as confusing as it is, Field.Index.NO_NORMS means

Re: Case Sensitivity

2008-08-27 Thread Michael McCandless
m: Michael McCandless <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Wednesday, August 27, 2008 5:36:46 AM Subject: Re: Case Sensitivity Actually, as confusing as it is, Field.Index.NO_NORMS means Field.Index.UN_TOKENIZED plus field.setOmitNorms(true). Probably we sho

FW: Case Sensitivity

2008-08-28 Thread Dino Korah
MAIL PROTECTED] Sent: 27 August 2008 10:37 To: java-user@lucene.apache.org Subject: Re: Case Sensitivity Actually, as confusing as it is, Field.Index.NO_NORMS means Field.Index.UN_TOKENIZED plus field.setOmitNorms(true). Probably we should rename it to Field.Index.UN_TOKENIZED_NO_NORMS? Mike Ot

Re: Case Sensitivity

2008-08-28 Thread Karl Wettin
On 28 Aug 2008, at 10:58, Dino Korah wrote: Document doc = new Document(); Field f = new Field("body", bodyText, Field.Store.NO, Field.Index.TOKENIZED); f.setOmitNorms(true); Would that be equivalent to Document doc = new Document(); Field f = new Field("body", bodyText, Field.Store.NO, Field.I

Re: Case Sensitivity

2008-08-28 Thread Andrzej Bialecki
Karl Wettin wrote: On 28 Aug 2008, at 10:58, Dino Korah wrote: Document doc = new Document(); Field f = new Field("body", bodyText, Field.Store.NO, Field.Index.TOKENIZED); f.setOmitNorms(true); Would that be equivalent to Document doc = new Document(); Field f = new Field("body", bodyText, Field

Re: Case Sensitivity

2008-08-28 Thread Karl Wettin
On 28 Aug 2008, at 11:46, Andrzej Bialecki wrote: Karl Wettin wrote: On 28 Aug 2008, at 10:58, Dino Korah wrote: Document doc = new Document(); Field f = new Field("body", bodyText, Field.Store.NO, Field.Index.TOKENIZED); f.setOmitNorms(true); Would that be equivalent to Document doc = new Document

Re: Case Sensitivity

2008-08-28 Thread Otis Gospodnetic
sday, August 28, 2008 5:52:54 AM > Subject: Re: Case Sensitivity > > > On 28 Aug 2008, at 11:46, Andrzej Bialecki wrote: > > > Karl Wettin wrote: > >> On 28 Aug 2008, at 10:58, Dino Korah wrote: > >>> Document doc = new Document(); > >>> Field f = ne

Re: Case Sensitivity

2008-08-28 Thread Andrzej Bialecki
Otis Gospodnetic wrote: So in other words, it *is* possible to have the field both tokenized and its norms omitted? Yes. Probably this is an unintended side-effect of adding setOmitNorms, but I think it's useful and IMHO we should keep it. -- Best regards, Andrzej Bialecki <>< ___. __

Re: Case Sensitivity

2008-08-28 Thread Otis Gospodnetic
e.org > Sent: Thursday, August 28, 2008 1:39:21 PM > Subject: Re: Case Sensitivity > > Otis Gospodnetic wrote: > > So in other words, it *is* possible to have the field both tokenized and > > its > norms omitted? > > Yes. Probably this is an unintended side-ef

Re: Case Sensitivity

2008-08-28 Thread Michael McCandless
xt -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Andrzej Bialecki <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Thursday, August 28, 2008 1:39:21 PM Subject: Re: Case Sensitivity Otis Gospodnetic wrote: So in other words, it *is* possible t

Re: Case Sensitivity

2008-08-28 Thread Yonik Seeley
On Thu, Aug 28, 2008 at 1:44 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > In fact I plan to add it as Field.Index.ANALYZED_NO_NORMS, in this issue: I wasn't originally going to add a Field.Index at all for omitNorms, but Doug suggested it. The problem with this type-safe way of doing thin

Re: Case Sensitivity

2008-08-28 Thread Andrzej Bialecki
Michael McCandless wrote: In fact I plan to add it as Field.Index.ANALYZED_NO_NORMS, in this issue: https://issues.apache.org/jira/browse/LUCENE-1366 This has consequences when searching - so if we expose it the javadoc has to be really good at explaining what's going on :) -- Best re

Re: Case Sensitivity

2008-08-28 Thread Michael McCandless
Yonik Seeley wrote: On Thu, Aug 28, 2008 at 1:44 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: In fact I plan to add it as Field.Index.ANALYZED_NO_NORMS, in this issue: I wasn't originally going to add a Field.Index at all for omitNorms, but Doug suggested it. The problem with this ty

Re: Case Sensitivity

2008-08-28 Thread Michael McCandless
Andrzej Bialecki wrote: Michael McCandless wrote: In fact I plan to add it as Field.Index.ANALYZED_NO_NORMS, in this issue: https://issues.apache.org/jira/browse/LUCENE-1366 This has consequences when searching - so if we expose it the javadoc has to be really good at explaining what'

Re: Case Sensitivity

2008-09-11 Thread Anthony Urso
On Thu, Aug 28, 2008 at 11:16 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Yonik Seeley wrote: >> >> I wasn't originally going to add a Field.Index at all for omitNorms, >> but Doug suggested it. >> The problem with this type-safe way of doing things is the >> combinatorial explosion. > >

Re: Case Sensitivity

2008-09-19 Thread Michael McCandless
Anthony Urso wrote: On Thu, Aug 28, 2008 at 11:16 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: Yonik Seeley wrote: I wasn't originally going to add a Field.Index at all for omitNorms, but Doug suggested it. The problem with this type-safe way of doing things is the combinatorial explos

Re: Case Sensitivity

2008-09-19 Thread Andrzej Bialecki
Michael McCandless wrote: Anthony Urso wrote: On Thu, Aug 28, 2008 at 11:16 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: Yonik Seeley wrote: I wasn't originally going to add a Field.Index at all for omitNorms, but Doug suggested it. The problem with this type-safe way of doing things

Wildcard Case Sensitivity

2011-01-20 Thread Amin Mohammed-Coleman
Hi Apologies up front if this question has been asked before. I have a document which contains a field that stores an untokenized value such as TEST_TYPE. The analyser used is StandardAnalyzer and I pass the same analyzer into the query. I perform the following query : fieldName:TEST_*, howe

Re: Case sensitivity

2014-09-19 Thread Paul Libbrecht
two fields? paul On 19 sept. 2014, at 15:07, John Cecere wrote: > Is there a way to set up Lucene so that both case-sensitive and > case-insensitive searches can be done without having to generate two indexes? > > -- > John Cecere > Principal Engineer - Oracle Corporation > 732-987-4317 / j

Re: Case sensitivity

2014-09-19 Thread John Cecere
I've considered this, but there are two problems with it. First of all, it feels like I'm still taking up twice the storage, I'm just doing it using a single index rather than two of them. This doesn't sound like it's buying me anything. The second problem with this is simply that I haven't figu

Re: Case sensitivity

2014-09-19 Thread Ian Lea
PerFieldAnalyzerWrapper is the way to mix and match fields and analyzers. Personally I'd simply store the case-insensitive field with a call to toLowerCase() on the value and equivalent on the search string. You will of course use more storage, but you don't need to store the text contents for bo
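
The two-field approach suggested here can be sketched without Lucene as a tiny in-memory index: each token is indexed once as-is and once lower-cased, and the search string is lower-cased only for the case-insensitive lookup. The class name, the whitespace tokenization, and the two maps standing in for two fields are assumptions made for illustration; in Lucene the per-field behaviour would come from PerFieldAnalyzerWrapper or from calling toLowerCase() before indexing, as the message describes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Plain-Java sketch (NOT Lucene): one index, two "fields" per document.
final class TwoFieldIndexSketch {
    // term -> ids of documents containing it, original case preserved
    private final Map<String, Set<Integer>> exact = new HashMap<>();
    // term -> ids of documents containing it, lower-cased at index time
    private final Map<String, Set<Integer>> lower = new HashMap<>();

    void add(int docId, String text) {
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input yields one empty token; skip it
            }
            exact.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            lower.computeIfAbsent(token.toLowerCase(Locale.ROOT), k -> new TreeSet<>()).add(docId);
        }
    }

    // The same normalization is applied to the query as was applied at index time.
    Set<Integer> search(String term, boolean caseSensitive) {
        if (caseSensitive) {
            return exact.getOrDefault(term, Collections.emptySet());
        }
        return lower.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        TwoFieldIndexSketch index = new TwoFieldIndexSketch();
        index.add(1, "Apple pie");
        index.add(2, "apple tart");
        System.out.println(index.search("Apple", true));
        System.out.println(index.search("APPLE", false));
    }
}
```

Only the term dictionaries and postings are duplicated in this sketch; the original text is not stored twice, which is the storage point the message is making.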

Re: Case sensitivity

2014-09-19 Thread Sujit Pal
Hi John, Take a look at the PerFieldAnalyzerWrapper. As the name suggests, it allows you to create different analyzers per field. -sujit On Fri, Sep 19, 2014 at 6:50 AM, John Cecere wrote: > I've considered this, but there are two problems with it. First of all, it > feels like I'm still taki

Re: Case sensitivity

2014-09-21 Thread Michael Sokolov
On 9/19/2014 9:07 AM, John Cecere wrote: Is there a way to set up Lucene so that both case-sensitive and case-insensitive searches can be done without having to generate two indexes? You might be interested in the discussion here: https://issues.apache.org/jira/browse/LUCENE-5620 which addres

Re: Wildcard Case Sensitivity

2011-01-20 Thread Jack Krupansky
underscore is kept at query time even though StandardAnalyzer (or WordDelimiterFilter or similar filter) would produce a text stream without any underscores. StandardAnalyzer lower cases its output, so there is no case sensitivity. But, if you happen to use upper case in a wildcard query term

QueryParser, PrefixQuery, and case sensitivity

2007-05-04 Thread Bill Au
I have an index with both case-sensitive and case-insensitive fields. I am trying to use a QueryParser to accept queries from end users for searching. The default behavior of QueryParser is to lowercase the prefix text to create the PrefixQuery. So wildcard search on the case sensitive fields
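
The problem described here can be sketched in plain Java: if prefix text is always lower-cased, it can no longer match terms in a field whose index terms kept their original case. The sketch below lower-cases the prefix only for fields not listed as case-sensitive. The class name, the field names, and the per-field rule are assumptions for illustration, not QueryParser behaviour or API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Plain-Java sketch (NOT the Lucene QueryParser): decides per field whether
// prefix text should be lower-cased before it is matched against index terms.
final class PrefixCaseSketch {
    private final Set<String> caseSensitiveFields;

    PrefixCaseSketch(String... caseSensitiveFields) {
        this.caseSensitiveFields = new HashSet<>(Arrays.asList(caseSensitiveFields));
    }

    // Leave the prefix untouched for case-sensitive fields, lower-case it otherwise.
    String normalizePrefix(String field, String prefix) {
        return caseSensitiveFields.contains(field) ? prefix : prefix.toLowerCase(Locale.ROOT);
    }

    // A prefix query matches every index term that starts with the prefix.
    static boolean prefixMatches(String indexedTerm, String prefix) {
        return indexedTerm.startsWith(prefix);
    }

    public static void main(String[] args) {
        PrefixCaseSketch sketch = new PrefixCaseSketch("code"); // "code" is a hypothetical field
        // Term indexed with its original case in the case-sensitive field:
        System.out.println(prefixMatches("Apple", sketch.normalizePrefix("code", "App")));
        // An unconditionally lower-cased prefix misses that same term:
        System.out.println(prefixMatches("Apple", "App".toLowerCase(Locale.ROOT)));
        // Term lower-cased at index time in a case-insensitive field:
        System.out.println(prefixMatches("apple", sketch.normalizePrefix("title", "App")));
    }
}
```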

Re: QueryParser, PrefixQuery, and case sensitivity

2007-05-04 Thread Erick Erickson
Look at PerFieldAnalyzerWrapper. It allows you to use different analyzers on different fields during the query parsing phase. But I wouldn't go there if you don't have to. I suspect you'll spend a LOT of time tracking down errors in your use of a mixed case index. If for no other reason than your

Re: QueryParser, PrefixQuery, and case sensitivity

2007-05-06 Thread Bill Au
Erick, Thanks for the advice. I will take a look at PerFieldAnalyzerWrapper to see if I want to take this on. For my case, I have to use mixed case for a couple of fields since case really does matter for them (i.e., apple is not the same as Apple), and I actually don't want users to find the d