Re: Case sensitivity

2014-09-21 Thread Michael Sokolov
On 9/19/2014 9:07 AM, John Cecere wrote: Is there a way to set up Lucene so that both case-sensitive and case-insensitive searches can be done without having to generate two indexes? You might be interested in the discussion here: https://issues.apache.org/jira/browse/LUCENE-5620 which addres

Re: Case sensitivity

2014-09-19 Thread Sujit Pal
Hi John, Take a look at the PerFieldAnalyzerWrapper. As the name suggests, it allows you to create different analyzers per field. -sujit On Fri, Sep 19, 2014 at 6:50 AM, John Cecere wrote: > I've considered this, but there are two problems with it. First of all, it > feels like I'm still taki

Re: Case sensitivity

2014-09-19 Thread Ian Lea
PerFieldAnalyzerWrapper is the way to mix and match fields and analyzers. Personally I'd simply store the case-insensitive field with a call to toLowerCase() on the value and equivalent on the search string. You will of course use more storage, but you don't need to store the text contents for bo
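
A minimal sketch of the two-field idea suggested in this reply, written in plain Java rather than against the Lucene API. The class name, the field maps and the whitespace tokenization are assumptions made for illustration only; in Lucene itself the per-field behaviour would come from PerFieldAnalyzerWrapper. The sketch only shows why lowercasing both the indexed value and the search string gives case-insensitive matching while the untouched field keeps exact matching.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy stand-in for an index with two fields per document; this is not Lucene code.
class TwoFieldIndexSketch {
    // term -> ids of documents containing it, one map per hypothetical field
    private final Map<String, Set<Integer>> exactField = new HashMap<>();
    private final Map<String, Set<Integer>> lowerField = new HashMap<>();

    // Index every whitespace-separated term twice: as written, and lowercased.
    void add(int docId, String text) {
        for (String term : text.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            exactField.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
            lowerField.computeIfAbsent(term.toLowerCase(Locale.ROOT), k -> new HashSet<>()).add(docId);
        }
    }

    // Case-sensitive lookup: the query term is used exactly as typed.
    Set<Integer> searchCaseSensitive(String term) {
        return exactField.getOrDefault(term, Set.of());
    }

    // Case-insensitive lookup: apply the same toLowerCase() to the search string.
    Set<Integer> searchCaseInsensitive(String term) {
        return lowerField.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of());
    }

    public static void main(String[] args) {
        TwoFieldIndexSketch index = new TwoFieldIndexSketch();
        index.add(1, "Field Without Norms");
        index.add(2, "field without norms");
        index.add(3, "FIELD WITHOUT NORMS");
        System.out.println(index.searchCaseSensitive("Field"));
        System.out.println(index.searchCaseInsensitive("Field"));
    }
}
```

As the reply notes, the second field costs extra index space, but the original text does not have to be stored twice.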

Re: Case sensitivity

2014-09-19 Thread John Cecere
I've considered this, but there are two problems with it. First of all, it feels like I'm still taking up twice the storage, I'm just doing it using a single index rather than two of them. This doesn't sound like it's buying me anything. The second problem with this is simply that I haven't figu

Re: Case sensitivity

2014-09-19 Thread Paul Libbrecht
two fields? paul On 19 sept. 2014, at 15:07, John Cecere wrote: > Is there a way to set up Lucene so that both case-sensitive and > case-insensitive searches can be done without having to generate two indexes? > > -- > John Cecere > Principal Engineer - Oracle Corporation > 732-987-4317 / j

Re: Case Sensitivity

2008-09-19 Thread Andrzej Bialecki
Michael McCandless wrote: Anthony Urso wrote: On Thu, Aug 28, 2008 at 11:16 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: Yonik Seeley wrote: I wasn't originally going to add a Field.Index at all for omitNorms, but Doug suggested it. The problem with this type-safe way of doing things

Re: Case Sensitivity

2008-09-19 Thread Michael McCandless
Anthony Urso wrote: On Thu, Aug 28, 2008 at 11:16 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: Yonik Seeley wrote: I wasn't originally going to add a Field.Index at all for omitNorms, but Doug suggested it. The problem with this type-safe way of doing things is the combinatorial explos

Re: Case Sensitivity

2008-09-11 Thread Anthony Urso
On Thu, Aug 28, 2008 at 11:16 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Yonik Seeley wrote: >> >> I wasn't originally going to add a Field.Index at all for omitNorms, >> but Doug suggested it. >> The problem with this type-safe way of doing things is the >> combinatorial explosion. > >

Re: Case Sensitivity

2008-08-28 Thread Michael McCandless
Andrzej Bialecki wrote: Michael McCandless wrote: In fact I plan to add it as Field.Index.ANALYZED_NO_NORMS, in this issue: https://issues.apache.org/jira/browse/LUCENE-1366 This has consequences when searching - so if we expose it the javadoc has to be really good at explaining what'

Re: Case Sensitivity

2008-08-28 Thread Michael McCandless
Yonik Seeley wrote: On Thu, Aug 28, 2008 at 1:44 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: In fact I plan to add it as Field.Index.ANALYZED_NO_NORMS, in this issue: I wasn't originally going to add a Field.Index at all for omitNorms, but Doug suggested it. The problem with this ty

Re: Case Sensitivity

2008-08-28 Thread Andrzej Bialecki
Michael McCandless wrote: In fact I plan to add it as Field.Index.ANALYZED_NO_NORMS, in this issue: https://issues.apache.org/jira/browse/LUCENE-1366 This has consequences when searching - so if we expose it the javadoc has to be really good at explaining what's going on :) -- Best re

Re: Case Sensitivity

2008-08-28 Thread Yonik Seeley
On Thu, Aug 28, 2008 at 1:44 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > In fact I plan to add it as Field.Index.ANALYZED_NO_NORMS, in this issue: I wasn't originally going to add a Field.Index at all for omitNorms, but Doug suggested it. The problem with this type-safe way of doing thin

Re: Case Sensitivity

2008-08-28 Thread Michael McCandless
xt -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Andrzej Bialecki <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Thursday, August 28, 2008 1:39:21 PM Subject: Re: Case Sensitivity Otis Gospodnetic wrote: So in other words, it *is* possible t

Re: Case Sensitivity

2008-08-28 Thread Otis Gospodnetic
e.org > Sent: Thursday, August 28, 2008 1:39:21 PM > Subject: Re: Case Sensitivity > > Otis Gospodnetic wrote: > > So in other words, it *is* possible to have the field both tokenized and > > its > norms omitted? > > Yes. Probably this is an unintended side-ef

Re: Case Sensitivity

2008-08-28 Thread Andrzej Bialecki
Otis Gospodnetic wrote: So in other words, it *is* possible to have the field both tokenized and its norms omitted? Yes. Probably this is an unintended side-effect of adding setOmitNorms, but I think it's useful and IMHO we should keep it. -- Best regards, Andrzej Bialecki <>< ___. __

Re: Case Sensitivity

2008-08-28 Thread Otis Gospodnetic
sday, August 28, 2008 5:52:54 AM > Subject: Re: Case Sensitivity > > > On 28 Aug 2008, at 11:46, Andrzej Bialecki wrote: > > > Karl Wettin wrote: > >> On 28 Aug 2008, at 10:58, Dino Korah wrote: > >>> Document doc = new Document(); > >>> Field f = ne

Re: Case Sensitivity

2008-08-28 Thread Karl Wettin
On 28 Aug 2008, at 11:46, Andrzej Bialecki wrote: Karl Wettin wrote: On 28 Aug 2008, at 10:58, Dino Korah wrote: Document doc = new Document(); Field f = new Field("body", bodyText, Field.Store.NO, Field.Index.TOKENIZED); f.setOmitNorms(true); Would that be equivalent to Document doc = new Document

Re: Case Sensitivity

2008-08-28 Thread Andrzej Bialecki
Karl Wettin wrote: On 28 Aug 2008, at 10:58, Dino Korah wrote: Document doc = new Document(); Field f = new Field("body", bodyText, Field.Store.NO, Field.Index.TOKENIZED); f.setOmitNorms(true); Would that be equivalent to Document doc = new Document(); Field f = new Field("body", bodyText, Field

Re: Case Sensitivity

2008-08-28 Thread Karl Wettin
On 28 Aug 2008, at 10:58, Dino Korah wrote: Document doc = new Document(); Field f = new Field("body", bodyText, Field.Store.NO, Field.Index.TOKENIZED); f.setOmitNorms(true); Would that be equivalent to Document doc = new Document(); Field f = new Field("body", bodyText, Field.Store.NO, Field.I

Re: Case Sensitivity

2008-08-27 Thread Michael McCandless
From: Michael McCandless <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Wednesday, August 27, 2008 5:36:46 AM Subject: Re: Case Sensitivity Actually, as confusing as it is, Field.Index.NO_NORMS means Field.Index.UN_TOKENIZED plus field.setOmitNorms(true). Probably we sho

Re: Case Sensitivity

2008-08-27 Thread Otis Gospodnetic
Solr - Nutch - Original Message > From: Michael McCandless <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Wednesday, August 27, 2008 5:36:46 AM > Subject: Re: Case Sensitivity > > > Actually, as confusing as it is, Field.Index.NO_NORMS means

Re: Case Sensitivity

2008-08-27 Thread Michael McCandless
Or ... split the two notions apart so that you have Field.Index.[UN_]ANALYZED and, separately, Field.Index.[NO_]NORMS which could then be combined together in all 4 combinations (we'd have to fix the Parameter class to let you build up a new Parameter by combining existing ones...). I t

Re: Case Sensitivity

2008-08-27 Thread Daniel Naber
On Wednesday, 27 August 2008, Michael McCandless wrote: > Probably we should rename it to Field.Index.UN_TOKENIZED_NO_NORMS? I think it's enough if the API doc explains it; no need to rename it. What's more confusing is that (UN_)TOKENIZED should actually be called (UN_)ANALYZED, IMHO. Regards

RE: Case Sensitivity

2008-08-27 Thread Dino Korah
2008 10:37 To: java-user@lucene.apache.org Subject: Re: Case Sensitivity Actually, as confusing as it is, Field.Index.NO_NORMS means Field.Index.UN_TOKENIZED plus field.setOmitNorms(true). Probably we should rename it to Field.Index.UN_TOKENIZED_NO_NORMS? Mike Otis Gospodnetic wrote: >

Re: Case Sensitivity

2008-08-27 Thread Michael McCandless
'java-user@lucene.apache.org' Subject: RE: Case Sensitivity A few more case sensitivity questions. Based on the discussion at http://markmail.org/message/q7dqr4r7o6t6dgo5 and on this thread, is it right to say that a field, if either UN_TOKENIZED or NO_NORMS-ized, doesn't get analy

Re: Case Sensitivity

2008-08-26 Thread Otis Gospodnetic
matext.com/ -- Lucene - Solr - Nutch - Original Message > From: Dino Korah <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Tuesday, August 26, 2008 7:11:42 AM > Subject: RE: Case Sensitivity > > A little more case sensitivity questions. > > Based

Re: Case Sensitivity

2008-08-26 Thread Otis Gospodnetic
> > To: java-user@lucene.apache.org > Sent: Tuesday, August 26, 2008 9:17:49 AM > Subject: RE: Case Sensitivity > > I think I should rephrase my question. > > [ Context: Using out of the box StandardAnalyzer for indexing and searching. > ] > > Is it right to say

RE: Case Sensitivity

2008-08-26 Thread Dino Korah
later set a few with setOmitNorms(true) (the index writer is plain StandardAnalyzer based)? A per field analyzer at query time ?! Many thanks, Dino -Original Message- From: Dino Korah [mailto:[EMAIL PROTECTED] Sent: 26 August 2008 12:12 To: 'java-user@lucene.apache.org' Subject

RE: Case Sensitivity

2008-08-26 Thread Dino Korah
-case) those fields beforehand? Does it mean that, if I can afford it, I should use norms? Many thanks, Dino -Original Message- From: Steven A Rowe [mailto:[EMAIL PROTECTED] Sent: 19 August 2008 17:43 To: java-user@lucene.apache.org Subject: RE: Case Sensitivity Hi Dino, I think

RE: Case Sensitivity

2008-08-22 Thread Dino Korah
: java-user@lucene.apache.org Subject: Re: Case Sensitivity Just to add to that, as I said before, in my case I found it more useful not to use UN_TOKENIZED. Instead, I used TOKENIZED with a custom analyzer that uses the KeywordTokenizer (the entire input as a single token) with the LowerCaseFilter: This
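
A minimal sketch of what the analyzer described in this message does, in plain Java rather than real Lucene classes. The names `KeywordLowerCaseSketch`, `analyze` and `matches` are hypothetical; the sketch only mimics the two steps named above (the entire input as a single token, then lowercasing) and is not the custom analyzer the author actually wrote.

```java
import java.util.List;
import java.util.Locale;

// Toy model of "KeywordTokenizer followed by LowerCaseFilter"; this is not Lucene code.
class KeywordLowerCaseSketch {
    // Step 1: keep the entire input as a single token.
    // Step 2: lowercase that token.
    static List<String> analyze(String input) {
        return List.of(input.toLowerCase(Locale.ROOT));
    }

    // A query matches only if its analyzed form equals the analyzed field value.
    static boolean matches(String fieldValue, String queryText) {
        return analyze(fieldValue).equals(analyze(queryText));
    }

    public static void main(String[] args) {
        System.out.println(analyze("Field Without Norms"));
        System.out.println(matches("Field Without Norms", "FIELD WITHOUT NORMS"));
        System.out.println(matches("Field Without Norms", "field"));
    }
}
```

Because the whole value is one token, a query for a single word from the value does not match; only the full value, in any letter case, does.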

Re: Case Sensitivity

2008-08-21 Thread Andre Rubin
- > From: Steven A Rowe [mailto:[EMAIL PROTECTED] > Sent: 19 August 2008 17:43 > To: java-user@lucene.apache.org > Subject: RE: Case Sensitivity > > Hi Dino, > > I think you'd benefit from reading some FAQ answers, like: > > "Why is it important to use the same a

RE: Case Sensitivity

2008-08-20 Thread Dino Korah
: RE: Case Sensitivity Hi Dino, I think you'd benefit from reading some FAQ answers, like: "Why is it important to use the same analyzer type during indexing and search?" <http://wiki.apache.org/lucene-java/LuceneFAQ#head-0f374b0fe1483c90fe7d6f2c44472d10961ba63c> Als

RE: Case Sensitivity

2008-08-19 Thread Steven A Rowe
Hi Dino, I think you'd benefit from reading some FAQ answers, like: "Why is it important to use the same analyzer type during indexing and search?" Also, have a look at the AnalysisParalysis wiki page fo

RE: Case Sensitivity

2008-08-19 Thread Dino Korah
Dino -Original Message- From: Doron Cohen [mailto:[EMAIL PROTECTED] Sent: 16 August 2008 21:01 To: java-user@lucene.apache.org Subject: Re: Case Sensitivity Hi Sergey, seems like case 4 and 5 are equivalent, both meaning case insensitive right. Otherwise please explain the difference.

Re: Case Sensitivity

2008-08-16 Thread Doron Cohen
Hi Sergey, it seems that cases 4 and 5 are equivalent, both meaning case-insensitive, right? Otherwise please explain the difference. If it is required to support both case-sensitive (cases 1, 2, 3) and case-insensitive (case 4/5) search, then both forms must be saved in the index, in two separate fields (as Er

Re: Case Sensitivity

2008-08-15 Thread Sergey Kabashnyuk
Hello. Here's my use case.
Content of the field:
Doc1 - Field - “text ” - “Field Without Norms”
Doc2 - Field - “text ” - “field without norms”
Doc3 - Field - “text ” - “FIELD WITHOUT NORMS”
Query, expected result:
1. new T

Re: Case Sensitivity

2008-08-14 Thread Andre Rubin
Sergey, Based on a recent discussion I posted: http://www.nabble.com/Searching-Tokenized-x-Un_tokenized-td18882569.html, you cannot use UN_TOKENIZED because you can't have any analyzer run through it. My suggestion: use a tokenized field and a custom-made Analyzer. Haven't figured out all the det

Re: Case Sensitivity

2008-08-14 Thread Erick Erickson
Be aware that StandardAnalyzer lowercases all the input, both at index and query times. Field.Store.YES will store the original text without any transformations, so doc.get() will return the original text. However, no matter what the Field.Store value, the *indexed* tokens (using TOKENIZED as you F
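
A minimal sketch of the stored-versus-indexed distinction described in this message, in plain Java rather than the Lucene API. The class and method names are hypothetical, and the toy "analysis" models only lowercasing on whitespace-separated terms, which is an assumption for illustration; the real StandardAnalyzer does more than that.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model: stored text is kept verbatim, indexed tokens are lowercased. Not Lucene code.
class StoredVersusIndexedSketch {
    private final Map<Integer, String> stored = new HashMap<>();        // original text per document
    private final Map<String, Set<Integer>> indexed = new HashMap<>();  // lowercased term -> document ids

    // Stand-in for the lowercasing step applied at both index and query time.
    static String normalize(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    void add(int docId, String text) {
        stored.put(docId, text); // what retrieving the stored field would give back
        for (String term : text.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                indexed.computeIfAbsent(normalize(term), k -> new HashSet<>()).add(docId);
            }
        }
    }

    // Returns the original, untransformed text.
    String get(int docId) {
        return stored.get(docId);
    }

    // Query-time lookup that applies the same normalization as indexing.
    Set<Integer> search(String term) {
        return lookupRaw(normalize(term));
    }

    // Lookup with no query-time analysis; misses whenever the case differs.
    Set<Integer> lookupRaw(String term) {
        return indexed.getOrDefault(term, Set.of());
    }

    public static void main(String[] args) {
        StoredVersusIndexedSketch index = new StoredVersusIndexedSketch();
        index.add(1, "Field Without Norms");
        System.out.println(index.get(1));
        System.out.println(index.search("FIELD"));
        System.out.println(index.lookupRaw("FIELD"));
    }
}
```

The raw lookup shows why the query side has to go through the same analysis as the index side: the upper-case term is simply not present among the indexed tokens.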

Re: Case Sensitivity

2008-08-14 Thread Doron Cohen
> > In example I want to show what I stored field as Field.Index.NO_NORMS > > As I understand it means what field contains original string > despite what analyzer I chose(StandardAnalyzer by default). > This would be achieved by UN_TOKENIZED. The NO_NORMS just guides Lucene to avoid normalizin

Re: Case Sensitivity

2008-08-14 Thread Sergey Kabashnyuk
Thanks for your reply, Erick. About the only way to do this that I know of is to index the data three times, once without any case changing, once uppercased and once lowercased. You'll have to watch your analyzer, probably making up your own (easily done, see the synonym analyzer in Lucene in Ac

Re: Case Sensitivity

2008-08-14 Thread Erick Erickson
About the only way to do this that I know of is to index the data three times, once without any case changing, once uppercased and once lowercased. You'll have to watch your analyzer, probably making up your own (easily done, see the synonym analyzer in Lucene in Action). Your example doesn't tell

Re: Case Sensitivity

2008-08-14 Thread Sergey Kabashnyuk
Hello. I have a similar question. I need to implement: 1. Case-sensitive search. 2. Lower-case search for a specific field. 3. Upper-case search for a specific field. For now I use new Field("PROPERTIES", content, Field.Store.NO, Field.Index

Re: Case Sensitivity

2008-08-13 Thread Erick Erickson
What analyzer are you using at *query* time? I suspect that's where your problem lies if you indeed "don't use any sophisticated analyzers", since you *are* using a sophisticated analyzer at index time. You almost invariably want to use the same analyzer at query time and index time. Please sta

RE: Case Sensitivity

2008-08-13 Thread Steven A Rowe
Hi Dino, StandardAnalyzer incorporates StandardTokenizer, StandardFilter, LowerCaseFilter, and StopFilter. Any index you create using it will only provide case-insensitive matching. Steve On 08/13/2008 at 12:15 PM, Dino Korah wrote: > Also would like to highlight the version of Lucene I am >

RE: Case Sensitivity

2008-08-13 Thread Dino Korah
I would also like to highlight the version of Lucene I am using: it is 2.0.0. From: Dino Korah [mailto:[EMAIL PROTECTED] Sent: 13 August 2008 17:10 To: 'java-user@lucene.apache.org' Subject: Case Sensitivity Hi All, Once I index a bunch of documents with a StandardAnalyzer (and if t