Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

David Neubert Sat, 10 Nov 2007 13:24:44 -0800

Ryan,

Thanks for your response.  I infer from it that you can have a different 
analyzer for each field. I guess I should have figured that out, but because 
I had not thought of it, I concluded that I needed multiple indices (sorry, 
I am still very new to Solr/Lucene).


Does such an approach make querying difficult under the following condition?

The app that I am replacing (and trying to enhance) can search multiple books 
at once, with sen/par and case-sensitivity settings individually selectable 
per book (e.g. default search modes per book).  So with a single query request 
(just the query word(s)), you can search one book by par with case, another by 
sen w/o case, etc., all settable as user defaults.  I need to figure out how to 
match that in Solr/Lucene. I believe that the Analyzer approach you suggested 
requires using the same Analyzer at query time that was used during indexing.  
So if I am hitting multiple fields (in the same search request) that invoke 
different Analyzers, am I at a dead end, and do I have to resort to multiple 
consecutive queries instead (and sort/merge the results afterwards)?  Or am I 
just overcomplicating this?

Dave

----- Original Message ----
From: Ryan McKinley <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Saturday, November 10, 2007 2:18:00 PM
Subject: Re: Redundant indexing * 4 only solution (for par/sen and case 
sensitivity)



> So now I have:
> (1) 4X in content indexing
> (2) 2X in actual SOLR/Lucene indices
> (3) I don't know how to practically do multiple indices using SOLR?
> 
> If there is a better way of attacking this problem, I would appreciate
> recommendations!!!
> 

I don't quite follow your current approach, but it sounds like you just
need some copyFields to index the same content with multiple analyzers.

for example, say you have fields:

  <field name="content" type="string" indexed="true" stored="true"/>
  <field name="content_sentence" type="sentence" indexed="true" stored="false"/>
  <field name="content_paragraph" type="paragraph" indexed="true" stored="false"/>
  <field name="content_text" type="text" indexed="true" stored="false"/>

and copy fields:

   <copyField source="content" dest="content_sentence"/>
   <copyField source="content" dest="content_paragraph"/>
   <copyField source="content" dest="content_text"/>
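
As a rough sketch of how case-sensitive and case-insensitive variants might be
defined as field types: the names text_cs and text_ci below are made up for
illustration and do not appear in this thread, and the sentence and paragraph
types would need their own analysis chains, which are not shown here.

```xml
<!-- Hypothetical field types; the names text_cs and text_ci are illustrative only -->

<!-- Case-sensitive: split on whitespace, keep tokens exactly as written -->
<fieldType name="text_cs" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- Case-insensitive: same tokenizer, then lowercase every token -->
<fieldType name="text_ci" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a single <analyzer> element per type, the same analysis chain is applied
at index time and at query time for any field of that type.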


The 4X indexing cost?  If you *need* to index the content 4 different 
ways, you don't have any way around that - do you?  But is it really a 
big deal?  How often does it need to index?  How big is the data?

I'm not quite following your need for multiple Solr indices, but in 1.3 
it is possible.

ryan




