Hey Gunaranjan, I have the same scenario as you.
A Lucene index is denormalized; it should not contain entity relationships. When I need to do something like what you are doing, I group the related values in one field. Let's say we have 2 credit cards: the first has id 30459673 and taxes at 1.5%/month, and the second has id 56305 and taxes at 2.5%. I create a multivalued field and index each value as "id ^ taxes". On the client side I put the logic to parse the string into a form that is convenient for working with the values.

I hope that helps.

2009/1/25 Gunaranjan Chandraraju <chandrar...@apple.com>

> Paul
> It's not just about merging the fields or resource usage. If you look at
> the scenario below, the issue is that it mixes up my fields (shipping and
> billing address) for instance. I can't merge them and still keep the
> 'distinction' for search. Your case is a 'generalization' field. Thus
> the search will work. I know mine is a trivial example and can be overcome
> by just two fields (shipping_address & billing_address - but I am
> talking of cases when we have many such 'groups of fields').
>
> In general such one to many relationship for indices in a 'document' is
> also really really common :). Again I am not trying to argue a point - I
> would be happy to get some idea on how to do it and be corrected if I'm
> wrong.
>
> Lastly (while that's not my worry point right now), I tend to be careful
> with resources. When dealing with very large data, I will avoid any
> unnecessary overhead as-far-as-possible and take every optimization I get :)
>
> Guna
>
> On Jan 25, 2009, at 1:50 AM, Paul Libbrecht wrote:
>
>> Guna,
>>
>> it's really really normal to duplicate stuff to be merged into a field.
>>
>> We do this all the time, for example to have a field
>> "text-in-any-language" while a field "text-in-english" is also there and the
>> queries boost matches in text-in-any-language less than text-in-english (if
>> user is in english).
>>
>> This difference in weighting is the gold of Lucene I feel (of retrieval
>> generally).
>> Also, depending on the field you make different indexing, while still
>> copying it in solr (for example use a different analyzer per language).
>>
>> paul
>>
>> PS: don't be scared with resources, this is the side of the world where
>> the resource is the least the problem! (typically a "catch-all-field"
>> wouldn't be stored though as this would then load the memory).
>>
>> On 25 Jan 2009, at 09:35, Gunaranjan Chandraraju wrote:
>>
>>> Thanks
>>> This sounds redundant to me - to store the fields separately and then
>>> concat all of them to one copy field again.
>>>
>>> My XML is like this
>>> <address street="XYZ" state="CA" country="1" type="shipping" ...>
>>>
>>> I am currently using XPATH or XSL to separate them into individual
>>> indexed fields like: address_state_1, address_type_1 etc. in SOLR.
>>>
>>> From what you say, it looks to me that I might as well just treat the
>>> entire address as a single 'text field' and search within the text after
>>> tokenizing. This way I don't need to have the _1, _2 as the single text
>>> field will contain the information together (and thus grouped - so I know
>>> which is shipping/billing etc?). Will there be any performance difference
>>> between this and the copy field approach?
>>>
>>> Is there no other way (programmatic) to search across multiple fields? I
>>> did take a quick look at dismax but again it needs the field names to be
>>> specifically mentioned in the config file or in the query. I can't do this
>>> as I am not able to predict the number of fields (e.g. credit cards a person
>>> can have?).
>>>
>>> I like SOLR, but to me, this seems to be a very common and simple search
>>> scenario/pattern - however its implementation in SOLR is appearing to be not
>>> very straightforward. (My apologies, if I am on the wrong track here because
>>> I don't understand SOLR well.)
>>>
>>> Regards,
>>> Guna
>>>
>>> On Jan 24, 2009, at 10:54 PM, Noble Paul നോബിള് नोब्ळ् wrote:
>>>
>>>> for searching you need to put them in a single field. use <copyField>
>>>> in schema.xml to achieve that
>>>>
>>>> On Sun, Jan 25, 2009 at 7:39 AM, Gunaranjan Chandraraju
>>>> <chandrar...@apple.com> wrote:
>>>>
>>>>> I make this approach work with XPATH and XSL. However, this approach
>>>>> creates multiple fields like this
>>>>>
>>>>> address_state_1
>>>>> address_state_2
>>>>> ...
>>>>> address_state_10
>>>>>
>>>>> and
>>>>>
>>>>> credit_card_1
>>>>> credit_card_2
>>>>> credit_card_3
>>>>>
>>>>> How do I search for a credit_card? The query syntax does not seem to
>>>>> support wild cards in field names. For e.g. I can't seem to do this ->
>>>>> credit_card*:1234 4567 7890 1234
>>>>>
>>>>> On the search side I would not know how many credit card fields got
>>>>> created for a document and so I need that to be dynamic.
>>>>>
>>>>> -g
>>>>>
>>>>> On Jan 22, 2009, at 11:54 PM, Shalin Shekhar Mangar wrote:
>>>>>
>>>>>> Oops, one more gotcha. The dynamic field support is only in 1.4 trunk.
>>>>>>
>>>>>> On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar
>>>>>> <shalinman...@gmail.com> wrote:
>>>>>>
>>>>>>> On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju
>>>>>>> <chandrar...@apple.com> wrote:
>>>>>>>
>>>>>>>> <record>
>>>>>>>>   <coreInfo id="123" , .../>
>>>>>>>>   <address street="XYZ1" State="CA" ...type="home" />
>>>>>>>>   <address street="XYZ2" state="CA" ... type="Office"/>
>>>>>>>>   <address street="XYZ3" state="CA" ....type="Other"/>
>>>>>>>> </record>
>>>>>>>>
>>>>>>>> I have setup my DIH to treat these as entities as below
>>>>>>>>
>>>>>>>> <dataConfig>
>>>>>>>>   <dataSource type="FileDataSource" encoding="UTF-8" />
>>>>>>>>   <document>
>>>>>>>>     <entity name ="f" processor="FileListEntityProcessor"
>>>>>>>>             baseDir="***"
>>>>>>>>             fileName=".*xml"
>>>>>>>>             rootEntity="false"
>>>>>>>>             dataSource="null" >
>>>>>>>>       <entity
>>>>>>>>           name="record"
>>>>>>>>           processor="XPathEntityProcessor"
>>>>>>>>           stream="false"
>>>>>>>>           forEach="/record"
>>>>>>>>           url="${f.fileAbsolutePath}">
>>>>>>>>         <field column="ID" xpath="/record/@id" />
>>>>>>>>
>>>>>>>>         <!-- Address -->
>>>>>>>>         <entity
>>>>>>>>             name="record_adr"
>>>>>>>>             processor="XPathEntityProcessor"
>>>>>>>>             stream="false"
>>>>>>>>             forEach="/record/address"
>>>>>>>>             url="${f.fileAbsolutePath}">
>>>>>>>>           <field column="address_street"
>>>>>>>>                  xpath="/record/address/@street" />
>>>>>>>>           <field column="address_state"
>>>>>>>>                  xpath="/record/address//@state" />
>>>>>>>>           <field column="address_type"
>>>>>>>>                  xpath="/record/address//@type" />
>>>>>>>>         </entity>
>>>>>>>>       </entity>
>>>>>>>>     </entity>
>>>>>>>>   </document>
>>>>>>>> </dataConfig>
>>>>>>>>
>>>>>>> I think the only way is to create a dynamic field for each attribute
>>>>>>> (street, state etc.). Write a transformer to copy the fields from your
>>>>>>> data config to appropriately named dynamic field (e.g. street_1, state_1,
>>>>>>> etc).
>>>>>>> To maintain this counter you will need to get/store it with
>>>>>>> Context#getSessionAttribute(name, val, Context.SCOPE_DOC) and
>>>>>>> Context#setSessionAttribute(name, val, Context.SCOPE_DOC).
>>>>>>>
>>>>>>> I can't think of an easier way.
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Shalin Shekhar Mangar.
>>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Shalin Shekhar Mangar.
>>>>>>
>>>> --
>>>> --Noble Paul
>>>>

--
Alexander Ramos Jardim
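
P.S. A minimal sketch of the "id ^ taxes" packing described at the top of this message, written in Python purely for illustration. The names `Card`, `pack`, `unpack` and `SEP` are made up for this example and are not part of Solr or Lucene; how the packed strings are sent to and read from the multivalued field depends on your client and is not shown.

```python
from decimal import Decimal
from typing import NamedTuple

# Separator from the example in this message; an assumption is that it
# never occurs inside the id or the rate themselves.
SEP = " ^ "


class Card(NamedTuple):
    """Hypothetical client-side view of one credit card."""
    card_id: str
    monthly_rate: Decimal  # percent per month, e.g. Decimal("1.5")


def pack(card: Card) -> str:
    """Build the string indexed as one value of the multivalued field."""
    return f"{card.card_id}{SEP}{card.monthly_rate}"


def unpack(value: str) -> Card:
    """Parse one stored value back into its parts on the client side."""
    card_id, rate = value.split(SEP, 1)
    return Card(card_id.strip(), Decimal(rate.strip()))


cards = [Card("30459673", Decimal("1.5")), Card("56305", Decimal("2.5"))]
field_values = [pack(c) for c in cards]        # values for the multivalued field
restored = [unpack(v) for v in field_values]   # what the client rebuilds
```

Because each card travels as a single value, its id and rate stay together no matter how many cards a document has. One caveat: "^" is the boost operator in the Lucene query syntax, so if you ever query on the packed value itself you would probably need to escape it or pick another separator.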