Re: How to make Relationships work for Multi-valued Index Fields?

Gunaranjan Chandraraju Sun, 25 Jan 2009 02:19:32 -0800

Paul

It's not just about merging the fields or resource usage. If you look at the scenario below, the issue is that it mixes up my fields (shipping and billing address, for instance). I can't merge them and still keep the 'distinction' for search. Your case is a 'generalization' field, so the search will work. I know mine is a trivial example and can be overcome with just two fields (shipping_address and billing_address), but I am talking of cases where we have many such 'groups of fields'.

In general, such one-to-many relationships for indices in a 'document' are also really, really common :). Again, I am not trying to argue a point - I would be happy to get some idea of how to do it and be corrected if I'm wrong.
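
To illustrate the mixing-up described above, here is a minimal sketch of a Solr update message in which the address attributes are indexed as plain multi-valued fields. The field names follow the DIH columns used later in this thread; the second state value ("NY") is made up purely for illustration.

```xml
<!-- Hypothetical update message: one document, two addresses flattened
     into multi-valued fields. The value "NY" is an invented example. -->
<add>
  <doc>
    <field name="id">123</field>
    <!-- first address -->
    <field name="address_type">shipping</field>
    <field name="address_state">CA</field>
    <!-- second address -->
    <field name="address_type">billing</field>
    <field name="address_state">NY</field>
  </doc>
</add>
```

Because each field is just a bag of values, a query such as address_type:billing AND address_state:CA would match this document even though the billing address is not in CA. The pairing between type and state is lost.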

Lastly (while that's not my worry point right now), I tend to be careful with resources. When dealing with very large data, I will avoid any unnecessary overhead as far as possible and take every optimization I get :)


Guna

On Jan 25, 2009, at 1:50 AM, Paul Libbrecht wrote:

Guna,
it's really, really normal to duplicate content to be merged into a field.
We do this all the time, for example to have a field "text-in-any-language" while a field "text-in-english" is also there, and the queries boost matches in text-in-any-language less than in text-in-english (if the user is in English).
This difference in weighting is the gold of Lucene, I feel (of retrieval generally). Also, depending on the field, you can index differently while still copying it in Solr (for example, use a different analyzer per language).
paul
PS: don't be scared about resources; this is the side of the world where resources are the least of the problems! (Typically a "catch-all-field" wouldn't be stored, though, as this would then load the memory.)
On 25 Jan 2009, at 09:35, Gunaranjan Chandraraju wrote:
Thanks
This sounds redundant to me - to store the fields separately and then concat all of them into one copy field again.
My XML is like this
<address street="XYZ" state="CA" country="1" type="shipping" ...>
I am currently using XPATH or XSL to separate them into individual indexed fields like address_state_1, address_type_1, etc. in SOLR.
From what you say, it looks to me that I might as well just treat the entire address as a single 'text field' and search within the text after tokenizing. This way I don't need to have the _1, _2 suffixes, as the single text field will contain the information together (and thus grouped, so I know which is shipping/billing etc.?). Will there be any performance difference between this and the copy field approach?
Is there no other (programmatic) way to search across multiple fields? I did take a quick look at dismax, but again it needs the field names to be specifically mentioned in the config file or in the query. I can't do this as I am not able to predict the number of fields (e.g. credit cards a person can have?).
I like SOLR, but to me this seems to be a very common and simple search scenario/pattern; however, its implementation in SOLR appears to be not very straightforward. (My apologies if I am on the wrong track here because I don't understand SOLR well.)
Regards,
Guna
On Jan 24, 2009, at 10:54 PM, Noble Paul നോബിള്‍नोब्ळ् wrote:
For searching you need to put them in a single field. Use <copyField>
in schema.xml to achieve that.
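
As a rough sketch of that suggestion, a schema.xml fragment along these lines could collect all numbered credit card fields into one searchable field. The field names (credit_card_*, credit_card_all) and the "string" type are assumptions for illustration, not taken from the poster's actual schema.

```xml
<!-- schema.xml fragment (sketch). The field and dynamicField entries belong
     in the fields section; copyField sits alongside it in the schema.
     All names and the "string" type are illustrative assumptions. -->

<!-- Numbered per-card fields created by the import: credit_card_1, credit_card_2, ... -->
<dynamicField name="credit_card_*" type="string" indexed="true" stored="true"/>

<!-- Catch-all search field: indexed but not stored, and multiValued
     because several source fields are copied into it -->
<field name="credit_card_all" type="string" indexed="true" stored="false" multiValued="true"/>

<!-- At index time, copy every field matching the pattern into the catch-all field -->
<copyField source="credit_card_*" dest="credit_card_all"/>
```

With something like this in place, a query on the single field, e.g. credit_card_all:"1234 4567 7890 1234", would not need to know how many numbered fields a document has.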

On Sun, Jan 25, 2009 at 7:39 AM, Gunaranjan Chandraraju
<chandrar...@apple.com> wrote:
I made this approach work with XPATH and XSL. However, this approach
creates multiple fields like this:

address_state_1
address_state_2
...
address_state_10

and

credit_card_1
credit_card_2
credit_card_3
How do I search for a credit card? The query syntax does not seem to support wildcards in field names. For example, I can't seem to do this ->
credit_card*:1234 4567 7890 1234
On the search side I would not know how many credit card fields got created
for a document, so I need that to be dynamic.

-g


On Jan 22, 2009, at 11:54 PM, Shalin Shekhar Mangar wrote:
Oops, one more gotcha. The dynamic field support is only in 1.4 trunk.
On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:
On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju <
chandrar...@apple.com> wrote:
<record>
<coreInfo id="123" , .../>
<address street="XYZ1" state="CA" ...type="home" />
<address street="XYZ2" state="CA" ... type="Office"/>
<address street="XYZ3" state="CA" ....type="Other"/>
</record>

I have set up my DIH to treat these as entities, as below:

<dataConfig>
<dataSource type="FileDataSource" encoding="UTF-8" />
<document>
<entity name ="f" processor="FileListEntityProcessor"
       baseDir="***"
       fileName=".*xml"
       rootEntity="false"
       dataSource="null" >
  <entity
     name="record"
     processor="XPathEntityProcessor"
     stream="false"
     forEach="/record"
     url="${f.fileAbsolutePath}">
          <field column="ID" xpath="/record/coreInfo/@id" />

          
           <entity
               name="record_adr"
               processor="XPathEntityProcessor"
               stream="false"
               forEach="/record/address"
               url="${f.fileAbsolutePath}">
                   <field column="address_street" xpath="/record/address/@street" />
                   <field column="address_state" xpath="/record/address/@state" />
                   <field column="address_type" xpath="/record/address/@type" />
          </entity>
     </entity>
</entity>
</document>
</dataConfig>
I think the only way is to create a dynamic field for each attribute
(street, state etc.). Write a transformer to copy the fields from your
data config to appropriately named dynamic fields (e.g. street_1, state_1,
etc.).
To maintain this counter you will need to get/store it with
Context#getSessionAttribute(name, Context.SCOPE_DOC) and
Context#setSessionAttribute(name, val, Context.SCOPE_DOC).

I can't think of an easier way.
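
On the schema side, the dynamic fields such a transformer would write to might be declared roughly as follows. This is only a sketch: the name patterns mirror the DIH column names above, and the "string" type is an assumption.

```xml
<!-- schema.xml fragment (sketch, placed in the fields section).
     One dynamic field pattern per address attribute; a transformer would
     emit address_street_1, address_state_1, address_type_1, and so on.
     The patterns and the "string" type are illustrative assumptions. -->
<dynamicField name="address_street_*" type="string" indexed="true" stored="true"/>
<dynamicField name="address_state_*"  type="string" indexed="true" stored="true"/>
<dynamicField name="address_type_*"   type="string" indexed="true" stored="true"/>
```

Fields that share a numeric suffix then belong to the same address, which keeps the grouping that plain multi-valued fields would lose.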
--
Regards,
Shalin Shekhar Mangar.
--
--Noble Paul
