Paul
It's not just about merging the fields or resource usage. If you look at the scenario below, the issue is that merging mixes up my fields (shipping and billing address, for instance). I can't merge them and still keep the 'distinction' for search. Your case is a 'generalization' field, so the search will work. I know mine is a trivial example that can be handled with just two fields (shipping_address & billing_address), but I am talking about cases where we have many such 'groups of fields'.

In general, such a one-to-many relationship for indices in a 'document' is also really, really common :). Again, I am not trying to argue a point - I would be happy to get some idea of how to do it and to be corrected if I'm wrong.

Lastly (while that's not my main worry right now), I tend to be careful with resources. When dealing with very large data, I avoid any unnecessary overhead as far as possible and take every optimization I can get :)

Guna

On Jan 25, 2009, at 1:50 AM, Paul Libbrecht wrote:

Guna,

it's really, really normal to duplicate stuff to be merged into a field.

We do this all the time, for example to have a field "text-in-any-language" while a field "text-in-english" is also there, and the queries boost matches in text-in-any-language less than matches in text-in-english (if the user is in English).

This difference in weighting is the gold of Lucene, I feel (of retrieval generally). Also, depending on the field, you can index differently while still copying it in Solr (for example, use a different analyzer per language).
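
A minimal sketch of how that weighting could be expressed with dismax in solrconfig.xml. The handler name and the field names text_en and text_any are hypothetical stand-ins for the "text-in-english" and "text-in-any-language" fields mentioned above, and the boost values are arbitrary:

```xml
<!-- solrconfig.xml (sketch): handler name, field names and boosts are assumptions -->
<requestHandler name="/search_en" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- matches in the English field weigh more than matches in the catch-all field -->
    <str name="qf">text_en^2.0 text_any^0.5</str>
  </lst>
</requestHandler>
```

A handler for another language would presumably list that language's field with the higher boost instead.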

paul

PS: don't be scared about resources, this is the side of the world where resources are the least of the problems! (Typically a "catch-all-field" wouldn't be stored, though, as this would then load the memory.)


On 25 Jan 2009, at 09:35, Gunaranjan Chandraraju wrote:

Thanks
This sounds redundant to me - to store the fields separately and then concatenate all of them into one copy field again.

My XML is like this
<address street="XYZ" state="CA" country="1" type="shipping" ...>

I am currently using XPATH or XSL to separate them into individual indexed fields like: address_state_1, address_type_1 etc. in SOLR.

From what you say, it looks to me that I might as well just treat the entire address as a single 'text field' and search within the text after tokenizing. This way I don't need to have the _1, _2 as the single text field will contain the information together (and thus grouped - so I know which is shipping/billing etc?). Will there be any performance difference between this and the copy field approach?

Is there no other way (programmatic) to search across multiple fields? I did take a quick look at dismax but again it needs the field names to be specifically mentioned in the config file or in the query. I can't do this as I am not able to predict the number of fields (e.g. credit cards a person can have?).

I like SOLR, but to me, this seems to be a very common and simple search scenario/pattern - however, its implementation in SOLR appears to be not very straightforward. (My apologies if I am on the wrong track here because I don't understand SOLR well.)

Regards,
Guna

On Jan 24, 2009, at 10:54 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

For searching, you need to put them in a single field. Use <copyField>
in schema.xml to achieve that.
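
A minimal sketch of what that could look like for the numbered credit card fields. The catch-all field name credit_card_all and the "string"/"text" field types are assumptions, and glob support in the copyField source attribute is worth checking against the Solr version in use:

```xml
<!-- schema.xml (sketch): field names and the "string"/"text" types are assumptions -->
<fields>
  <!-- matches credit_card_1, credit_card_2, ... as they are created at index time -->
  <dynamicField name="credit_card_*" type="string" indexed="true" stored="true"/>
  <!-- catch-all search target: multiValued because several sources feed it,
       not stored because the originals are already stored -->
  <field name="credit_card_all" type="text" indexed="true" stored="false" multiValued="true"/>
</fields>

<!-- copy every numbered field into the catch-all field -->
<copyField source="credit_card_*" dest="credit_card_all"/>
```

With something like this in place, a query such as credit_card_all:"1234 4567 7890 1234" would not need to know how many numbered fields a document has.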

On Sun, Jan 25, 2009 at 7:39 AM, Gunaranjan Chandraraju
<chandrar...@apple.com> wrote:
I made this approach work with XPATH and XSL. However, this approach
creates multiple fields like this

address_state_1
address_state_2
...
address_state_10

and

credit_card_1
credit_card_2
credit_card_3


How do I search for a credit_card? The query syntax does not seem to support wildcards in field names. For example, I can't seem to do this ->
credit_card*:1234 4567 7890 1234

On the search side I would not know how many credit card fields got created
for a document and so I need that to be dynamic.

-g


On Jan 22, 2009, at 11:54 PM, Shalin Shekhar Mangar wrote:

Oops, one more gotcha. The dynamic field support is only in 1.4 trunk.

On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju <
chandrar...@apple.com> wrote:


<record>
<coreInfo id="123" , .../>
<address street="XYZ1" state="CA" ... type="home" />
<address street="XYZ2" state="CA" ... type="Office"/>
<address street="XYZ3" state="CA" ....type="Other"/>
</record>

I have setup my DIH to treat these as entities as below

<dataConfig>
<dataSource type="FileDataSource" encoding="UTF-8" />
<document>
<entity name ="f" processor="FileListEntityProcessor"
       baseDir="***"
       fileName=".*xml"
       rootEntity="false"
       dataSource="null" >
  <entity
     name="record"
     processor="XPathEntityProcessor"
     stream="false"
     forEach="/record"
     url="${f.fileAbsolutePath}">
          <field column="ID" xpath="/record/coreInfo/@id" />

          <!-- Address  -->
           <entity
               name="record_adr"
               processor="XPathEntityProcessor"
               stream="false"
               forEach="/record/address"
               url="${f.fileAbsolutePath}">
                   <field column="address_street" xpath="/record/address/@street" />
                   <field column="address_state" xpath="/record/address/@state" />
                   <field column="address_type" xpath="/record/address/@type" />
          </entity>
     </entity>
</entity>
</document>
</dataConfig>


I think the only way is to create a dynamic field for each attribute (street, state etc.). Write a transformer to copy the fields from your
data config to appropriately named dynamic fields (e.g. street_1, state_1,
etc). To maintain this counter you will need to get/store it with
Context#getSessionAttribute(name, Context.SCOPE_DOC) and
Context#setSessionAttribute(name, val, Context.SCOPE_DOC).

I can't think of an easier way.
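
On the schema side, the dynamic fields for the address attributes might be declared roughly as follows. This is only a sketch: the "string" type and the indexed/stored flags are assumptions, and the names simply follow the street_1, state_1 pattern mentioned above:

```xml
<!-- schema.xml (sketch): field type and flags are assumptions -->
<!-- each pattern matches the numbered fields a transformer would emit, e.g. street_1, street_2 -->
<dynamicField name="street_*" type="string" indexed="true" stored="true"/>
<dynamicField name="state_*"  type="string" indexed="true" stored="true"/>
<dynamicField name="type_*"   type="string" indexed="true" stored="true"/>
```

The transformer itself would still have to produce the numbered names; these declarations only let the schema accept them.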
--
Regards,
Shalin Shekhar Mangar.




--
--Noble Paul


