Paul
It's not just about merging the fields or resource usage. If you look
at the scenario below, the issue is that it mixes up my fields
(shipping and billing address, for instance). I can't merge them and
still keep the distinction for search. Your case is a
'generalization' field, so the search will work. I know mine is a
trivial example that can be handled with just two fields
(shipping_address and billing_address), but I am talking about cases
where we have many such 'groups of fields'.
In general, such a one-to-many relationship for indices in a 'document'
is also really, really common :). Again, I am not trying to argue a
point - I would be happy to get some idea of how to do it, and to be
corrected if I'm wrong.
Lastly (while that's not my main worry right now), I tend to be
careful with resources. When dealing with very large data, I
avoid unnecessary overhead as far as possible and take every
optimization I can get :)
Guna
On Jan 25, 2009, at 1:50 AM, Paul Libbrecht wrote:
Guna,
it's really, really normal to duplicate stuff to be merged into a
field.
We do this all the time, for example to have a field
"text-in-any-language" while a field "text-in-english" is also there,
and the queries boost matches in text-in-any-language less than in
text-in-english (if the user is in English).
This difference in weighting is the gold of Lucene I feel (of
retrieval generally).
Also, depending on the field, you can index differently while
still copying it in Solr (for example, use a different analyzer per
language).
paul
PS: don't be scared about resources; this is the side of the world
where resources are the least of the problems! (Typically a
"catch-all field" wouldn't be stored, though, as this would then load
the memory.)
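A minimal sketch of the setup described above, for illustration only. The field names text_en and text_any, the field type "text", the handler name, and the boost values are assumptions and do not come from this thread. The first fragment would live in schema.xml and the second in solrconfig.xml.

```xml
<!-- schema.xml (fragment). Field and type names are hypothetical. -->
<field name="text_en"  type="text" indexed="true" stored="true"/>
<!-- Catch-all field: indexed for search but not stored.
     multiValued because more than one source may be copied into it. -->
<field name="text_any" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="text_en" dest="text_any"/>

<!-- solrconfig.xml (fragment). The boosts are illustrative: matches in the
     English field count for more than matches in the catch-all field. -->
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">text_en^2.0 text_any^0.5</str>
  </lst>
</requestHandler>
```

Each language-specific field could use its own field type (and so its own analyzer), while the catch-all keeps a generic one.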
On 25 Jan 2009, at 09:35, Gunaranjan Chandraraju wrote:
Thanks
This sounds redundant to me - to store the fields separately and
then concat all of them to one copy field again.
My XML is like this
<address street="XYZ" state="CA" country="1" type="shipping" ...>
I am currently using XPATH or XSL to separate them into individual
indexed fields like: address_state_1, address_type_1 etc. in SOLR.
From what you say, it looks to me that I might as well just treat
the entire address as a single 'text field' and search within the
text after tokenizing. This way I don't need to have the _1, _2 as
the single text field will contain the information together (and
thus grouped - so I know which is shipping/billing etc?). Will
there be any performance difference between this and the copy field
approach?
Is there no other way (programmatic) to search across multiple
fields? I did take a quick look at dismax but again it needs the
field names to be specifically mentioned in the config file or in
the query. I can't do this as I am not able to predict the number
of fields (e.g. credit cards a person can have?).
I like SOLR, but to me this seems to be a very common and simple
search scenario/pattern - however, its implementation in SOLR
does not appear to be very straightforward. (My apologies if I am on
the wrong track here because I don't understand SOLR well.)
Regards,
Guna
On Jan 24, 2009, at 10:54 PM, Noble Paul നോബിള് नोब्ळ् wrote:
For searching, you need to put them in a single field. Use
<copyField> in schema.xml to achieve that.
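A minimal sketch of what such a schema.xml fragment might look like for the credit card case discussed in this thread. The destination field name all_credit_cards and the "string" type are assumptions, and the sketch assumes the Solr version in use accepts a glob in the copyField source.

```xml
<!-- schema.xml (fragment). Names and types are hypothetical. -->
<!-- Matches credit_card_1, credit_card_2, ... however many a document has. -->
<dynamicField name="credit_card_*" type="string" indexed="true" stored="true"/>
<!-- Single searchable field that collects every card number of the document. -->
<field name="all_credit_cards" type="string" indexed="true" stored="false" multiValued="true"/>
<!-- The glob copies each numbered field into the collecting field. -->
<copyField source="credit_card_*" dest="all_credit_cards"/>
```

A query such as all_credit_cards:"1234 4567 7890 1234" would then not need to know how many numbered fields were created. Note that this does not keep groups such as shipping vs. billing apart; it only solves the "unknown number of fields" part.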
On Sun, Jan 25, 2009 at 7:39 AM, Gunaranjan Chandraraju
<chandrar...@apple.com> wrote:
I made this approach work with XPATH and XSL. However, it
creates multiple fields like this:
address_state_1
address_state_2
...
address_state_10
and
credit_card_1
credit_card_2
credit_card_3
How do I search for a credit_card? The query syntax does not seem to
support wildcards in field names. For example, I can't seem to do
this:
credit_card*:1234 4567 7890 1234
On the search side I would not know how many credit card fields got
created for a document, and so I need that to be dynamic.
-g
On Jan 22, 2009, at 11:54 PM, Shalin Shekhar Mangar wrote:
Oops, one more gotcha. The dynamic field support is only in 1.4
trunk.
On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar
<shalinman...@gmail.com> wrote:
On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju
<chandrar...@apple.com> wrote:
<record>
  <coreInfo id="123" .../>
  <address street="XYZ1" state="CA" ... type="home" />
  <address street="XYZ2" state="CA" ... type="Office"/>
  <address street="XYZ3" state="CA" ... type="Other"/>
</record>
I have set up my DIH to treat these as entities, as below:
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <!-- Outer entity: lists the XML files; it produces no documents itself -->
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="***"
            fileName=".*xml"
            rootEntity="false"
            dataSource="null">
      <!-- One Solr document per <record> -->
      <entity
          name="record"
          processor="XPathEntityProcessor"
          stream="false"
          forEach="/record"
          url="${f.fileAbsolutePath}">
        <!-- In the sample above, the id attribute sits on <coreInfo> -->
        <field column="ID" xpath="/record/coreInfo/@id" />
        <!-- Address -->
        <entity
            name="record_adr"
            processor="XPathEntityProcessor"
            stream="false"
            forEach="/record/address"
            url="${f.fileAbsolutePath}">
          <field column="address_street"
                 xpath="/record/address/@street" />
          <field column="address_state"
                 xpath="/record/address/@state" />
          <field column="address_type"
                 xpath="/record/address/@type" />
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>
I think the only way is to create a dynamic field for each attribute
(street, state, etc.). Write a transformer to copy the fields from your
data config to appropriately named dynamic fields (e.g. street_1,
state_1, etc.).
To maintain this counter you will need to read it with
Context#getSessionAttribute(name, Context.SCOPE_DOC) and store it with
Context#setSessionAttribute(name, val, Context.SCOPE_DOC).
I can't think of an easier way.
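A rough sketch of the configuration side of this suggestion, assuming the DIH config quoted above. The class name com.example.AddressCounterTransformer is hypothetical (the transformer itself would still have to be written), and the dynamic field patterns and "string" type are assumptions chosen to match the address_state_1 style of naming.

```xml
<!-- schema.xml (fragment): one dynamic field pattern per address attribute.
     Patterns and types are hypothetical. -->
<dynamicField name="address_street_*" type="string" indexed="true" stored="true"/>
<dynamicField name="address_state_*"  type="string" indexed="true" stored="true"/>
<dynamicField name="address_type_*"   type="string" indexed="true" stored="true"/>

<!-- data-config.xml (fragment): attach the transformer to the address entity.
     The hypothetical transformer would rename each column to <column>_<n>,
     keeping the counter n in the document-scoped session. -->
<entity name="record_adr"
        processor="XPathEntityProcessor"
        transformer="com.example.AddressCounterTransformer"
        stream="false"
        forEach="/record/address"
        url="${f.fileAbsolutePath}">
  <field column="address_street" xpath="/record/address/@street"/>
  <field column="address_state"  xpath="/record/address/@state"/>
  <field column="address_type"   xpath="/record/address/@type"/>
</entity>
```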
--
Regards,
Shalin Shekhar Mangar.
--
--Noble Paul