Fields are placed in the index totally separately from each
other, so it’s no wonder that removing
the copyField results in this kind of savings.
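
To make that concrete, here is a toy model in Python. It is only a sketch, not Lucene's actual on-disk format: it assumes a simple whitespace + lowercase analysis and stores one postings set per (field, term) pair. The build_index helper and the sample documents are made up for illustration.

```python
from collections import defaultdict

def build_index(docs, copy_fields=None):
    """Toy per-field inverted index: (field, term) -> set of doc ids.

    copy_fields maps a source field name to a destination field name,
    loosely mimicking a copyField directive (illustrative only).
    """
    copy_fields = copy_fields or {}
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        expanded = dict(doc)
        # Copy the raw source value into the destination field.
        for source, dest in copy_fields.items():
            if source in doc:
                expanded[dest] = doc[source]
        # Each field gets its own entries, even for identical terms.
        for field, text in expanded.items():
            for term in text.lower().split():
                index[(field, term)].add(doc_id)
    return index

docs = [{"description": "my dog has fleas"}, {"description": "fleas bite"}]
with_copy = build_index(docs, {"description": "default_field"})
without_copy = build_index(docs)
print(len(with_copy), len(without_copy))  # 10 5
```

In this toy the copy doubles the number of entries exactly. A real index also holds structures this sketch ignores, so the actual ratio will vary.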

And they have to be separate. Consider what comes out of the end of the
analysis chain: the same input can produce totally different tokens
depending on the field's chain. As a trivial example, imagine two fields
with these analysis chains:

WhitespaceTokenizerFactory
LowerCaseFilterFactory

WhitespaceTokenizerFactory
LowerCaseFilterFactory
EdgeNGramFilterFactory

and identical input “fleas”. The output of the first would be “fleas”, and the
output of the second would be something like “f”, “fl”, “fle”, “flea”, “fleas”.
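
A rough Python simulation of the two chains shows the difference. These are hand-written stand-ins, not the Solr classes themselves, and the gram sizes (min 1, max 5) are assumptions picked to match the output above.

```python
def whitespace_tokenize(text):
    # Stand-in for a whitespace tokenizer: split on runs of whitespace.
    return text.split()

def lowercase(tokens):
    # Stand-in for a lowercase filter.
    return [t.lower() for t in tokens]

def edge_ngrams(tokens, min_gram=1, max_gram=5):
    # Stand-in for an edge n-gram filter: emit prefixes of each token.
    # min_gram and max_gram are assumed values, not taken from any schema.
    out = []
    for t in tokens:
        for n in range(min_gram, min(max_gram, len(t)) + 1):
            out.append(t[:n])
    return out

def analyze_plain(text):
    # Chain 1: whitespace tokenizer -> lowercase filter
    return lowercase(whitespace_tokenize(text))

def analyze_edge(text):
    # Chain 2: whitespace tokenizer -> lowercase filter -> edge n-grams
    return edge_ngrams(analyze_plain(text))

print(analyze_plain("fleas"))  # ['fleas']
print(analyze_edge("fleas"))   # ['f', 'fl', 'fle', 'flea', 'fleas']
```

One input yields one term in the first field and five in the second, so the two fields cannot share a single token stream.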

Trying to share the tokens between fields would be a nightmare.

And that’s only one of many ways the output of two different analysis
chains could be different…

Best,
Erick



> On Sep 28, 2020, at 10:56 AM, Edward Turner <eddtur...@gmail.com> wrote:
> 
> Hi all,
> 
> We have recently switched to using edismax + qf fields, and no longer use
> copyfields to allow us to easily search over values in multiple fields (by
> copying multiple fields' values to the copyfield destinations, and then
> performing queries over the destination field).
> 
> By removing the copyfields, we've found that our index sizes have reduced
> by ~40% in some cases, which is great! We're just curious now as to exactly
> how this can be ...
> 
> My question is, given the following two schemas, if we index some data to
> the "description" field, will the index for schema1 be twice as large as
> the index of schema2? (I guess this relates to how, internally, Solr stores
> field + index data)
> 
> Old way -- schema1:
> =======
> <field name="description" type="text_general" indexed="true"
> multiValued="false"/>
> <field name="default_field" type="text_general" indexed="true"
> multiValued="false" />
> <copyField source="description" dest="default_field" />
> 
> New way -- schema2:
> =======
> <field name="description" type="text_general" indexed="true"
> multiValued="false"/>
> 
> Many thanks and kind regards,
> 
> Edd
