Elisabeth,

Yes, it will almost always be more efficient to search within a catch-all
field than to search across multiple fields. Think of it this way: when you
search on a single field, you are doing a single keyword search against the
index per term. When you search across multiple fields, you are executing
the search for that term multiple times (once for each field) against the
index, and then doing the necessary intersections/unions/etc. of the
document sets.
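
To make that concrete, here is a minimal sketch using a toy in-memory inverted index. It is not how Lucene is implemented internally, and the field names, the documents and the "catch_all" field are all made up for illustration. It only counts how many postings lookups each approach needs.

```python
from collections import defaultdict

def build_index(docs, fields):
    """Map (field, term) -> set of doc ids for a toy inverted index."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for field in fields:
            for term in doc.get(field, "").lower().split():
                index[(field, term)].add(doc_id)
    return index

def search(index, fields, terms):
    """AND the terms together; each term may match in any of the given fields.

    Returns (matching doc ids, number of postings lookups performed).
    """
    lookups = 0
    result = None
    for term in terms:
        term_hits = set()
        for field in fields:
            # One lookup per (field, term) pair, unioned across fields.
            term_hits |= index.get((field, term), set())
            lookups += 1
        # Intersect across terms.
        result = term_hits if result is None else result & term_hits
    return (result or set()), lookups

docs = {
    1: {"title": "java developer", "body": "build search services"},
    2: {"title": "search engineer", "body": "java and solr"},
}
# Hypothetical catch-all: concatenate the source fields at index time.
for doc in docs.values():
    doc["catch_all"] = doc["title"] + " " + doc["body"]

index = build_index(docs, ["title", "body", "catch_all"])
multi_hits, multi_lookups = search(index, ["title", "body"], ["java", "search"])
single_hits, single_lookups = search(index, ["catch_all"], ["java", "search"])
```

Both searches return the same documents here, but the two-field search does four lookups plus the extra unions, while the catch-all search does two.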

Each additional field you search across adds more of this work, so the
search grows slower as the field count rises. With only a few fields the
difference will probably not be noticeable, but response times keep
increasing as you add more. The slowdown may amount to only a few
milliseconds, in which case you may not care, but it will be slower.

The idf point you mentioned can be either a pro or a con depending upon the
use case. For example, if you are searching news content that has a
"french_text" field and an "english_text" field, it would be suboptimal if
the search "Barack Obama" returned only French documents at the top simply
because the US president's name is much more common in English documents,
which makes it rarer (and so higher-idf) in the French field. When you're
searching fields with different types of content, however, you might find
cases where you'd actually want the idf differences maintained and
documents differentiated based upon the underlying field.
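
As a rough illustration of the French/English case, here is a small sketch. The document counts are invented, and the idf formula is the classic tf-idf form (1 + ln(N / (df + 1))), which I am assuming here; your similarity settings may differ.

```python
import math

def idf(doc_freq, num_docs):
    """Classic tf-idf style inverse document frequency (assumed form)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

num_docs = 1000    # made-up corpus size
df_english = 200   # made-up: docs whose english_text contains "obama"
df_french = 10     # made-up: docs whose french_text contains "obama"

idf_english = idf(df_english, num_docs)
idf_french = idf(df_french, num_docs)
# Catch-all field: one document frequency shared by every document
# (assumes no document has the term in both language fields).
idf_catch_all = idf(df_english + df_french, num_docs)

print(f"english_text idf: {idf_english:.2f}")
print(f"french_text idf:  {idf_french:.2f}")
print(f"catch-all idf:    {idf_catch_all:.2f}")
```

With per-field idf the term gets a noticeably higher weight in french_text than in english_text, which is what pushes the French documents up. In the catch-all field, every document sees the same idf for the term.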

One particularly nice thing about the multi-field approach is that it is
very easy to apply different boosts to the fields and to dynamically change
the boosts. You can similarly do this with payloads within a catch-all
field. You could even assign each term a payload corresponding to which
field the content came from, and then dynamically change the boosts
associated with those payloads at query time (caveat - custom code
required). See this blog post for an end-to-end payload scoring example,
https://lucidworks.com/blog/2014/06/13/end-to-end-payload-example-in-solr/.
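
For the multi-field side, changing boosts per request can be as simple as rebuilding the qf parameter. This is a minimal sketch assuming the edismax query parser; the field names ("title", "description") and the boost values are hypothetical.

```python
from urllib.parse import urlencode

def edismax_params(user_query, boosts):
    """Build request parameters for an edismax query with per-field boosts.

    boosts: dict mapping a (hypothetical) field name to its boost.
    """
    qf = " ".join(f"{field}^{boost}" for field, boost in boosts.items())
    return {"q": user_query, "defType": "edismax", "qf": qf}

# Same query, two different weightings, no re-indexing involved.
params = edismax_params("java developer", {"title": 3.0, "description": 1.0})
params_flat = edismax_params("java developer", {"title": 1.0, "description": 1.0})

# URL-encoded form, ready to append to a /select request.
query_string = urlencode(params)
```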


Sharing my personal experience: at CareerBuilder, we use the catch-all
field with payloads (one per underlying field) whose weights we can change
dynamically at query time. We found that for most of our corpus sizes
(ranging between 2 and 100 million full-text jobs or resumes), it is more
efficient to search between 1 and 3 fields directly than to do the
catch-all search with payload scoring, but once we get to the 4th field,
the extra cost of payload scoring is overtaken by the additional time
required to search each additional field. These numbers (3 vs. 4 fields,
etc.) are all anecdotal, of course, as they depend upon a lot of
environmental and corpus factors unique to our use case.

The main point of this approach, however, is that there is no additional
cost per-field beyond the upfront cost to add and score payloads, so we
have been able to easily represent over a hundred of these payload-based
"virtual fields" with different weights within a catch-all field (all with
a fixed query-time cost).
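
The "virtual field" idea can be sketched with a toy model like the one below. This is not Solr's or Lucene's payload API and not our production code; the names and weights are invented. Each term occurrence in the catch-all field carries a tag naming the field it came from, and the query supplies the weight for each tag.

```python
def score(postings, query_terms, field_weights, default_weight=1.0):
    """Sum per-occurrence weights for matching terms in one catch-all field.

    postings: list of (term, source_field) pairs for a single document,
    where source_field plays the role of the payload.
    field_weights: query-time weight per source field (hypothetical values).
    """
    wanted = set(query_terms)
    return sum(
        field_weights.get(source, default_weight)
        for term, source in postings
        if term in wanted
    )

# One document's catch-all field, with the originating field as the payload.
doc = [
    ("java", "title"),
    ("developer", "title"),
    ("java", "skills"),
    ("java", "description"),
]

# Two different weightings chosen at query time; the index is unchanged.
weights_a = {"title": 5.0, "skills": 2.0, "description": 1.0}
weights_b = {"title": 1.0, "skills": 4.0, "description": 1.0}

score_a = score(doc, ["java"], weights_a)
score_b = score(doc, ["java"], weights_b)
```

Note that the work done per query term is one pass over that term's occurrences, however many distinct source fields the weights dictionary names. That is the sense in which adding more virtual fields does not add per-field query cost.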

*In summary*: yes, you should expect a performance decline as you add more
and more fields to your query if you are searching across multiple fields.
You can overcome this by using a single catch-all field if you are okay
losing IDF per-field (you'll still have it globally across all fields). If
you want to use a catch-all field, but still want to boost content based
upon the field it originated within, you can accomplish this with payloads.

All the best,

Trey Grainger
Co-author, Solr in Action
Director of Engineering, Search & Recommendations @ CareerBuilder


On Mon, Oct 12, 2015 at 9:12 AM, Ahmet Arslan <iori...@yahoo.com.invalid>
wrote:

> Hi,
>
> Catch-all field: No need to worry about how to aggregate scores coming
> from different fields.
> But you cannot utilize different analysers for different fields.
>
> Multiple-fields: You can play with edismax's parameters on-the-fly,
> without having to re-index.
> It is flexible that you can include/exclude fields from search.
>
> Ahmet
>
>
>
> On Monday, October 12, 2015 3:39 PM, elisabeth benoit <
> elisaelisael...@gmail.com> wrote:
> Hello,
>
> We're using solr 4.10 and storing all data in a catchall field. It seems to
> me that one good reason for using a catchall field is when using scoring
> with idf (with idf, a word might not have same score in all fields). We got
> rid of idf and are now considering using multiple fields. I remember
> reading somewhere that using a catchall field might speed up searching
> time. I was wondering if some of you have any opinion (or experience)
> related to this subject.
>
> Best regards,
> Elisabeth
>
