Hi all,

 

I would like to propose a new property on copy fields that limits the number
of characters that are copied.

 

The use case is the following: among other documents, we index very big
documents (several MB of text) and want to be able to use highlighting.
However, as soon as one or more big documents are included in the matches,
the response time is awful. The maxAnalyzedChars parameter is not enough,
because the full document is loaded into memory before any processing is
done, and that alone can take a very long time.

 

For this kind of situation, we propose to use a dedicated copy field for
highlighting and to limit the number of characters that are copied. For
example:

<copyField source="text" dest="highlight" maxSize="30000" />

 

This approach also has the advantage of limiting the index size for large
documents (the original text field does not need to be stored or to have
term vectors). However, the index is bigger for small documents, since their
text ends up fully duplicated in the copy field.
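To make the trade-off concrete, here is a rough sketch of how the two fields
could be declared in schema.xml. The field type name "text" and the exact
attribute choices are only assumptions for illustration, and maxSize is the
attribute proposed in this mail, not an existing one:

```xml
<!-- Original field: searched, but not stored and without term vectors -->
<field name="text" type="text" indexed="true" stored="false" />

<!-- Dedicated highlighting field: stored, with term vectors -->
<field name="highlight" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true" />

<!-- Proposed: copy at most the first 30000 characters of "text" -->
<copyField source="text" dest="highlight" maxSize="30000" />
```

With such a setup, queries would still run against the full "text" field,
while highlighting would only ever load the truncated "highlight" field.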

 

Of course, if the only terms matched by a query occur after the limit, no
highlighting is possible for that document.

 

What do you think of this feature?

 

Best regards,

Nicolas
