[ 
https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628045#action_12628045
 ] 

Chris Harris commented on SOLR-538:
-----------------------------------

Thanks, Lars; that was fast. I think this patch is going to be handy.

I'm wondering what people think about an alternative approach to keeping 
stored fields from getting too large, one that would require mucking around 
with Lucene. In particular, the idea would be to allow field definitions like this:

<field name="body" type="text" indexed="true" stored="true"
     omitNorms="false" compressed="true"
     maxFieldLength="2000" storeOnlyAnalyzedText="true"
/>

Here we've made the normal Lucene maxFieldLength (i.e. the number of tokens to 
analyze) configurable on a field-by-field basis. In this declaration we've also 
made what is stored a function of what is analyzed: if the first 2,000 tokens 
correspond to, say, the first 8,000 characters, then those 8,000 characters are 
what actually gets stored in the stored field. This seems a little more natural 
than lopping off the text after a fixed number of characters.
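
To make the difference concrete, here is a minimal sketch contrasting the two 
cut-off rules. It assumes a plain whitespace tokenizer as a stand-in for the 
field's real analyzer; the class and method names are hypothetical and are not 
part of Lucene, Solr, or the attached patch.

```java
// Hypothetical illustration only: not Lucene/Solr API.
final class StoredPrefix {

    // Character-based cut: keep at most maxChars characters.
    public static String truncateByChars(String text, int maxChars) {
        if (maxChars <= 0) {
            return "";
        }
        return text.length() <= maxChars ? text : text.substring(0, maxChars);
    }

    // Token-based cut: keep the text up to the end of the maxTokens-th
    // whitespace-delimited token (assumed stand-in for a real analyzer).
    public static String truncateByTokens(String text, int maxTokens) {
        if (maxTokens <= 0) {
            return "";
        }
        int tokens = 0;          // tokens seen so far
        int endOfLastToken = 0;  // offset just past the most recent complete token
        boolean inToken = false;
        for (int i = 0; i < text.length(); i++) {
            boolean ws = Character.isWhitespace(text.charAt(i));
            if (!ws && !inToken) {
                // A new token starts here; stop if the budget is already used up.
                if (tokens == maxTokens) {
                    return text.substring(0, endOfLastToken);
                }
                tokens++;
                inToken = true;
            } else if (ws && inToken) {
                inToken = false;
                endOfLastToken = i;
            }
        }
        // Fewer than or exactly maxTokens tokens: store everything.
        return text;
    }

    public static void main(String[] args) {
        String body = "alpha beta  gamma delta";
        System.out.println(truncateByChars(body, 8));   // prints "alpha be"
        System.out.println(truncateByTokens(body, 2));  // prints "alpha beta"
    }
}
```

The character-based cut can stop in the middle of a word, while the 
token-based cut always ends on a token boundary, so the stored text matches 
exactly what was analyzed.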

If I could do the above, I'm thinking I would use that single field for both 
searching and highlighting. But if you wanted a separate field for highlighting 
(and were willing to have things run slower than with the current patch), then 
you could do this:

<field name="body" type="text" indexed="true" stored="false" omitNorms="false" 
/>
<field name="highlighting" type="text" indexed="false" stored="true"
     compressed="true" maxFieldLength="2000" storeOnlyAnalyzedText="true" />
<copyField source="body" dest="highlighting" />


> CopyField maxLength property
> ----------------------------
>
>                 Key: SOLR-538
>                 URL: https://issues.apache.org/jira/browse/SOLR-538
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Nicolas Dessaigne
>            Priority: Minor
>         Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch, 
> SOLR-538.patch, SOLR-538.patch, SOLR-538.patch
>
>
> As discussed briefly on the mailing list (http://www.mail-archive.com/[EMAIL 
> PROTECTED]/msg09807.html), the objective of this task is to add a maxLength 
> property to the CopyField "command". This property simply limits the number 
> of characters that are copied.
> This is particularly useful to avoid very slow highlighting when the index 
> contains big documents.
> Example:
> <copyField source="text" dest="highlight" maxLength="30000" />
> This approach also has the advantage of limiting the index size for large 
> documents (the original text field does not need to be stored or to have 
> term vectors). However, the index is bigger for small documents...
