Gee, I was about to post. I figured out that my issue is really that of
computing the number of unique terms per document. One approach is to run
the analyzer on the document before calling addDocument and count the
number of tokens it produces.
Then I can invoke addDocument with the computed value stored in its own field.

The only issue is that I'm assuming that, if I run the same Analyzer that
addDocument uses, the count will always equal the number of terms indexed
for that document. Is that a right assumption? Is there any alternative
where I don't need to make this assumption?
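
To make the counting step concrete, here is a minimal sketch, not the actual
Solr code path. The analyze method is a hypothetical stand-in for the field's
real analyzer (in Lucene the tokens would instead come from the TokenStream
returned by Analyzer.tokenStream), and the class and method names are made up
for illustration. It also shows that the token count and the unique-term count
are different numbers, so it matters which one gets stored.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Sketch: count tokens and unique terms for one field value before indexing.
class TermCountSketch {

    // Hypothetical stand-in for the field's analyzer: lowercases the text and
    // splits on anything that is not a letter or digit. A real analyzer may
    // also drop stop words, stem, or inject synonyms, so its counts can differ.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Total number of tokens, counting repeats.
    static int countTokens(String text) {
        return analyze(text).size();
    }

    // Number of distinct terms.
    static int countUniqueTerms(String text) {
        Set<String> unique = new HashSet<>(analyze(text));
        return unique.size();
    }

    public static void main(String[] args) {
        String content = "The quick fox and the lazy fox";
        System.out.println("tokens = " + countTokens(content));      // 7
        System.out.println("unique = " + countUniqueTerms(content)); // 5
    }
}
```

The computed number would then be added to the document as an extra field
before calling addDocument. The counts only match what ends up in the index if
the stand-in is replaced by exactly the analyzer configured for that field.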


On Tue, Jul 5, 2011 at 1:29 AM, Markus Jelsma <markus.jel...@openindex.io> wrote:

> You can create a custom update processor. The passed AddUpdateCommand object
> has an accessor to the SolrInputDocument you're about to add. In the
> processAdd method you can add a new field with whatever you want.
>
> The wiki has a good example:
> http://wiki.apache.org/solr/UpdateRequestProcessor
>
>
> > Hello,
> >
> > I'm trying to add a field that counts the number of terms in a document to
> > my schema. So far I've been computing this value at query-time. Is there a
> > way I could compute this only once and store the field?
> >
> > final SolrIndexSearcher searcher = request.getSearcher();
> > final SolrIndexReader reader = searcher.getReader();
> > final String content = "content";
> >
> > // One encoded norm byte per document for the "content" field.
> > final byte[] norms = reader.norms(content);
> > final int[] docLengths;
> > if (norms == null) {
> >     docLengths = null;
> > } else {
> >     docLengths = new int[norms.length];
> >     int i = 0;
> >     for (byte b : norms) {
> >         float docNorm = searcher.getSimilarity().decodeNormValue(b);
> >         int docLength = 0;
> >         if (docNorm != 0) {
> >             docLength = (int) (1 / docNorm); // reciprocal
> >         }
> >         docLengths[i++] = docLength;
> >     }
> > ...
> > final NumericField docLenNormField =
> >     new NumericField(TestQueryResponseWriter.DOC_LENGHT);
> > docLenNormField.setIntValue(docLengths[id]);
> > doc.add(docLenNormField);
>



-- 
Regards,
K. Gabriele
