If you just want a word count, then do the work during indexing: index a
field holding each document's word count. Then, I believe you can use
faceting. With the JSON Facet API, I believe you can calculate a sum()
over a field rather than the more traditional count.

Thinking aloud, there might be an easier way: index a field that has the
same value for all documents, and facet on it. Instead of counting the
number of documents, calculate the sum() of your word count field.

I *think* that should work.
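
To make the two ideas above concrete, here is a rough sketch of the
request bodies they might translate to with the JSON Facet API. It is
untested against a live index and only builds the JSON; nothing is sent
to Solr. The field names word_count and all_docs and the facet labels
total_words and everything are made up for illustration, so substitute
whatever your schema uses.

```python
import json


def total_sum_request(count_field, query="*:*"):
    """Body for a top-level sum() aggregation over all matching documents.

    count_field is a hypothetical numeric field, populated at index time
    with each document's word count.
    """
    return {
        "query": query,
        "limit": 0,  # only the aggregate is wanted, not the documents
        "facet": {"total_words": f"sum({count_field})"},
    }


def constant_field_request(count_field, constant_field, query="*:*"):
    """Body for the 'facet on a constant field' variant.

    constant_field is a hypothetical field holding the same value in
    every document, so the terms facet yields a single bucket whose
    nested sum() covers every matching document.
    """
    return {
        "query": query,
        "limit": 0,
        "facet": {
            "everything": {
                "type": "terms",
                "field": constant_field,
                "facet": {"total_words": f"sum({count_field})"},
            }
        },
    }


if __name__ == "__main__":
    # Print both bodies; they would be POSTed as JSON to a core's
    # query handler (the exact URL depends on your setup).
    print(json.dumps(total_sum_request("word_count"), indent=2))
    print(json.dumps(constant_field_request("word_count", "all_docs"), indent=2))
```

If the top-level form works for you, the constant field should not be
needed at all, since the first request already sums over every document
the query matches.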

Upayavira

On Sat, Oct 24, 2015, at 04:24 PM, Aki Balogh wrote:
> Hi Jack,
> 
> I'm just using solr to get word count across a large number of documents.
> 
> It's somewhat non-standard, because we're ignoring relevance, but it seems
> to work well for this use case otherwise.
> 
> My understanding then is:
> 1) since termfreq is pre-processed and fetched, there's no good way to
> speed it up (except by caching earlier calculations)
> 
> 2) there's no way to have solr sum up all of the termfreqs across all
> documents in a search and just return one number for total termfreqs
> 
> 
> Are these correct?
> 
> Thanks,
> Aki
> 
> 
> On Sat, Oct 24, 2015 at 11:20 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
> 
> > That's what a normal query does - Lucene takes all the terms used in the
> > query and sums them up for each document in the response, producing a
> > single number, the score, for each document. That's the way Solr is
> > designed to be used. You still haven't elaborated why you are trying to use
> > Solr in a way other than it was intended.
> >
> > -- Jack Krupansky
> >
> > On Sat, Oct 24, 2015 at 11:13 AM, Aki Balogh <a...@marketmuse.com> wrote:
> >
> > > Gotcha - that's disheartening.
> > >
> > > > One idea: when I run termfreq, I get all of the termfreqs for each
> > > > document one-by-one.
> > >
> > > Is there a way to have solr sum it up before creating the request, so I
> > > only receive one number in the response?
> > >
> > >
> > > On Sat, Oct 24, 2015 at 11:05 AM, Upayavira <u...@odoko.co.uk> wrote:
> > >
> > > > If you mean using the term frequency function query, then I'm not sure
> > > > there's a huge amount you can do to improve performance.
> > > >
> > > > The term frequency is a number that is used often, so it is stored in
> > > > the index pre-calculated. Perhaps, if your data is not changing,
> > > > optimising your index would reduce it to one segment, and thus might
> > > > ever so slightly speed the aggregation of term frequencies, but I doubt
> > > > it'd make enough difference to make it worth doing.
> > > >
> > > > Upayavira
> > > >
> > > > On Sat, Oct 24, 2015, at 03:37 PM, Aki Balogh wrote:
> > > > > Thanks, Jack. I did some more research and found similar results.
> > > > >
> > > > > > In our application, we are making multiple (think: 50) concurrent
> > > > > > requests to calculate term frequency on a set of documents in
> > > > > > "real-time". The faster that results return, the better.
> > > > >
> > > > > Most of these requests are unique, so cache only helps slightly.
> > > > >
> > > > > This analysis is happening on a single solr instance.
> > > > >
> > > > > Other than moving to solr cloud and splitting out the processing onto
> > > > > multiple servers, do you have any suggestions for what might speed up
> > > > > termfreq at query time?
> > > > >
> > > > > Thanks,
> > > > > Aki
> > > > >
> > > > >
> > > > > > On Fri, Oct 23, 2015 at 7:21 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
> > > > >
> > > > > > > Term frequency applies only to the indexed terms of a tokenized
> > > > > > > field. DocValues is really just a copy of the original source text
> > > > > > > and is not tokenized into terms.
> > > > > >
> > > > > > > Maybe you could explain how exactly you are using term frequency in
> > > > > > > function queries. More importantly, what is so "heavy" about your
> > > > > > > usage? Generally, moderate use of a feature is much more advisable
> > > > > > > than heavy usage, unless you don't care about performance.
> > > > > >
> > > > > > -- Jack Krupansky
> > > > > >
> > > > > > > On Fri, Oct 23, 2015 at 8:19 AM, Aki Balogh <a...@marketmuse.com> wrote:
> > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > > In our solr application, we use a Function Query (termfreq) very
> > > > > > > > heavily.
> > > > > > >
> > > > > > > > Index time and disk space are not important, but we're looking to
> > > > > > > > improve performance on termfreq at query time.
> > > > > > > I've been reading up on docValues. Would this be a way to improve
> > > > > > > performance?
> > > > > > >
> > > > > > > I had read that Lucene uses Field Cache for Function Queries, so
> > > > > > > performance may not be affected.
> > > > > > >
> > > > > > >
> > > > > > > > And, any general suggestions for improving query performance on
> > > > > > > > Function Queries?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Aki
> > > > > > >
> > > > > >
> > > >
> > >
> >
