Thanks for the article. I am indexing each page of a document as if it were a separate document.
I think the answer is to configure SOLR to use the TermVectorComponent:
http://wiki.apache.org/solr/TermVectorComponent

I have not tried it yet, but someone on a StackExchange forum suggested I try it.

-Melanie

On Sun, Jan 22, 2012 at 8:56 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Here's Hoss' XY problem writeup:
> http://people.apache.org/~hossman/#xyproblem
> but this doesn't appear to be that.
>
> There's no way out of the box that I know of to do what you want. It starts
> with the fact that Solr has no clue what a page is in the first place. Or
> a paragraph. Or a sentence. So you're really on your own here....
> Solr only knows about *documents*. If each document is a page,
> you can do some stuff with term frequencies etc. But for a larger
> document you'll be getting into some pretty low-level analysis
> of the data to accomplish this.
>
> Sorry I can't be more help.
> Erick
>
> On Sun, Jan 22, 2012 at 5:35 PM, solr user <mvidaat...@gmail.com> wrote:
> > See comments inline below.
> >
> > On Sun, Jan 22, 2012 at 8:27 PM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >>
> >> Faceting won't work at all. Its function is to return the count
> >> of the *documents* that a value occurs in, so that's no good
> >> for your use case.
> >>
> >> "I don't know how to issue a proper SOLR query that returns a word count
> >> for a paragraph of text such as the term "amplifier" for a field. For some
> >> reason it only returns."
> >>
> >> This is really unclear. Are you asking for the word counts of a paragraph
> >> that contains "amplifier"? The number of times "amplifier" appears in
> >> a paragraph? In a document?
> >
> > I'm looking for the number of times the word or term appears in a paragraph
> > that I'm indexing as the field name "contents". I'm storing and indexing the
> > field name "contents" that contains multiple occurrences of the term/word.
> > However, when I query for that term it only reports that the word/term
> > appeared only once in the field name "contents".
> >
> >>
> >> And why do you want this information anyway? It might be an XY problem.
> >
> > I want to be able to search for word frequency for a page in a document that
> > has many pages. So I can report to the user that the term/word occurred on
> > page 1 "10" times. The user can click on the result and go right to the
> > page where the word/term appeared most frequently.
> >
> > What do you mean an XY problem?
> >
> >>
> >> Best
> >> Erick
> >>
> >> On Fri, Jan 20, 2012 at 1:06 PM, solr user <mvidaat...@gmail.com> wrote:
> >> > SOLR reports the term occurrence for terms over all the documents. I am
> >> > having trouble making a query that returns the term occurrence in a
> >> > specific page field called, documentPageId.
> >> >
> >> > I don't know how to issue a proper SOLR query that returns a word count
> >> > for a paragraph of text such as the term "amplifier" for a field. For some
> >> > reason it only returns.
> >> >
> >> > The things I've tried only return a count for 1 occurrence of the term
> >> > even though I see the term in the paragraph more than just once.
> >> >
> >> > I've tried faceting on the field, "contents"
> >> >
> >> > http://localhost:8983/solr/select?indent=on&q=*:*&wt=standard&facet=on&facet.field=documentPageId&facet.query=amplifier&facet.sort=lex&facet.missing=on&facet.method=count
> >> >
> >> > <lst name="facet_counts">
> >> >   <lst name="facet_queries">
> >> >     <int name="amplifier">21</int>
> >> >   </lst>
> >> >   <lst name="facet_fields">
> >> >     <lst name="documentPageId">
> >> >       <int name="49667.1">1</int>
> >> >       <int name="49667.10">1</int>
> >> >       <int name="49667.11">1</int>
> >> >       <int name="49667.12">1</int>
> >> >       <int name="49667.13">1</int>
> >> >       <int name="49667.14">1</int>
> >> >       <int name="49667.15">1</int>
> >> >       <int name="49667.16">1</int>
> >> >       <int name="49667.17">1</int>
> >> >       <int name="49667.18">1</int>
> >> >       <int name="49667.19">1</int>
> >> >       <int name="49667.2">1</int>
> >> >       <int name="49667.20">1</int>
> >> >       <int name="49667.21">1</int>
> >> >       <int name="49667.3">1</int>
> >> >       <int name="49667.4">1</int>
> >> >       <int name="49667.5">1</int>
> >> >       <int name="49667.6">1</int>
> >> >       <int name="49667.7">1</int>
> >> >       <int name="49667.8">1</int>
> >> >       <int name="49667.9">1</int>
> >> >       <int name="49670.1">1</int>
> >> >       <int name="49670.2">1</int>
> >> >       <int name="49670.3">1</int>
> >> >       <int name="49670.4">1</int>
> >> >       <int name="49677.1">1</int>
> >> >       <int name="49677.2">1</int>
> >> >       <int name="49677.3">1</int>
> >> >       <int>0</int>
> >> >     </lst>
> >> >   </lst>
> >> >   <lst name="facet_dates"/>
> >> >   <lst name="facet_ranges"/>
> >> > </lst>
> >> > </response>
> >> >
> >> > In schema.xml:
> >> > <field name="contents" type="bucketFirstLetter" stored="true"
> >> > indexed="true" />
> >> > <field name="documentPageId" type="string" indexed="true" stored="true"
> >> > multiValued="false"/>
> >> >
> >> > In solrconfig.xml:
> >> >
> >> > <str name="facet.field">filewrapper</str>
> >> > <str name="facet.field">caseNumber</str>
> >> > <str name="facet.field">pageNumber</str>
> >> > <str name="facet.field">documentId</str>
> >> > <str name="facet.field">contents</str>
> >> > <str name="facet.query">documentId</str>
> >> > <str name="facet.query">caseNumber</str>
> >> > <str name="facet.query">pageNumber</str>
> >> > <str name="facet.field">documentPageId</str>
> >> > <str name="facet.query">contents</str>
> >> >
> >> > Thanks in advance,
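
A minimal sketch of the TermVectorComponent request suggested at the top of this thread, for the case where each page is indexed as its own document. It has not been run against a live server and rests on two assumptions: that the "contents" field is declared with termVectors="true" in schema.xml (and the data reindexed), and that solrconfig.xml defines a request handler named "tvrh" with the TermVectorComponent attached, as the Solr example configuration does. The helper name build_term_vector_url is hypothetical; the host, port and field names are the ones from the messages above. With tv.tf=true the component should report the term frequency inside each returned document, which here means per page.

```python
from urllib.parse import urlencode


def build_term_vector_url(term,
                          base_url="http://localhost:8983/solr/select",
                          text_field="contents",
                          id_field="documentPageId",
                          handler="tvrh",  # assumption: handler name from the example solrconfig.xml
                          rows=10):
    """Build a Solr URL that asks the TermVectorComponent for per-document term frequencies."""
    params = [
        ("qt", handler),                  # route the request to the term-vector handler
        ("q", f"{text_field}:{term}"),    # only pages whose text contains the term
        ("fl", id_field),                 # return just the page identifier
        ("rows", rows),
        ("tv", "true"),                   # switch the TermVectorComponent on
        ("tv.tf", "true"),                # include the term frequency within each document
        ("tv.fl", text_field),            # restrict term vectors to the text field
        ("indent", "on"),
    ]
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Print the request URL for the term used in this thread.
    print(build_term_vector_url("amplifier"))
```

The response lists every term of each matching page, so the client still has to pick out the entry for the searched term and sort pages by its tf value. This differs from the facet request quoted above, which counts documents per value and therefore shows 1 for every documentPageId.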