Thanks for the article.

I am indexing each page of a document as if it were a document.
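
To make the distinction concrete, here is a small Python sketch of the two numbers being discussed: the count of pages that contain a term (what faceting reported) and the number of times the term occurs on each page (what I am after). The page texts and the helper names `term_frequency` and `document_count` are made up for illustration; this is plain Python, not SOLR code, and the tokenizer is a simplifying assumption.

```python
import re
from collections import Counter

# Hypothetical page texts keyed by documentPageId (not data from the real index).
pages = {
    "49667.1": "The amplifier drives the speaker. A second amplifier is optional.",
    "49667.2": "No gain stage is described on this page.",
    "49667.3": "Amplifier, amplifier, amplifier.",
}

def term_frequency(text, term):
    """Count how many times `term` occurs in `text` (case-insensitive)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)[term.lower()]

def document_count(pages, term):
    """Count how many pages contain `term` at least once."""
    return sum(1 for text in pages.values() if term_frequency(text, term) > 0)

# Per-page occurrences of the term, and the page where it appears most often.
per_page = {page_id: term_frequency(text, "amplifier") for page_id, text in pages.items()}
best_page = max(per_page, key=per_page.get)

print(per_page)
print(document_count(pages, "amplifier"))
print(best_page)
```

The per-page dictionary is the kind of result I would like to show the user, with a link to the page that has the highest count.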

I think the answer is to configure SOLR to use the TermVectorComponent:
 http://wiki.apache.org/solr/TermVectorComponent

I have not tried it yet, but someone on a StackExchange forum suggested that
I try it.
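
As an untested sketch of what such a request might look like, the Python below only builds the query URL. It assumes a request handler named `tvrh` with the TermVectorComponent registered (as in the SOLR example configuration) and assumes the "contents" field is declared with termVectors="true" in schema.xml; neither is true of my current setup. The helper name `term_vector_url` is hypothetical.

```python
from urllib.parse import urlencode

def term_vector_url(base_url, query, field):
    """Build a URL asking the TermVectorComponent for per-document term frequencies."""
    params = [
        ("qt", "tvrh"),            # assumed handler name with the component registered
        ("q", query),
        ("fl", "documentPageId"),  # return only the page id for each hit
        ("tv", "true"),            # turn the component on
        ("tv.tf", "true"),         # include the term frequency within each document
        ("tv.fl", field),          # field to read term vectors from
        ("indent", "on"),
    ]
    return base_url + "?" + urlencode(params)

url = term_vector_url("http://localhost:8983/solr/select", "contents:amplifier", "contents")
print(url)
```

Since each page is indexed as its own document, the term frequency returned per document should correspond to the count per page, if the component behaves as the wiki describes.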

-Melanie

On Sun, Jan 22, 2012 at 8:56 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Here's Hoss' XY problem writeup:
> http://people.apache.org/~hossman/#xyproblem
> but this doesn't appear to be that.
>
> There's no way out of the box that I know of to do what you want. It starts
> with the fact that Solr has no clue what a page is in the first place. Or
> a paragraph. Or a sentence. So you're really on your own here....
> Solr only knows about *documents*. If each document is a page,
> you can do some stuff with term frequencies etc. But for a larger
> document you'll be getting into some pretty low-level analysis
> of the data to accomplish this.
>
> Sorry I can't be more help.
> Erick
>
> On Sun, Jan 22, 2012 at 5:35 PM, solr user <mvidaat...@gmail.com> wrote:
> > See comments inline below.
> >
> > On Sun, Jan 22, 2012 at 8:27 PM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >>
> >> Faceting won't work at all. Its function is to return the count
> >> of the *documents* that a value occurs in, so that's no good
> >> for your use case.
> >>
> >> "I don't know how to issue a proper SOLR query that returns a word count
> >> for a paragraph of text such as the term "amplifier" for a field. For some
> >> reason it only returns."
> >>
> >> This is really unclear. Are you asking for the word counts of a paragraph
> >> that contains "amplifier"? The number of times "amplifier" appears in
> >> a paragraph? In a document?
> >
> >
> > I'm looking for the number of times the word or term appears in a paragraph
> > that I'm indexing in the field named "contents". I'm storing and indexing
> > the "contents" field, which contains multiple occurrences of the term/word.
> > However, when I query for that term, it reports that the word/term
> > appeared only once in the "contents" field.
> >
> >>
> >>
> >> And why do you want this information anyway? It might be an XY problem.
> >
> >
> > I want to be able to search for word frequency for a page in a document that
> > has many pages, so I can report to the user that the term/word occurred on
> > page 1 "10" times. The user can click on the result and go right to the
> > page where the word/term appeared most frequently.
> >
> > What do you mean an XY problem?
> >
> >
> >>
> >>
> >> Best
> >> Erick
> >>
> >> On Fri, Jan 20, 2012 at 1:06 PM, solr user <mvidaat...@gmail.com> wrote:
> >> > SOLR reports the term occurrence for terms over all the documents. I am
> >> > having trouble making a query that returns the term occurrence in a
> >> > specific page field called documentPageId.
> >> >
> >> > I don't know how to issue a proper SOLR query that returns a word count
> >> > for a paragraph of text such as the term "amplifier" for a field. For some
> >> > reason it only returns.
> >> >
> >> > The things I've tried only return a count of 1 occurrence of the term,
> >> > even though I see the term in the paragraph more than once.
> >> >
> >> > I've tried faceting on the field, "contents"
> >> >
> >> >
> >> >
> >> > http://localhost:8983/solr/select?indent=on&q=*:*&wt=standard&facet=on&facet.field=documentPageId&facet.query=amplifier&facet.sort=lex&facet.missing=on&facet.method=count
> >> >
> >> > <lst name="facet_counts">
> >> > <lst name="facet_queries">
> >> > <int name="amplifier">21</int>
> >> > </lst>
> >> > <lst name="facet_fields">
> >> > <lst name="documentPageId">
> >> > <int name="49667.1">1</int>
> >> > <int name="49667.10">1</int>
> >> > <int name="49667.11">1</int>
> >> > <int name="49667.12">1</int>
> >> > <int name="49667.13">1</int>
> >> > <int name="49667.14">1</int>
> >> > <int name="49667.15">1</int>
> >> > <int name="49667.16">1</int>
> >> > <int name="49667.17">1</int>
> >> > <int name="49667.18">1</int>
> >> > <int name="49667.19">1</int>
> >> > <int name="49667.2">1</int>
> >> > <int name="49667.20">1</int>
> >> > <int name="49667.21">1</int>
> >> > <int name="49667.3">1</int>
> >> > <int name="49667.4">1</int>
> >> > <int name="49667.5">1</int>
> >> > <int name="49667.6">1</int>
> >> > <int name="49667.7">1</int>
> >> > <int name="49667.8">1</int>
> >> > <int name="49667.9">1</int>
> >> > <int name="49670.1">1</int>
> >> > <int name="49670.2">1</int>
> >> > <int name="49670.3">1</int>
> >> > <int name="49670.4">1</int>
> >> > <int name="49677.1">1</int>
> >> > <int name="49677.2">1</int>
> >> > <int name="49677.3">1</int>
> >> > <int>0</int>
> >> > </lst>
> >> > </lst>
> >> > <lst name="facet_dates"/>
> >> > <lst name="facet_ranges"/>
> >> > </lst>
> >> > </response>
> >> >
> >> >
> >> > In schema.xml:
> >> >  <field name="contents" type="bucketFirstLetter" stored="true"
> >> >         indexed="true" />
> >> >  <field name="documentPageId" type="string" indexed="true" stored="true"
> >> >         multiValued="false"/>
> >> >
> >> > In solrconfig.xml:
> >> >
> >> >       <str name="facet.field">filewrapper</str>
> >> >       <str name="facet.field">caseNumber</str>
> >> >       <str name="facet.field">pageNumber</str>
> >> >       <str name="facet.field">documentId</str>
> >> >       <str name="facet.field">contents</str>
> >> >       <str name="facet.query">documentId</str>
> >> >       <str name="facet.query">caseNumber</str>
> >> >       <str name="facet.query">pageNumber</str>
> >> >       <str name="facet.field">documentPageId</str>
> >> >       <str name="facet.query">contents</str>
> >> >
> >> > Thanks in advance,
> >
> >
>
