Re: Copy in multivalued field and faceting

yunfei wu Wed, 14 Dec 2011 12:25:58 -0800

Hi, Eric,

I'm interested in this topic too, so I'd like to ask a follow-up question based
on Jul's thread.


I read the documentation for "facet.sort=count", which seems to return the facets
ordered by document hit count.
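
For concreteness, here is a rough sketch of the kind of request I have in mind, written as a small Python helper that only builds the URL. The host, port, and handler path are assumptions about a local install, and build_facet_query is just a name made up for this example; the facet.* parameters are the standard Solr faceting ones.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; adjust host, port and path to your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_facet_query(query, field, limit=10):
    """Build a select URL asking Solr for the top `limit` terms of `field`."""
    params = [
        ("q", query),
        ("rows", 0),              # only the facet counts are needed, not the docs
        ("facet", "true"),
        ("facet.field", field),
        ("facet.sort", "count"),  # highest count first
        ("facet.limit", limit),
        ("facet.mincount", 1),    # skip terms absent from the result set
        ("wt", "json"),
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(build_facet_query("title:value2", "title"))
```

Setting rows=0 is a choice made for this sketch: when only the term counts matter, there is no need to fetch the matching documents as well.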

So, suppose one doc has the title "value1 value2 value3" and another doc has
the title "value2 value4 value5", and the field uses WhitespaceTokenizer (whether
it is defined as a single-valued or a multiValued field). Do we get the facet
results as:
"value2" - 2 docs
"value1" - 1 doc
"value3" - 1 doc
"value4" - 1 doc
"value5" - 1 doc

Is this a way to get the top words? Does it carry a high performance cost?
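
To make the expected counts concrete, here is a small Python sketch that only imitates what I think happens; it does not call Solr. It assumes that str.split() is a fair stand-in for WhitespaceTokenizer and that a facet count is the number of documents containing the term. facet_counts is a name invented for this example, and the alphabetical tie-breaking is my own choice for a stable listing.

```python
from collections import Counter


def facet_counts(docs):
    """Count, for each token, how many documents contain it at least once.

    `docs` is a list of documents; each document is a list of field values
    (one value for a single-valued field, several for a multiValued one).
    """
    counts = Counter()
    for values in docs:
        tokens = set()
        for value in values:
            tokens.update(value.split())  # stand-in for WhitespaceTokenizer
        counts.update(tokens)             # one hit per document per token
    # Highest count first; ties broken alphabetically (a choice of this sketch)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# The same two titles, once as single values and once split over several values
single = [["value1 value2 value3"], ["value2 value4 value5"]]
multi = [["value1", "value2", "value3"], ["value2", "value4 value5"]]

for term, hits in facet_counts(single):
    print(term, hits)
```

Under these assumptions, both layouts give the same counts, with "value2" at 2 and every other term at 1.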

Thanks,
Yunfei



On Wed, Dec 14, 2011 at 5:51 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> I don't quite understand what you're trying to do. MultiValued is
> a bit misleading. All it means is that you can add the same
> field multiple times to a document, i.e. (XML example)
> <doc>
>  <field name="field">value1 value2 value3</field>
>  <field name="field">value4 value5 value6</field>
> </doc>
>
> will succeed if "field" is multiValued and fail if not.
>
> This will work if "field" is NOT multiValued:
> <doc>
>  <field name="field">value1 value2 value3 value4 value5 value6</field>
> </doc>
>
> and, assuming WhitespaceTokenizer, the field "field" will contain
> the exact same tokens. The only difference *might* be the
> offsets, but don't worry about that quite yet, all it would really
> affect is phrase queries.
>
> With that as a preface, I don't see why copyField has anything
> to do with your problem, you'd get the same results faceting
> on the title field, assuming identical analyzer chains.
>
> Faceting on a text field is iffy, it can be quite expensive. What you'd
> get in the end, though, is a list of the top words in your corpus for
> that field counted from the documents that satisfied the query. Which
> sounds like what you're after.
>
> Best
> Erick
>
> On Wed, Dec 14, 2011 at 4:59 AM, yunfei wu <yunfei...@gmail.com> wrote:
> > Sounds like it would work by carefully choosing the tokenizer and then using
> > the facet.sort and facet.limit parameters to do the faceting.
> >
> > Let's see if any experts comment on this one.
> >
> > Yunfei
> >
> >
> > On Wed, Dec 14, 2011 at 12:26 AM, darul <daru...@gmail.com> wrote:
> >
> >> Hello,
> >>
> >> Field for this scenario is "Title" and contains several words.
> >>
> >> For a specific query, I would like to get the top ten words by frequency in a
> >> specific field.
> >>
> >> My idea was the following:
> >>
> >> - Title in my schema is stored/indexed in a specific field
> >> - A copyField copies the Title field content into a multivalued field. If my
> >> multivalued field uses a specific tokenizer which splits words, does it fill
> >> each word into its own multivalued item?
> >> - If so, using faceting on this multivalued field, I will get the top ten words,
> >> correct?
> >>
> >> Example:
> >>
> >> 1) Title : this is my title
> >> 2) CopyField Title to specific multivalue field F1
> >> 3) F1 contains : {this, is, my, title}
> >>
> >> My english....
> >>
> >> Thanks,
> >>
> >> Jul
> >>
> >> --
> >> View this message in context:
> >> http://lucene.472066.n3.nabble.com/Copy-in-multivalued-field-and-faceting-tp3584819p3584819.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
>
