Hi Erick,

I'm interested in this topic too, so I'd like to ask a follow-up question based on Jul's thread.
I read the documentation for "facet.sort=count", which seems to return the facets ordered by document hit count. So, suppose one doc has the title "value1 value2 value3" and another doc has the title "value2 value4 value5", and the field uses WhitespaceTokenizer (regardless of whether it is defined as a single-valued or a multiValued field). Do we get the following facet results?

"value2" - 2 docs
"value1" - 1 doc
"value3" - 1 doc
"value4" - 1 doc
"value5" - 1 doc

Is this a way to get the top words? Does it carry a high performance cost?

Thanks,
Yunfei

On Wed, Dec 14, 2011 at 5:51 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> I don't quite understand what you're trying to do. MultiValued is
> a bit misleading. All it means is that you can add the same
> field multiple times to a document, i.e. (XML example)
> <doc>
> <add name="field">value1 value2 value3</add>
> <add name="field">value4 value5 value6</add>
> </doc>
>
> will succeed if "field" is multiValued and fail if not.
>
> This will work if "field" is NOT multiValued:
> <doc>
> <add name="field">value1 value2 value3 value4 value5 value6</add>
> </doc>
>
> and, assuming WhitespaceTokenizer, the field "field" will contain
> the exact same tokens. The only difference *might* be the
> offsets, but don't worry about that quite yet, all it would really
> affect is phrase queries.
>
> With that as a preface, I don't see why copyField has anything
> to do with your problem, you'd get the same results faceting
> on the title field, assuming identical analyzer chains.
>
> Faceting on a text field is iffy, it can be quite expensive. What you'd
> get in the end, though, is a list of the top words in your corpus for
> that field counted from the documents that satisfied the query. Which
> sounds like what you're after.
>
> Best
> Erick
>
> On Wed, Dec 14, 2011 at 4:59 AM, yunfei wu <yunfei...@gmail.com> wrote:
> > Sounds like working by carefully choosing tokenizer, and then use
> > facet.sort and facet.limit parameters to do faceting.
> >
> > Will see any expert's comments on this one.
> >
> > Yunfei
> >
> >
> > On Wed, Dec 14, 2011 at 12:26 AM, darul <daru...@gmail.com> wrote:
> >
> >> Hello,
> >>
> >> Field for this scenario is "Title" and contains several words.
> >>
> >> For a specific query, I would like get the top ten words by frequency in a
> >> specific field.
> >>
> >> My idea was the following:
> >>
> >> - Title in my schema is stored/indexed in a specific field
> >> - A copyField copy Title field content into a multivalued field. If my
> >> multivalue field use a specific tokenizer which split words, does it fill
> >> each word in each multivalued items ?
> >> - If so, using faceting on this multivalue field, I will get top ten words,
> >> correct ?
> >>
> >> Example:
> >>
> >> 1) Title : this is my title
> >> 2) CopyField Title to specific multivalue field F1
> >> 3) F1 contains : {this, is, my, title}
> >>
> >> My english....
> >>
> >> Thanks,
> >>
> >> Jul
> >>
> >> --
> >> View this message in context:
> >> http://lucene.472066.n3.nabble.com/Copy-in-multivalued-field-and-faceting-tp3584819p3584819.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
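
For what it's worth, the counts expected in the question at the top of this message can be sanity-checked without a running Solr instance. The Python sketch below only imitates what faceting on a whitespace-tokenized field should report: for each term, the number of documents containing it, ordered by count and capped at a limit, in the spirit of facet.sort=count and facet.limit. The function name facet_counts and the alphabetical tie-breaking are assumptions made for illustration; this is not Solr's API or implementation.

```python
from collections import Counter


def facet_counts(titles, limit=10):
    """Return (term, doc_count) pairs, highest count first.

    Hypothetical helper: imitates per-document facet counting on a
    whitespace-tokenized field. It does not call Solr.
    """
    counts = Counter()
    for title in titles:
        # str.split() with no arguments splits on runs of whitespace.
        # set() makes a term repeated inside one title count only once,
        # because a facet count is a document count, not a term frequency.
        counts.update(set(title.split()))
    # Sort by descending count; ties are broken alphabetically (assumption).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:limit]


titles = ["value1 value2 value3", "value2 value4 value5"]
for term, n in facet_counts(titles):
    print(f'"{term}" - {n} doc(s)')
```

Note that a title such as "value2 value 4" (with a space before the digit) would produce the separate tokens "value" and "4" under whitespace splitting, so the exact spelling of the titles matters for the counts.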