Re: enhancement for SynonymFilter

2016-11-18 Thread Bernd Fehling
Hi Mike, let me explain. First, after looking deeper inside I noticed that the Filters are used like a stack and called backwards. So the first incrementToken goes to the last filter in the chain. That one also uses incrementToken and calls its predecessor in the chain and so on. So everythin
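The pull-based call order described in this message can be illustrated with a small stand-alone model. This is only a sketch: `PullStage`, `FilterChainSketch`, `source` and `filter` are hypothetical names invented for illustration and are not Lucene's TokenStream API. It only shows that the consumer calls the last stage, which in turn calls its predecessor.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for one stage of an analysis chain (not a Lucene type).
interface PullStage {
    // Returns the next token, or null when the input is exhausted.
    String next();
}

final class FilterChainSketch {
    // Records the order in which stages are entered, for inspection.
    static final List<String> callOrder = new ArrayList<>();

    // Source stage: plays the role of the tokenizer at the head of the chain.
    static PullStage source(String name, List<String> tokens) {
        Iterator<String> it = tokens.iterator();
        return () -> {
            callOrder.add(name);
            return it.hasNext() ? it.next() : null;
        };
    }

    // Filter stage: logs its own call, pulls from its predecessor, then transforms.
    static PullStage filter(String name, PullStage input, UnaryOperator<String> fn) {
        return () -> {
            callOrder.add(name);
            String token = input.next();
            return token == null ? null : fn.apply(token);
        };
    }

    public static void main(String[] args) {
        PullStage tokenizer = source("tokenizer", List.of("Hello", "World"));
        PullStage first = filter("firstFilter", tokenizer, String::trim);
        PullStage last = filter("lastFilter", first, s -> s.toLowerCase(Locale.ROOT));

        // The consumer only ever talks to the last stage in the chain.
        for (String t = last.next(); t != null; t = last.next()) {
            System.out.println(t);
        }
        System.out.println(callOrder);
    }
}
```

Each call on the last stage is entered first and completes last, which is why the chain behaves like a stack.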

Re: enhancement for SynonymFilter

2016-11-18 Thread Bernd Fehling
On 18.11.2016 at 08:58, Bernd Fehling wrote: > Hi Mike, > > let me explain. > > First, after looking deeper inside I noticed that the Filters are used > like a stack and called backwards. So the first incrementToken goes > to the last filter in the chain. That one also uses incrementToken and

Re: Multi-field IDF

2016-11-18 Thread Ahmet Arslan
Hi Nicholas, Aha, I see that you are into field-based scoring, which is an unsolved problem. Then, you might find BlendedTermQuery and SynonymQuery relevant. Ahmet On Friday, November 18, 2016 12:22 AM, Nicolás Lichtmaier wrote: That depends on what you want. In this case I want to use a

Re: indexing analyzed and not_analyzed values in same field

2016-11-18 Thread Kumaran Ramasubramanian
Hi All, Can anyone say whether it is advisable to have an index with both analyzed and not_analyzed values in one field? Use case: I have custom fields in my product which can be configured differently (ANALYZED and NOT_ANALYZED) in different modules -- Kumaran R On Wed, Oct 26, 2016 at 12:0

Re: Multi-field IDF

2016-11-18 Thread Will Martin
In this work, we aim to improve the field weighting for structured document retrieval. We first introduce the notion of field relevance as the generalization of field weights, and discuss how it can be estimated using relevant documents, which effectively implements relevance feedback for f

Re: indexing analyzed and not_analyzed values in same field

2016-11-18 Thread Michael McCandless
You can do this, Lucene will let you, but it's typically a bad idea for search relevance because some documents will return only if you search for precisely the same whole token, others if you search for an analyzed token, giving the user a broken experience. Mike McCandless http://blog.mikemcca

Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

2016-11-18 Thread Michael McCandless
I think you've summed up exactly the differences! And, yes, it would be possible to emulate hierarchical facets on top of flat facets, if the hierarchy is fixed depth like year/month/day. But if it's variable depth, it's trickier (but I think still possible). See e.g. the Committed Paths drill-d
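One way to picture emulating a fixed-depth hierarchy such as year/month/day on top of flat labels is to index every path prefix as its own flat label and count children by prefix. The sketch below is plain Java under that assumption. `FlatHierarchySketch`, `prefixes` and `childCounts` are hypothetical names, and nothing here calls the Lucene facet API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FlatHierarchySketch {
    // Expands path components into one flat label per level,
    // e.g. ("2016", "11", "18") -> ["2016", "2016/11", "2016/11/18"].
    static List<String> prefixes(String... components) {
        List<String> labels = new ArrayList<>();
        StringBuilder path = new StringBuilder();
        for (String component : components) {
            if (path.length() > 0) {
                path.append('/');
            }
            path.append(component);
            labels.add(path.toString());
        }
        return labels;
    }

    // Counts the direct children of `parent` across the flat labels of all documents.
    // An empty parent means the top level.
    static Map<String, Integer> childCounts(List<List<String>> docs, String parent) {
        String prefix = parent.isEmpty() ? "" : parent + "/";
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> labels : docs) {
            for (String label : labels) {
                // A direct child starts with the prefix and has no further separator.
                if (label.startsWith(prefix) && label.indexOf('/', prefix.length()) < 0) {
                    counts.merge(label, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                prefixes("2016", "11", "18"),
                prefixes("2016", "11", "17"),
                prefixes("2016", "10", "01"));
        System.out.println(childCounts(docs, "2016"));
    }
}
```

With a fixed depth the number of labels per document is bounded. With variable depth the same prefix expansion still applies, but the label count per document varies with the path length.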

Re: enhancement for SynonymFilter

2016-11-18 Thread Michael McCandless
Hmm I didn't realize there was that change in behavior between versions. But, in 6.3.0, can't you look for a token of type SYNONYM whose posInc=0 and then know that the previous (posInc>0) token had caused that synonym? You just need a bit of caching, until all synonyms for a given token have bee
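The caching idea suggested here can be sketched without Lucene. The model below assumes single-token synonyms and that the original token is emitted before its synonyms at the same position. `Tok` and `SynonymGroupingSketch` are hypothetical illustration types, not Lucene attributes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical flattened view of one token: term text, type, position increment.
final class Tok {
    final String term;
    final String type;
    final int posInc;

    Tok(String term, String type, int posInc) {
        this.term = term;
        this.type = type;
        this.posInc = posInc;
    }
}

final class SynonymGroupingSketch {
    // Maps each token with posInc > 0 to the SYNONYM tokens stacked on it (posInc == 0).
    // Simplification: a repeated term shares one map entry.
    static Map<String, List<String>> group(List<Tok> stream) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        String current = null; // cached originating token for the current position
        for (Tok t : stream) {
            if (t.posInc > 0) {
                // New position: this token becomes the candidate origin.
                current = t.term;
                groups.putIfAbsent(current, new ArrayList<>());
            } else if ("SYNONYM".equals(t.type) && current != null) {
                // Same position and typed SYNONYM: attribute it to the cached origin.
                groups.get(current).add(t.term);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Tok> stream = List.of(
                new Tok("wifi", "word", 1),
                new Tok("wireless", "SYNONYM", 0),
                new Tok("router", "word", 1));
        System.out.println(group(stream));
    }
}
```

The cache for a position is only final once the next token with posInc > 0 (or the end of the stream) is seen, which is the "bit of caching" the message refers to.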

Re: indexing analyzed and not_analyzed values in same field

2016-11-18 Thread Michael McCandless
So when a query arrives, you know the query is only allowed to match either module:1 (analyzed terms) or module:2 (not analyzed) but never both? If so, you should be fine. Though relevance will be sort of wonky, in case that matters, because you are polluting the unique term space; you would get

Exclusion List for standard tokenizer

2016-11-18 Thread lukes
Hi, Is there any exclusion list of characters which can be defined for StandardTokenizer? In my case, I want to use StandardTokenizer (as it solves many problems of when to tokenize across languages) but I don't want to tokenize the stream on certain characters, for example '@'. Is there a wa

Re: Exclusion List for standard tokenizer

2016-11-18 Thread lukes
Actually ClassicTokenizer seems to do the job. Any side effects of using ClassicTokenizer rather than StandardTokenizer? Regards.