Jack, Emir,

Thanks for your answers. Moving the ngram logic to the client side would be a fast and easy way to test the solution and compare it with the phonetic one.

Best regards,
Elisabeth

2016-03-11 10:52 GMT+01:00 Emir Arnautovic <emir.arnauto...@sematext.com>:

> Hi Elizabeth,
> In order to see if you will get better results, you can move the ngram
> logic outside of the analysis chain - the simplest solution is to move it
> to the client. In such a setup, you should be able to use pf2 and pf3 and
> see if that produces the desired result.
>
> Regards,
> Emir
>
> On 10.03.2016 13:47, elisabeth benoit wrote:
>
>> Oh yeah, now that you're saying it, you're right: pf2 and pf3 will boost
>> proximity between words, not between ngrams.
>>
>> Thanks again,
>> Elisabeth
>>
>> 2016-03-10 12:31 GMT+01:00 Alessandro Benedetti <abenede...@apache.org>:
>>
>>> The reason pf2 and pf3 do not seem a good solution to me is that the
>>> edismax query parser calculates those grams on top of word shingles.
>>> So it takes the input query and produces the shingles based on the
>>> whitespace separator.
>>>
>>> i.e. if you search:
>>> "white tiger jumping"
>>> and pf2 is configured on field1,
>>> you are going to end up searching in field1:
>>> "white tiger", "tiger jumping".
>>> This is really useful in full text search oriented to phrase and
>>> partial phrase matching.
>>> But it has nothing to do with the analysis type associated at query
>>> time at this moment.
>>> First the query parser tokenisation is used to build the grams, and
>>> then the query time analysis is applied.
>>> This is as far as I remember;
>>> I will double check in the code and let you know.
>>>
>>> Cheers
>>>
>>> On 10 March 2016 at 11:02, elisabeth benoit <elisaelisael...@gmail.com>
>>> wrote:
>>>
>>>> That's the use case, yes. Find Amsterdam with Asmtreadm.
>>>>
>>>> And yes, we're only doing approximate search if we get 0 results.
>>>>
>>>> I don't quite get why pf2 pf3 is not a good solution.
>>>>
>>>> We're actually testing a solution close to phonetic. Some kind of word
>>>> reduction.
>>>>
>>>> Thanks for the suggestion (and the link), this makes me think maybe
>>>> phonetic is the right solution.
>>>>
>>>> Thanks for your help,
>>>> Elisabeth
>>>>
>>>> 2016-03-10 11:32 GMT+01:00 Alessandro Benedetti <abenede...@apache.org>:
>>>>
>>>>> Mmmm, if I followed, your use case is:
>>>>>
>>>>> I type Asmtreadm and I want documents matching Amsterdam (even if the
>>>>> edit distance is greater than 2).
>>>>> First of all, this is something I hope you do only if you get 0
>>>>> results; if not, the overhead can be great and you are going to lose
>>>>> a lot of precision, causing confusion for the customer.
>>>>>
>>>>> Pf2 and pf3 are ngrams of whitespace-separated tokens, making partial
>>>>> phrase queries affect the scoring.
>>>>> Not a good fit for your problem.
>>>>>
>>>>> More than grams, have you considered using some sort of phonetic
>>>>> matching?
>>>>> Could this help:
>>>>> https://cwiki.apache.org/confluence/display/solr/Phonetic+Matching
>>>>>
>>>>> Cheers
>>>>>
>>>>> On 10 March 2016 at 08:47, elisabeth benoit <elisaelisael...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I am trying to do approximate search with solr. We've tried fuzzy
>>>>>> search and spellcheck search; it's working ok but edit distance is
>>>>>> limited (to 2 for DirectSolrSpellChecker in solr 4.10.1). With the
>>>>>> fuzzy operator, we've had performance issues, and I don't think you
>>>>>> can have an edit distance of more than 2.
>>>>>>
>>>>>> What we used to do with a database was more efficient: storing
>>>>>> trigrams with position, and then searching around that position (not
>>>>>> precisely at that position, since it's approximate search).
>>>>>>
>>>>>> Position is there so that a trigram like ams (amsterdam) does not get
>>>>>> answers where the same trigram is, for instance, at the end of the
>>>>>> word.
>>>>>> I would like answers with the same relative position between
>>>>>> trigrams to score higher. Maybe using edismax's pf2 and pf3 is a way
>>>>>> to do this. I don't see any other way. Please tell me if you do.
>>>>>>
>>>>>> From your answer, I get that position is stored, but I don't
>>>>>> understand how I can preserve relative order between trigrams, apart
>>>>>> from using pf2 pf3.
>>>>>>
>>>>>> Best regards,
>>>>>> Elisabeth
>>>>>>
>>>>>> 2016-03-10 0:02 GMT+01:00 Alessandro Benedetti <abenede...@apache.org>:
>>>>>>
>>>>>>> If you store the positions for your tokens (and they are stored by
>>>>>>> default if you don't omit them), you have the relative position in
>>>>>>> the index. [1]
>>>>>>> I attach a blog post of mine, describing the Lucene internals in a
>>>>>>> little more detail.
>>>>>>>
>>>>>>> Apart from that, can you explain the problem you are trying to
>>>>>>> solve?
>>>>>>> The high level user experience?
>>>>>>> What kind of search/autocompletion/relevancy tuning are you trying
>>>>>>> to achieve?
>>>>>>> Maybe we can help better if we start from the problem :)
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> [1]
>>>>>>> http://alexbenedetti.blogspot.co.uk/2015/07/exploring-solr-internals-lucene.html
>>>>>>>
>>>>>>> On 9 March 2016 at 15:02, elisabeth benoit
>>>>>>> <elisaelisael...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello Alessandro,
>>>>>>>>
>>>>>>>> You may be right. What would you use to keep relative order
>>>>>>>> between, for instance, grams
>>>>>>>>
>>>>>>>> __a
>>>>>>>> _am
>>>>>>>> ams
>>>>>>>> mst
>>>>>>>> ste
>>>>>>>> ter
>>>>>>>> erd
>>>>>>>> rda
>>>>>>>> dam
>>>>>>>> am_
>>>>>>>>
>>>>>>>> of amsterdam? pf2 and pf3? That's all I can think of.
>>>>>>>> Please let me know if you have more insights.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Elisabeth
>>>>>>>>
>>>>>>>> 2016-03-08 17:46 GMT+01:00 Alessandro Benedetti
>>>>>>>> <abenede...@apache.org>:
>>>>>>>>
>>>>>>>>> Elizabeth,
>>>>>>>>> out of curiosity, could we know what you are trying to solve with
>>>>>>>>> that complex way of tokenisation?
>>>>>>>>> Solr is really good at storing positions along with tokens, so I
>>>>>>>>> am curious to know why you are mixing things up.
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>>
>>>>>>>>> On 8 March 2016 at 10:08, elisabeth benoit
>>>>>>>>> <elisaelisael...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for your answer Emir,
>>>>>>>>>>
>>>>>>>>>> I'll check that out.
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Elisabeth
>>>>>>>>>>
>>>>>>>>>> 2016-03-08 10:24 GMT+01:00 Emir Arnautovic
>>>>>>>>>> <emir.arnauto...@sematext.com>:
>>>>>>>>>>
>>>>>>>>>>> Hi Elisabeth,
>>>>>>>>>>> I don't think there is such a token filter, so you would have
>>>>>>>>>>> to create your own token filter that takes a token and emits
>>>>>>>>>>> ngram tokens of a specific length.
>>>>>>>>>>> It should not be too hard to create such a filter - you can
>>>>>>>>>>> take a look at how the ngram filter is coded - yours should be
>>>>>>>>>>> simpler than that.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Emir
>>>>>>>>>>>
>>>>>>>>>>> On 08.03.2016 08:52, elisabeth benoit wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm using solr 4.10.1. I'd like to index words with ngrams of
>>>>>>>>>>>> fixed length with a position at the end.
>>>>>>>>>>>>
>>>>>>>>>>>> For instance, with fixed length 3, Amsterdam would be
>>>>>>>>>>>> something like:
>>>>>>>>>>>>
>>>>>>>>>>>> a0 (two spaces added at beginning)
>>>>>>>>>>>> am1
>>>>>>>>>>>> ams2
>>>>>>>>>>>> mst3
>>>>>>>>>>>> ste4
>>>>>>>>>>>> ter5
>>>>>>>>>>>> erd6
>>>>>>>>>>>> rda7
>>>>>>>>>>>> dam8
>>>>>>>>>>>> am9 (one more space in the end)
>>>>>>>>>>>>
>>>>>>>>>>>> The number at the end being the position.
>>>>>>>>>>>>
>>>>>>>>>>>> Does anyone have a clue how to achieve this?
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Elisabeth
>>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Monitoring * Alerting * Anomaly Detection * Centralized Log
>>>>>>>>>>> Management
>>>>>>>>>>> Solr & Elasticsearch Support * http://sematext.com/
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> --------------------------
>>>>>>>>>
>>>>>>>>> Benedetti Alessandro
>>>>>>>>> Visiting card : http://about.me/alessandro_benedetti
>>>>>>>>>
>>>>>>>>> "Tyger, tyger burning bright
>>>>>>>>> In the forests of the night,
>>>>>>>>> What immortal hand or eye
>>>>>>>>> Could frame thy fearful symmetry?"
>>>>>>>>>
>>>>>>>>> William Blake - Songs of Experience -1794 England
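
For anyone who wants to try the client-side approach discussed in this thread, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function names (positional_trigrams, word_shingles) and the "_" padding character are hypothetical and come from neither Solr nor the posters' code. The first function builds the fixed-length grams with a trailing position as in Elisabeth's Amsterdam example; the second mimics the word shingling that Alessandro describes for pf2, and is not the edismax implementation itself.

```python
def positional_trigrams(word, n=3, pad="_"):
    """Return fixed-length ngrams of `word`, each suffixed with its position.

    Assumption: the word is padded with n-1 characters at the start and one
    character at the end, matching the Amsterdam example in the thread.
    """
    padded = pad * (n - 1) + word.lower() + pad
    # Slide a window of size n over the padded word and append the offset.
    return [f"{padded[i:i + n]}{i}" for i in range(len(padded) - n + 1)]


def word_shingles(query, size=2):
    """Return shingles of `size` whitespace-separated words.

    Assumption: this only imitates the pf2-style behaviour described in the
    thread (size=2 for pf2, size=3 for pf3).
    """
    words = query.split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


if __name__ == "__main__":
    print(positional_trigrams("Amsterdam"))
    print(word_shingles("white tiger jumping"))
```

If the grams are produced on the client and sent as whitespace-separated tokens, the shingling step would operate on grams rather than on whole words, which is the setup Emir suggests testing with pf2 and pf3.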