Hi Alessandro,

I'm counting word frequencies on a site. All I want is to count "running" and "run" as the same topic.
It's not really fuzzy matching, I believe, i.e. I wouldn't want to match "running" and "sprinting". I think stemming should do it; it seems to work fine now.

TY,
Aki

On Mon, Jul 27, 2015 at 5:54 AM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:

> Apart from the funny "crypted" message by Darin xD
> I would like to focus on the initial user requirement:
>
> "get term frequencies with fuzzy matching"
>
> Solr/Lucene offers support for fuzzy queries independently of the way
> you token-filter your terms at analysis time.
> You can run fuzzy queries with the edit distance (by default calculated
> over a Levenshtein automaton).
>
> This will allow you to run your fuzzy query and leave your index terms as
> you want (without affecting the term frequency).
>
> Can you give us more details about your use of stemming?
> Usually stemming is something a little bit different from fuzzy search.
> But it is a good way to solve some search requirements (always keep in
> mind that stemming degrades the precision of your system in favour of
> recall).
>
> Cheers
>
> 2015-07-25 20:21 GMT+01:00 Aki Balogh <a...@marketmuse.com>:
>
> > I believe I found a solution: use a third-party stemmer to stem the term
> > first, then pass it to termfreq.
> >
> > The only trick is, each term in a phrase has to be stemmed separately
> > (i.e. "end-user experience" has to be broken down into
> > "end-user" -> "end-us" and "experience" -> "experi") before being
> > passed, i.e. termfreq(body, "end-us experi").
> >
> > From what I can tell, FunctionQuery / termfreq doesn't have a way to
> > apply stemming.
> >
> > Akos (Aki) Balogh
> > Co-Founder, MarketMuse
> > https://www.MarketMuse.com <https://www.marketmuse.com/>
> >
> > On Fri, Jul 24, 2015 at 12:04 PM, Aki Balogh <a...@marketmuse.com> wrote:
> >
> > > Hi All,
> > >
> > > I'm using TermVectorComponent and stemming (Porter) in order to get
> > > term frequencies with fuzzy matching. I'm stemming at index and
> > > query time.
> > >
> > > Is there a way to get term frequency from the index?
> > > * termfreq doesn't support stemming or wildcards
> > > * terms component doesn't allow additional filters
> > > * I could use a copyfield to save a non-stemmed version at indexing,
> > >   and run termfreq on that, but then I don't get any fuzzy matching
> > >
> > > Thanks,
> > > Aki
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
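
For anyone reading this thread later, here is a minimal sketch of the workaround described in it: stem each word of a phrase separately on the client side, then build the termfreq function-query string from the stemmed words. The helper names (stem_phrase, termfreq_expr, lookup_stem) are hypothetical and are not part of Solr. The lookup table is only a stand-in for a real Porter stemmer, filled with the stems mentioned in this thread. Whether a multi-word argument to termfreq actually matches anything will depend on how the field is analyzed and indexed.

```python
from typing import Callable

# Stand-in for a real third-party Porter stemmer (assumption): a lookup
# table holding only the stems quoted in this thread.
THREAD_STEMS = {
    "end-user": "end-us",
    "experience": "experi",
    "running": "run",
}


def lookup_stem(term: str) -> str:
    """Return the stem from the table, or the term unchanged if unknown."""
    return THREAD_STEMS.get(term, term)


def stem_phrase(phrase: str, stem: Callable[[str], str]) -> str:
    """Lowercase the phrase, stem each whitespace-separated term separately,
    and rejoin the stems with single spaces."""
    return " ".join(stem(term) for term in phrase.lower().split())


def termfreq_expr(field: str, phrase: str, stem: Callable[[str], str]) -> str:
    """Build a termfreq(...) function-query string from a pre-stemmed phrase.

    The output format mirrors the example in the thread; it is not validated
    against a running Solr instance here.
    """
    return f'termfreq({field},"{stem_phrase(phrase, stem)}")'


if __name__ == "__main__":
    print(termfreq_expr("body", "end-user experience", lookup_stem))
    print(termfreq_expr("body", "running", lookup_stem))
```

In practice lookup_stem would be swapped for a stemmer that uses the same algorithm as the index-time analyzer, since a mismatch between the two would produce stems that are not in the index.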