Hi Alessandro,

I'm counting word frequencies on a site. All I want is to count
"running" and "run" as the same topic.

It's not really fuzzy matching I believe -- i.e. I wouldn't want to match
"running" and "sprinting".

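To make the distinction concrete, here is a small sketch in plain Python
(my own illustration, not Solr or Lucene code) of the edit distance that
fuzzy queries are based on. The helper name levenshtein is made up for this
example. As far as I know, Lucene's fuzzy queries allow at most 2 edits, so
"run" and "running" would not match that way either, which is another
reason stemming looks like the better fit.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


# "running" -> "run" needs four deletions, well beyond a 2-edit limit.
distance_run = levenshtein("running", "run")
```
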
I think stemming is what I need; it seems to work fine now.
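
Roughly, the idea looks like the sketch below. This is only an
illustration in plain Python, not what runs in Solr: toy_stem is a
deliberately crude stand-in I made up (it is NOT the Porter stemmer and
will not reproduce stems like "experi"), and stem_phrase and stem_counts
are hypothetical helper names. The point is just that each token is
stemmed separately and the counts are then grouped by stem.

```python
import re
from collections import Counter


def toy_stem(word: str) -> str:
    """Crude suffix stripper, a stand-in for a real stemmer (not Porter)."""
    w = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Only strip if at least three characters remain.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            # Undo consonant doubling, e.g. "runn" -> "run".
            if len(w) >= 2 and w[-1] == w[-2] and w[-1] not in "aeiouls":
                w = w[:-1]
            break
    return w


def stem_phrase(phrase: str, stem=toy_stem) -> str:
    """Stem every whitespace-separated term of a phrase on its own."""
    return " ".join(stem(tok) for tok in phrase.split())


def stem_counts(text: str, stem=toy_stem) -> Counter:
    """Count word frequencies after mapping each word to its stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(stem(tok) for tok in tokens)


# "running", "run" and "runs" all land in the same "run" bucket.
counts = stem_counts("She was running. They run daily, and he runs too.")
```

In practice the stem function would be swapped for the same stemmer that
is used at index time, so that the stemmed query terms line up with the
indexed terms.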

TY,
Aki


On Mon, Jul 27, 2015 at 5:54 AM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> Apart from the funny "crypted" message by Darin xD
> I would like to focus on the initial user requirement :
>
> "get term
> frequencies with fuzzy matching"
>
> Solr/Lucene support fuzzy queries independently of the token filters
> you apply at analysis time.
> You can run fuzzy queries based on edit distance (by default computed
> with a Levenshtein automaton).
>
> This lets you run fuzzy queries while leaving your indexed terms as
> they are (so the term frequencies are not affected).
>
> Can you give us more details about your use of stemming?
> Stemming is usually something a little different from fuzzy search,
> but it is a good way to solve some search requirements (always keep in
> mind that stemming degrades the precision of your system in favour of
> recall).
>
> Cheers
>
>
> 2015-07-25 20:21 GMT+01:00 Aki Balogh <a...@marketmuse.com>:
>
> > I believe I found a solution: use a third-party stemmer to stem the term
> > first, then pass it to termfreq.
> >
> > The only trick is, each term in a phrase has to be stemmed separately
> > (i.e. "end-user experience" has to be broken down into
> > "end-user" -> "end-us" and "experience" -> "experi") before being
> > passed, i.e. termfreq(body, "end-us experi").
> >
> > From what I can tell, FunctionQuery / termfreq doesn't have a way to
> > apply stemming.
> >
> > Akos (Aki) Balogh
> > Co-Founder, MarketMuse
> > https://www.MarketMuse.com <https://www.marketmuse.com/>
> >
> >
> > On Fri, Jul 24, 2015 at 12:04 PM, Aki Balogh <a...@marketmuse.com> wrote:
> >
> > > Hi All,
> > >
> > > I'm using TermVectorComponent and stemming (Porter) in order to get
> > > term frequencies with fuzzy matching. I'm stemming at index and query
> > > time.
> > >
> > > Is there a way to get term frequency from the index?
> > > * termfreq doesn't support stemming or wildcards
> > > * terms component doesn't allow additional filters
> > > * I could use a copyfield to save a non-stemmed version at indexing,
> > >   and run termfreq on that, but then I don't get any fuzzy matching
> > >
> > > Thanks,
> > > Aki
> > >
> >
>
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>
