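Two things will help in addition to what Josh suggested:

a) when looking for items that are trending hot, use the change in log rank from one period to the next as the score. For most internetly things, rank is roughly proportional to 1/rate, so log rank is -log rate up to an additive constant. Refining this slightly to -log(epsilon + 1/rank) makes things a little less jumpy, because the value levels off at -log(epsilon) for items far down the list instead of growing without bound.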
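
To make (a) concrete, here is a minimal sketch in Python rather than Mahout. It assumes you already have per-item counts for two periods (say yesterday and today). The names `smoothed_log_rank`, `ranks`, `trending_scores`, the value of `EPSILON`, and the choice to treat items missing from the earlier period as if their rank were infinite are all my assumptions for illustration, not anything prescribed above.

```python
import math

EPSILON = 0.01  # assumed smoothing constant; tune it for your data


def smoothed_log_rank(rank, epsilon=EPSILON):
    """-log(epsilon + 1/rank): close to log(rank) near the top of the list,
    levelling off at -log(epsilon) for very large ranks."""
    return -math.log(epsilon + 1.0 / rank)


def ranks(counts):
    """Map each item to its 1-based rank by descending count (ties broken by name)."""
    ordered = sorted(counts, key=lambda item: (-counts[item], item))
    return {item: position + 1 for position, item in enumerate(ordered)}


def trending_scores(previous_counts, current_counts, epsilon=EPSILON):
    """Score = smoothed log rank before minus smoothed log rank now.
    Positive means the item moved up the ranking."""
    prev_rank = ranks(previous_counts)
    curr_rank = ranks(current_counts)
    # Assumption: an item not seen in the earlier period is treated as rank -> infinity.
    unseen = -math.log(epsilon)
    scores = {}
    for item, rank_now in curr_rank.items():
        if item in prev_rank:
            before = smoothed_log_rank(prev_rank[item], epsilon)
        else:
            before = unseen
        scores[item] = before - smoothed_log_rank(rank_now, epsilon)
    return scores
```

Sorting the items by score, highest first, gives a candidate trending list. Note that with this handling of unseen items, brand-new items get the largest possible jump, so you may want a minimum count threshold in practice.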
b) use various forms of coalescence. If you are trending queries, normalize the queries by sorting terms. If you have a category handy, try that. Always invent a display name, of course. Usually, I just use the most common input that maps to a coalesced group.

Item (b) may involve clustering or it may not. Depends on the data you have and the exact results you want.

On Sat, Jun 18, 2011 at 7:52 PM, Mark <[email protected]> wrote:

> Sorry if this isn't the right place to ask but how would I go about finding
> trending data over a certain period of time.
>
> For example: http://www.ebay.com has a section "Trends on eBay" that is
> updated daily. I was wondering how this can be accomplished using Mahout (if
> possible)
>
> For input I have:
> - user searches by day
> - titles of products purchased by day
>
> Would this require some sort of clustering? classification?
>
> Thanks in advance
>
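
For item (b), a minimal sketch of the simplest kind of coalescence, again in plain Python. It groups raw queries by a key made of their sorted terms and uses the most common raw input in each group as the display name. The function names `normalize_query` and `coalesce` are hypothetical, and lowercasing before sorting is an extra assumption of mine on top of the term sorting described above.

```python
from collections import Counter, defaultdict


def normalize_query(query):
    """Coalescing key: lowercase (an added assumption), split on whitespace, sort the terms."""
    return " ".join(sorted(query.lower().split()))


def coalesce(queries):
    """Group raw query strings by normalized key.
    Returns {key: (total_count, display_name)}, where display_name is the
    most common raw input that maps to that key."""
    variants = defaultdict(Counter)
    for raw in queries:
        variants[normalize_query(raw)][raw] += 1
    groups = {}
    for key, counter in variants.items():
        display_name, _ = counter.most_common(1)[0]
        groups[key] = (sum(counter.values()), display_name)
    return groups
```

The per-group totals are what you would rank and score for trending, and the display name is what you would show to users. If a category is available, it could serve as the key in place of the sorted terms.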
