> > then recommending items in that category, all based on user behavior. Or
> > try a placement based on a single thing a user watched, like “because you
> > watched xyz you might like these”. Don’t just show the most popular
> > categories for the user and recommend items
; was better to have specialized pages for what's new and hot rather than
> because I had data saying it was bad to do. I have put a very weak
> recommendation effect on the what's hot pages so that people tend to see
> trending material that they like. That doesn't help on what's new pages
nt to hear White Christmas
> > <https://www.youtube.com/watch?v=P8Ozdqzjigg> until the day after
> > Christmas
> > at which point this becomes a really bad recommendation. To some degree,
> > this can be partially dealt with by using temporal tags as indicators,
>
normal recommendations—so you can ask for hot in “electronics” if you know
> categories, or hot "in-stock" items, or ...
>
> Still, anomaly detection does sound like an interesting approach.
>
>
> On Nov 10, 2017, at 3:13 PM, Johannes Schulte <johannes.schu...@gmail.com>
Hi "all",
I am wondering what would be the best way to incorporate event time
information into the calculation of the G-Test.
There is a claim here
https://de.slideshare.net/tdunning/finding-changes-in-real-data
saying "Time aware variant of G-Test is possible"
I remember I experimented with
good.
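For what it's worth, one way a time-aware variant could be sketched is to exponentially decay each event's contribution before filling the 2x2 counts that G is computed from. This is purely a sketch; the decay scheme and the half-life are my assumptions, not something taken from the slides:

```java
// Hypothetical sketch: time-decayed counts for a G-Test computation.
// The exponential decay and the half-life are assumptions, not from the slides.
public class DecayedCounts {

    // Sum of per-event weights at time nowMs, where an event loses half its
    // weight every halfLifeMs milliseconds.
    public static double decayedCount(long[] eventTimesMs, long nowMs, double halfLifeMs) {
        double count = 0.0;
        for (long t : eventTimesMs) {
            count += Math.pow(0.5, (nowMs - t) / halfLifeMs);
        }
        return count;
    }
}
```

The decayed (now fractional) counts would then replace the raw cell counts in the usual G formula, so recent co-occurrences dominate the score.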
The step for finding labels is still unclear to me. Do you use the
LogLikelihood class on the original documents? How? Or do you mean the
collocation job?
Cheers,
Frank
On Thu, Mar 20, 2014 at 8:39 PM, Johannes Schulte
johannes.schu...@gmail.com wrote:
Hi Frank, we are using a very similar system in production.
Hashing text-like data to a 5-dimensional vector with two probes, and
then applying tf-idf weighting.
For IDF we don't keep a separate weight dictionary but just count the
distinct training examples (documents) that have a non-null
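A minimal self-contained sketch of this kind of hashed encoding with two probes, plus IDF from distinct-document counts. The class and method names here are made up for illustration; this is not Mahout's actual encoder API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of hashed feature encoding with two probes plus IDF
// computed from distinct-document counts (no separate weight dictionary).
public class HashedEncoder {

    private final int dim;
    // number of distinct training documents in which a term occurred
    private final Map<String, Integer> docFreq = new HashMap<String, Integer>();

    public HashedEncoder(int dim) {
        this.dim = dim;
    }

    // hash "name=value" to two locations in the vector and add the weight there
    public void addToVector(String name, String value, double weight, double[] vector) {
        String key = name + "=" + value;
        for (int probe = 0; probe < 2; probe++) {
            int h = ((key + "#" + probe).hashCode() & 0x7fffffff) % dim;
            vector[h] += weight;
        }
    }

    // call once per training document with its distinct terms
    public void observeDoc(Iterable<String> distinctTerms) {
        for (String t : distinctTerms) {
            Integer old = docFreq.get(t);
            docFreq.put(t, old == null ? 1 : old + 1);
        }
    }

    // smoothed IDF derived from the distinct-document counts
    public double idf(String term, int totalDocs) {
        Integer df = docFreq.get(term);
        return Math.log((totalDocs + 1.0) / ((df == null ? 0 : df) + 1.0));
    }
}
```

With two probes, each feature deposits its weight in two (possibly colliding) positions, which softens the damage any single hash collision can do.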
I would pass the memory parameters in the args array directly. The Hadoop-specific
arguments must come before your custom arguments, like this:
String[] args = new String[]{"-Dmapreduce.map.memory.mb=12323", "customOpt1"};
ToolRunner.run(..., args);
The ToolRunner takes care of putting the Hadoop
Hi Frank,
you are using the feature vector encoders, which hash a combination of
feature name and feature value to 2 (default) locations in the vector. The
vector size you configured is 11, and this is IMO very small compared to the
possible combinations of values you have for your data (education, marital,
Hey,
since you are already using basket-analysis terms like support, confidence
and lift, it might be easier for you to think of the LLR score as a better
lift, since it automatically puts a penalty on seldom-seen items (you usually
use support in classic MBA for that).
So, you would use the same 4
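Dunning's LLR score can be computed from the four counts of the classic 2x2 contingency table. A self-contained sketch mirroring the entropy formulation that Mahout's LogLikelihood class uses (the k naming is the usual convention; the class wrapper here is just for illustration):

```java
// Dunning's log-likelihood ratio from the four 2x2 contingency counts.
public class Llr {

    private static double xLogX(double x) {
        return x == 0.0 ? 0.0 : x * Math.log(x);
    }

    // unnormalized entropy helper: xLogX(sum) - sum of xLogX(x_i)
    private static double entropy(double... elements) {
        double sum = 0.0;
        double result = 0.0;
        for (double x : elements) {
            result += xLogX(x);
            sum += x;
        }
        return xLogX(sum) - result;
    }

    // k11: A and B together, k12: A without B, k21: B without A, k22: neither
    public static double logLikelihoodRatio(double k11, double k12, double k21, double k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        return 2.0 * (rowEntropy + columnEntropy - matrixEntropy);
    }
}
```

Unlike lift, the score grows with the amount of evidence, so a pair seen twice by chance cannot outrank a pair that co-occurs thousands of times.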
we have a cross-recommender in production for about 3 months now, with the
difference that we use Lucene to build indices from map-reduce directly,
plus we do the same thing for 30+ customers, most of them with different
input data structures (field names, values).
we had something similar before
Hi,
we are just keeping them in HDFS, one directory with a timestamp per model
and a meta file gathering some metrics like AUC, number of training
examples, and class distribution. This makes it easy to generate reports out of
it on the fly, whereas this would be very hard with git (plus there is no added
Hi,
right now the only way to use the encoders without Strings is with a byte
array. Wouldn't it be helpful to allow passing in an offset and length for use
cases where there's a reusable byte array at hand? There's a part of MIA
devoted to speeding up the encoding, and I think this would be a
Ted Dunning ted.dunn...@gmail.com wrote:
Johannes,
Your summary is good.
I would add that the precalculated recommendations can be large enough that
the lookup becomes more expensive. Your point about staleness is very
on-point.
On Mon, May 20, 2013 at 10:15 PM, Johannes Schulte
in a real business, you are very lucky. The
search engine approach handles (b) and (c) by nature which massively
improves the likelihood of ever getting to examine (d).
On Tue, May 21, 2013 at 1:13 AM, Johannes Schulte
johannes.schu...@gmail.com wrote:
Thanks! Could you also add how to learn
I think Pat is just saying that
time(history_lookup) (1) + time(recommendation_calculation) (2) ≈ time(precalc_lookup) (3)
since 1 and 3 are assumed to be served by the same system class (key-value
store, DB) with a single key, and 2 ≈ 0.
Ted is using a lot of information that is available at
Hi!
As a starting point I remember this conversation containing both elements
(although the reconstruction part is rather small, hint!)
http://markmail.org/message/5cfewal3oyt6vw2k
On Tue, May 7, 2013 at 1:00 AM, Dominik Hübner cont...@dhuebner.com wrote:
One more thing for now @Ted:
What do
Hi Martin,
I guess you should be fine with the StaticWordValueEncoder, following e.g.
this discussion on this list; it is about clustering but matches some of
your questions
Hi,
this worked for me without having to fiddle with map reduce classes
List<Cluster> initialClusters = new ArrayList<Cluster>();
Iterable<Vector> dataPoints = Lists.newArrayList();
ClusterClassifier prior =
new ClusterClassifier(initialClusters,
dataPoints can be in memory or from disk, and you can sample the dataPoints
for initialClusters.
On Tue, Apr 9, 2013 at 6:16 PM, Johannes Schulte johannes.schu...@gmail.com
wrote:
Hi,
the score is the probability of the example belonging to the class, but
under independence assumptions, and hence only useful for comparing scores of
different classes with each other (“...more likely than...”). Since it is meant
to be a probability, it can range from 0 to 1.
If you want to transform
the performance?
Thanks for all the input!
On Mon, Feb 11, 2013 at 7:20 AM, Ted Dunning ted.dunn...@gmail.com wrote:
On Sun, Feb 10, 2013 at 3:39 PM, Johannes Schulte
johannes.schu...@gmail.com wrote:
...
i am currently implementing a system of the same kind, LLR sparsified
term-cooccurrence
:27 PM, Ken Krugler kkrugler_li...@transpac.com wrote:
On Feb 11, 2013, at 1:57am, Johannes Schulte wrote:
@Ken
Thanks for the hints...
I am coming from a payload-based system, so I am aware of them; however, in
the Lucene 3.6 branch boosting and payloads didn't work together (if you
set
Hi,
I am currently implementing a system of the same kind, LLR-sparsified
term-cooccurrence vectors in Lucene (since not a day goes by without me
seeing Ted praising this).
There are not only views and purchases, but also search terms, facets and a
lot more textual information to be included in the
Hi Pavel,
first of all, I would include an intercept term in the model. This learns
the base rate, i.e. the proportion of examples per class in the training set.
Second, for getting calibrated probabilities out of the downsampled
model, I can think of two ways:
1. Use another set of input data to measure the observed maximum
Oops, hit enter too early...
Just wanted to say that those are the two ways I'm thinking of right now,
since I have a similar challenge. I'm thankful for any suggestions or
comments.
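One standard correction for the second point, sketched under the assumption that negatives were uniformly downsampled, keeping a fraction beta of them (the actual setup discussed here may differ):

```java
// Sketch: correcting predicted probabilities from a model trained on data
// where negatives were uniformly downsampled, keeping a fraction beta of them.
// Downsampling inflates the model's odds by 1/beta, so we scale them back.
public class Calibration {

    public static double calibrate(double pModel, double beta) {
        // true odds = beta * (model odds); convert the odds back to a probability
        return beta * pModel / (beta * pModel + 1.0 - pModel);
    }
}
```

With beta = 1 (no downsampling) the score passes through unchanged; the smaller beta is, the more the inflated scores are pulled back down.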
Cheers,
Johannes
On Thu, Dec 27, 2012 at 3:13 PM, Johannes Schulte
johannes.schu...@gmail.com wrote:
Hi Pavel
Hi Florents,
it just became different but still works without HDFS. I also had trouble
getting the right classes together, but here is something that will
hopefully work correctly:
DistanceMeasure measure = new CosineDistanceMeasure();
// ClusterUtils is not a Mahout class
List<Cluster>
, payloads (as of a while ago) were not accessed very efficiently.
This can massively slow down scoring.
On Mon, Nov 5, 2012 at 7:01 AM, shubham srivastava shubha...@gmail.com
wrote:
http://sujitpal.blogspot.in/2011/01/payloads-with-solr.html
On Fri, Nov 2, 2012 at 12:13 PM, Johannes
On Mon, Nov 5, 2012 at 12:06 PM, Johannes Schulte
johannes.schu...@gmail.com wrote:
do you really mean payloads? Because I consider them part of the index, as
they are stored per position and can be accessed during scoring.
I had the impression that they were not indexed
with your
situation.)
Sean
On Fri, Oct 5, 2012 at 10:44 AM, Johannes Schulte
johannes.schu...@gmail.com wrote:
Hi!
I got a question concerning a recommendation / classification problem which
I originally wanted to solve with matrix factorization methods from Taste /
Mahout
recommendations to avoid solving the reverse problem.
On Fri, Oct 5, 2012 at 12:42 PM, Johannes Schulte
johannes.schu...@gmail.com wrote:
Sean,
thanks for your input.
It's more like 30 million users + id mapping for both items and users, but
I could probably sample that down to something that fits