Hi Stephen,
We precompute a variant of P(z,d) during indexing, and do the first 3
steps. The resulting documents are ordered by payload score, which is
basically z in our case. We don't currently care about P(t,z), but it
seems like a good thing to have for disambiguation purposes.
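
The ordering described above can be sketched in plain Python rather than Lucene. This is only an illustration under stated assumptions: the topic names, document ids, and weights are invented, and `build_topic_index` and `docs_for_topic` are hypothetical helpers standing in for the index-time precomputation and the payload-ordered lookup, not the actual implementation.

```python
from collections import defaultdict

# Hypothetical per-document topic scores, standing in for the values
# precomputed at index time and stored as payloads. All data is invented.
DOC_TOPICS = {
    "doc1": {"sports": 0.7, "politics": 0.3},
    "doc2": {"sports": 0.2, "politics": 0.8},
    "doc3": {"sports": 0.5},
}


def build_topic_index(doc_topics):
    """Invert doc -> {topic: score} into topic -> [(doc, score), ...]."""
    index = defaultdict(list)
    for doc_id, topics in doc_topics.items():
        for topic, score in topics.items():
            index[topic].append((doc_id, score))
    return index


def docs_for_topic(index, topic):
    """Return the documents for a topic, highest stored score first."""
    return sorted(index.get(topic, []), key=lambda pair: pair[1], reverse=True)


index = build_topic_index(DOC_TOPICS)
print(docs_for_topic(index, "sports"))
```

In Lucene the sort would come from the scorer rather than an explicit `sorted` call; the sketch only shows the resulting order.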
So anyway, I ha

Sujit,
Thanks for your reply and the link to your blog post, which was
helpful and got me thinking about payloads.
I still have one more question. I need to be able to compute the
Sim(query q, doc d) similarity function, which is defined below:
Sim(query q, doc d) = sum_{t in q} sum_{z} P(t, z

Hi Stephen,
We are doing something similar: we store each document as a multifield
of (d,z) pairs, with the z's (scores) stored as payloads for each d
(topic). We had to build a custom similarity that implements the
scorePayload function. So to find docs for a given d
(topic), we

List,
I am trying to incorporate the Latent Dirichlet Allocation (LDA) topic
model into Lucene. Briefly, the LDA model extracts topics
(distribution over words) from a set of documents, and then represents
each document with topic vectors. For example, documents could be
represented as:
d1 = (0,