>> -----Original Message-----
>> From: stephen.warner.tho...@gmail.com
>> [mailto:stephen.warner.tho...@gmail.com] On Behalf Of Stephen Thomas
>> Sent: Tuesday, November 29, 2011 5:20 PM
>> To: java-user@lucene.apache.org
>> Subject: Custom Filter for Splitting CamelCase?
List,
I have written my own CustomAnalyzer, as follows:
public TokenStream tokenStream(String fieldName, Reader reader) {
    // TODO: add calls to RemovePuncation, and SplitIdentifiers here
    // First, convert to lower case
    TokenStream
[...] wrote about this some time back... maybe this would help you.
> http://sujitpal.blogspot.com/2011/01/payloads-with-solr.html
>
> -sujit
>
> On Mon, 2011-11-28 at 12:29 -0500, Stephen Thomas wrote:
>> List,
>>
>> I am trying to incorporate the Latent Dirichlet Allocation
List,
I am trying to incorporate the Latent Dirichlet Allocation (LDA) topic
model into Lucene. Briefly, the LDA model extracts topics
(distribution over words) from a set of documents, and then represents
each document with topic vectors. For example, documents could be
represented as:
d1 = (0,
List,
I am indexing a subset of Wikipedia. I have 4 years' worth of data, and
have taken a snapshot of each document every month over the 4-year
span. Thus, I have 4*12=48 versions of each document. (I keep track of
the timestamp in a field.) I have noticed that in many cases, a
Wikipedia document d