Weighted "features" using naive bayes classifier

2011-07-05 Thread Vijay Santhanam
Hi, I've used Lucene a fair bit and one useful feature it has is the ability to boost fields to make them more relevant. E.g. matching Titles are more important than matching descriptions, so you can "boost" title fields to ensure they weigh in more in the final relevance calculation. I expected

Re: How could I use bayse model with my C++ online classifier

2011-07-05 Thread 刘逸哲
I will re-implement the serialization in C++. Thanks a lot. -Original Message- From: Ted Dunning [mailto:ted.dunn...@gmail.com] Sent: July 6, 2011 10:48 To: user@mahout.apache.org Subject: Re: How could I use bayse model with my C++ online classifier Well, PMML is the (complicated) standard solution. Ot

Re: How could I use bayse model with my C++ online classifier

2011-07-05 Thread Ted Dunning
Well, PMML is the (complicated) standard solution. Otherwise, a Naive Bayes model would probably fit as CSV data. But seriously, it isn't that hard to read a sequence file. Re-implementing our serialization in C++ would be generally useful as well. On Tue, Jul 5, 2011 at 7:38 PM, Lance Norskog
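
As a rough illustration of the CSV suggestion above, here is a minimal sketch of round-tripping per-class feature weights through CSV text that a C++ program could then parse line by line. The column layout (label, feature, weight), the function names and the toy weights are all assumptions made for illustration; this is not Mahout's serialization format.

```python
import csv
import io


def dump_model(weights, out):
    """Write {(label, feature): weight} as CSV rows: label,feature,weight."""
    writer = csv.writer(out)
    writer.writerow(["label", "feature", "weight"])
    for (label, feature), weight in sorted(weights.items()):
        # repr() keeps enough digits for the float to round-trip exactly.
        writer.writerow([label, feature, repr(weight)])


def load_model(src):
    """Read the CSV written by dump_model back into a dictionary."""
    reader = csv.DictReader(src)
    return {(row["label"], row["feature"]): float(row["weight"]) for row in reader}


# Hypothetical toy weights (e.g. log-likelihoods); not taken from a real model.
toy_weights = {
    ("spam", "free"): -1.25,
    ("spam", "meeting"): -4.5,
    ("ham", "free"): -3.75,
    ("ham", "meeting"): -1.5,
}

buffer = io.StringIO()
dump_model(toy_weights, buffer)
buffer.seek(0)
restored = load_model(buffer)
print(buffer.getvalue())
```

One row per (label, feature) pair keeps the reader on the C++ side trivial, at the cost of a larger file than a binary format.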

Re: How could I use bayse model with my C++ online classifier

2011-07-05 Thread Lance Norskog
Is there a standard text format that would support this data? ARFF, for example? On Mon, Jul 4, 2011 at 7:57 PM, beneo_7 wrote: > read the java source code and implement it in c++ > > I also don't understand why you use an Alibaba email address > > 2011-07-05 > > > > beneo_7 > > > > From: 刘逸哲 > Sent: 2011-07-05 10:55 > Subject: How could I

[JOBS] Meebo Machine Learning Opportunities

2011-07-05 Thread Jim Dullaghan
I'm recruiting Engineers with Machine Learning expertise at Meebo. We have openings for Chief Engineer level through new graduates. We're looking for folks with deep knowledge of how machine learning can be applied in social networking and related advertising applications. We're open to these folk

Re: how do I choose appropriate OnlineLogisticRegression parameters for modelling this?

2011-07-05 Thread Svetlomir Kasabov
Hello, the answer to Vijay's question would be interesting to me too, since I should use OnlineLogisticRegression in order to calculate probabilities (as far as I see, there are no probability calculation functions in AdaptiveLogisticRegression). So, for example, how to determine 'number of

Re: Tranforming data for k-means analysis

2011-07-05 Thread Ted Dunning
Glad we could help. On Tue, Jul 5, 2011 at 7:09 AM, Radek Maciaszek wrote: > Hello, > > I worked in the past on an MSc project which involved quite a lot of Mahout > calculation. I finished it a while ago but only recently got my head around > posting it somewhere online. > > It would be much more d

Re: Using naive bayes classification with continuous, categorical and word-like features

2011-07-05 Thread Ted Dunning
Glancing at the code, I think that the big problem is likely to be the number of features in the encoded model. You only have a tiny number of features in the hashed representation so you are going to have a LOT of collisions. You need to have considerably more dimensions in your encoded feature
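
To make the collision point above concrete, here is a minimal sketch that hashes a fixed set of feature names into vectors of different sizes and counts how many names land in an already occupied slot. It uses MD5 purely as a convenient deterministic hash; it does not reproduce the hash function or encoder classes that Mahout uses, and the feature names and dimension sizes are made up.

```python
import hashlib


def bucket(name, dims):
    """Map a feature name to a slot in a vector with `dims` entries."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % dims


def count_collisions(names, dims):
    """Count the feature names that land in an already occupied slot."""
    used = set()
    collisions = 0
    for name in names:
        slot = bucket(name, dims)
        if slot in used:
            collisions += 1
        else:
            used.add(slot)
    return collisions


# Hypothetical vocabulary of 100 distinct features.
feature_names = [f"word_{i}" for i in range(100)]

for dims in (8, 100, 2**20):
    print(dims, count_collisions(feature_names, dims))
```

With only 8 slots, at least 92 of the 100 names must share a slot with another name, so the learner cannot tell those features apart. Raising the dimension makes such clashes rare.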

Re: Tranforming data for k-means analysis

2011-07-05 Thread Radek Maciaszek
Hello, I worked in the past on an MSc project which involved quite a lot of Mahout calculation. I finished it a while ago but only recently got my head around posting it somewhere online. It would be much more difficult to finish this work without the help from this list so I wanted to say thank you

Re: Lanczos SVD scalability

2011-07-05 Thread Jake Mannix
Agreed. This matrix could be decomposed in your browser in javascript ... or these days, on your phone. -jake On Jul 5, 2011 1:12 AM, "Ted Dunning" wrote: Lanczos is probably dominated by overhead and startup costs on such a small matrix. You only have 100,000 non-zero elements which is a t

Re: Using with seq2spars org.apache.lucene.analysis.Analyzer

2011-07-05 Thread Sean Owen
Erm, yes. What is your question? On Tue, Jul 5, 2011 at 1:30 PM, rmx wrote: > Is this project still alive?? > Please... > Thanks > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Using-with-seq2spars-org-apache-lucene-analysis-Analyzer-tp3108497p3140576.html > Sent from

Re: Using with seq2spars org.apache.lucene.analysis.Analyzer

2011-07-05 Thread rmx
Is this project still alive?? Please... Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Using-with-seq2spars-org-apache-lucene-analysis-Analyzer-tp3108497p3140576.html Sent from the Mahout User List mailing list archive at Nabble.com.

Re: Using naive bayes classification with continuous, categorical and word-like features

2011-07-05 Thread Vijay Santhanam
Hi Ted, I've uploaded my code to https://gist.github.com/1064551 I bought Mahout in Action and am using your ContinuousValueEncoder and other misc classes, but as you can see I've hardcoded most of the training data. Yes, there are very few training samples, but from what I understand, I can rei

Re: Lanczos SVD scalability

2011-07-05 Thread Ted Dunning
Lanczos is probably dominated by overhead and startup costs on such a small matrix. You only have 100,000 non-zero elements, which is a truly tiny problem. Stochastic projection SVD, for instance, would compute the answer for such a problem in a few milliseconds. You need a much larger problem to

Re: Using naive bayes classification with continuous, categorical and word-like features

2011-07-05 Thread Ted Dunning
How many training examples do you have? Sounds like you have very few. That is definitely not the sweet spot for on-linear regression. In any case, can you post your test code to github or something? On Mon, Jul 4, 2011 at 11:46 AM, Vijay Santhanam wrote: > Thank you Ted > > However, even with

Re: 20news

2011-07-05 Thread Sean Owen
I committed a change to make the parsing bits I found in .bayes. use space and tab. You can try again. I confess I don't know this code and there's a lot of little pieces of parsing here and there so don't know if this is the heart of the issue. On Mon, Jul 4, 2011 at 4:08 PM, Vijay Santhanam wrot