HMM

2013-01-05 Thread simon.2.thompson
Dr. Simon Thompson Chief Researcher, Customer Experience. BT Research. BT plc. PP11J. MLBG BT Adastral Park, Martlesham Heath. IP5 3RE

HMM - baum welch and hmmpredict

2013-01-05 Thread simon.2.thompson
Hi there, I've got a couple of questions about the hmm elements of Mahout. - when I get models that are made of NaN I guess this is telling me that the algorithm can't make a prediction? - I can train models with 1 hidden state, or 2 hidden states and once or twice with 3 hidden states.. but

RE: HMM - baum welch and hmmpredict

2013-01-06 Thread simon.2.thompson
Hi Ted, thanks very much for the response, very helpful to hear these thoughts. What I will do is look at the data set issue and report back as to what I find out. I'll prod round the code and see if I can get a clue as to how it produces infinities and so on. I think that one of the Mahout

RE: HMM - baum welch and hmmpredict

2013-01-06 Thread simon.2.thompson
Hi, I've been using the standalone trainer. I'll have a look at the log scaled trainers - thanks for the tip! Best, Simon

RE: HMM - baum welch and hmmpredict

2013-01-07 Thread simon.2.thompson
Hi there, Ted's insight on the synthetic data set causing underflow appears to be correct. - If I train using a pattern "0 0 0 0 1 1 1 1 2 2 2 2 2 0 0 0 0 1 1 1 1 2 2 2 2 2 <>" for 3 hidden states and 3 observable states: > mahout baumwelch -i pattern.txt -o out.txt -nh 3 -no 3 I get

RE: HMM - baum welch and hmmpredict

2013-01-08 Thread simon.2.thompson
Hello there, I've had a little look at the Mahout source. I've been using the mahout command line to train my HMM and this invokes the org.apache.mahout.classifier.sequencelearning.hmm.BaumWelchTrainer class main method. This class then parses the command line options and calls HmmTrainer. t

RE: factorize-movielens-1M.sh privilegedActionException: reports dir doesn't exist when it does exist

2013-01-18 Thread simon.2.thompson
Hello - if you are running on top of hdfs then did you use hadoop fs -put to create the /tmp/mahout-work-kali/movielens/ratings.csv file?

Odd clustering fail

2013-03-09 Thread simon.2.thompson
Hi there, I am doing a fairly silly experiment to measure Hadoop performance. As part of this I have extracted emails from the Enron database and I am clustering them using a proprietary method for clustering short messages (i.e. tweets, emails, SMSs) and benchmarking clusters in various confi

RE: Odd clustering fail

2013-03-14 Thread simon.2.thompson
Hi Ted & Dan, thanks for coming back to me. I figured it out - I was being dumb! The short version is that I had a large file (a database results dump) that I was splitting with awk in a directory called enron-in. On this occasion I had forgotten to remove it before running "mahout seqdirect

Re: evaluating recommender with boolean prefs

2013-06-07 Thread simon.2.thompson
But why would she want the things she has? - Original Message - From: Koobas [mailto:koo...@gmail.com] Sent: Friday, June 07, 2013 08:06 PM To: user@mahout.apache.org Subject: Re: evaluating recommender with boolean prefs Since I am primarily an HPC person, probably a naive question from

RE: churn analysis

2013-07-24 Thread simon.2.thompson
I've not used Mahout to do it, but in the past colleagues have used HMM to create a way of discovering customers who are in an "about to churn" state. This was used to populate a target list for winback intervention (they're about to churn, contact them and offer something - or just help - to ke

RE: Naive bayes and character n-grams

2013-10-10 Thread simon.2.thompson
Hey Dean, what do you mean by character n-grams? If you mean things like "&ab" or "ui2" then given that there are so few characters compared to words is there a problem that can't be solved without a look-up table for n4 ish because if so then do you run into the issue of a sudden space explosi

RE: travelling salesman on Mahout

2014-01-16 Thread simon.2.thompson
Hi all - there is a project at MIT called FlexGP that has done more work on this. http://groups.csail.mit.edu/EVO-DesignOpt/groupWebSite/index.php?n=Site.FlexGP Unfortunately I can't find a download for the code so I suppose that it's not open source, however you might like to contact these fol