Building a reputation analysis engine for email using Mahout

2014-03-20 Thread Dileepa Jayakody
Hi All, My name is Dileepa Jayakody, an MSc research student from the University of Moratuwa, Sri Lanka. My research project (ReputationBox) is about predicting the goodness of incoming emails (based on a calculated reputation score) by analysing previous email conversations, email correspondents and t

Re: Building a reputation analysis engine for email using Mahout

2014-03-20 Thread Dileepa Jayakody
Hi All, Can I implement an email reputation prediction module within my application without hosting a Mahout server separately? Is there a library I can use for implementing recommendation engines? Appreciate any suggestions. Thanks, Dileepa On Thu, Mar 20, 2014 at 4:29 PM, Dileepa Jayakody wr

Re: Text clustering with hashing vector encoders

2014-03-20 Thread Johannes Schulte
Hi Frank, we are using a very similar system in production: hashing text-like data to a 5 dimensional vector with two probes, and then applying tf-idf weighting. For IDF we don't keep a separate weight dictionary but just count the distinct training examples ("documents") that have a non-null v
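
The approach described in this message (hash each token into a fixed-size vector with two probes, then derive IDF from per-slot document counts instead of a word dictionary) can be sketched in plain Python. This is only an illustration under stated assumptions, not Mahout's encoder API: the function names, the MD5-based hash, the smoothed IDF formula and the vector size used in the example are all hypothetical choices.

```python
import hashlib
import math
from collections import Counter


def probe_indices(token, dim, probes=2):
    """Map a token to `probes` slots of a vector of size `dim` (hypothetical hash scheme)."""
    indices = []
    for p in range(probes):
        digest = hashlib.md5(f"{p}:{token}".encode("utf-8")).digest()
        indices.append(int.from_bytes(digest[:8], "big") % dim)
    return indices


def hash_encode(tokens, dim, probes=2):
    """Term-frequency vector: each token adds its count to every one of its probe slots."""
    vec = [0.0] * dim
    for token, count in Counter(tokens).items():
        for idx in probe_indices(token, dim, probes):
            vec[idx] += count
    return vec


def slot_document_counts(vectors, dim):
    """For each slot, count the documents that have a non-zero value there."""
    df = [0] * dim
    for vec in vectors:
        for i, value in enumerate(vec):
            if value != 0.0:
                df[i] += 1
    return df


def tfidf(vec, df, n_docs):
    """Weight each slot by a smoothed IDF computed from slot-level document counts."""
    return [
        value * math.log((1 + n_docs) / (1 + df[i])) if value else 0.0
        for i, value in enumerate(vec)
    ]


# Example with an assumed vector size of 32 slots.
DIM = 32
docs = [["mahout", "hashing", "vector"], ["mahout", "cluster"], ["email", "vector"]]
tf_vectors = [hash_encode(doc, DIM) for doc in docs]
df = slot_document_counts(tf_vectors, DIM)
weighted = [tfidf(vec, df, len(docs)) for vec in tf_vectors]
```

Because the document counts are kept per hashed slot, no vocabulary has to be stored; the cost is that colliding tokens share one IDF weight.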

Re: Text clustering with hashing vector encoders

2014-03-20 Thread Ted Dunning
On Thu, Mar 20, 2014 at 12:39 PM, Johannes Schulte < johannes.schu...@gmail.com> wrote: > For representing the cluster we have a separate job that assigns users > ("documents") to clusters and shows the most discriminating words for the > cluster via the LogLikelihood class. The results are then v
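
The quoted step, showing the most discriminating words for a cluster via a log-likelihood score, can be illustrated with the standard log-likelihood ratio (G²) on a 2x2 contingency table. The sketch below is plain Python, not Mahout's `LogLikelihood` class; the helper names and the word counts are made up for illustration.

```python
import math
from collections import Counter


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def entropy(*counts):
    """Unnormalised entropy term used by the G^2 statistic."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """G^2 for a 2x2 table: k11 = word in cluster, k12 = word elsewhere,
    k21 = other words in cluster, k22 = other words elsewhere."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def top_words(cluster_counts, rest_counts, n=10):
    """Rank words that are over-represented in the cluster by their G^2 score."""
    cluster_total = sum(cluster_counts.values())
    rest_total = sum(rest_counts.values())
    scored = []
    for word, k11 in cluster_counts.items():
        k12 = rest_counts.get(word, 0)
        # Keep only words that are relatively more frequent inside the cluster.
        if k11 * rest_total > k12 * cluster_total:
            score = llr(k11, k12, cluster_total - k11, rest_total - k12)
            scored.append((score, word))
    return [word for _, word in sorted(scored, reverse=True)[:n]]


# Hypothetical counts for one cluster versus all other documents.
cluster = Counter({"mahout": 50, "the": 100})
rest = Counter({"mahout": 2, "the": 500, "cat": 100})
ranked = top_words(cluster, rest, n=5)
```

The over-representation filter matters because G² is symmetric: a word that is unusually rare in the cluster would otherwise score just as high as one that is unusually frequent.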

Understanding the output of cvb

2014-03-20 Thread Natalia Connolly
Hello, I am using Mahout 0.9 and Hadoop 1.2.1. I've just run cvb on a bunch of documents, and I can output the top N words per topic using vector dump. What I don't understand is how to get the actual topics as strings. When I do something like this: ./bin/mahout vectordump -i doc-topics/pa

Re: Reuters Example LDA Error (no help anywhere)

2014-03-20 Thread Andrew Musselman
Filed a ticket here: https://issues.apache.org/jira/browse/MAHOUT-1470 On Thu, Mar 6, 2014 at 4:36 PM, Cosinus WebDev wrote: > Hi, > > Thank you for the answer, now I can rest a second :) > > Hope this will be fixed soon. If you file a JIRA please send me the link so > I can watch the result. >

market basket analysis of low sales volume products

2014-03-20 Thread Si Chen
Hi everybody, I'd like to do some market basket analysis to suggest cross-sells, but many of the products are very low sales volume items, so in the past the results weren't that useful. Do you think it would make sense to do market basket analysis at more aggregate levels, for example by brand,

Mahout Error ==> Chapter 2- First problem - Recoomender- Java.lang.Error

2014-03-20 Thread Masoud Nikravesh
Exception in thread "main" java.lang.Error: Unresolved compilation problems: DataModel cannot be resolved to a type FileDataModel cannot be resolved to a type UserSimilarity cannot be resolved to a type PearsonCorrelationSimilarity cannot be resolved to a type UserNeighborhood cannot be resolved to

RE: market basket analysis of low sales volume products

2014-03-20 Thread Martin, Nick
I can tell you my experience is that it's absolutely informative to take a look at running the recommendation stuff on things other than items (brands, categories, sub-categories, etc.). If you're in a multi-brand environment it can give you a great view into brand pen by customer groups pretty

Re: market basket analysis of low sales volume products

2014-03-20 Thread Andrew Musselman
Yes that can help, depending on what you're doing. Want to talk more about it? > On Mar 20, 2014, at 5:14 PM, Si Chen wrote: > > Hi everybody, > > I'd like to do some market basket analysis to suggest cross-sells, but many > of the products are very low sales volume items, so in the past the r

Re: Mahout Error ==> Chapter 2- First problem - Recoomender- Java.lang.Error

2014-03-20 Thread Ted Dunning
This looks like your class path is not set up correctly. You can try running with -verbose:class as a command line option and see if the Mahout classes are even being loaded. On Thu, Mar 20, 2014 at 5:30 PM, Masoud Nikravesh wrote: > Exception in thread "main" java.lang.Error: Unresolved compi

Re: market basket analysis of low sales volume products

2014-03-20 Thread Ted Dunning
I have done the equivalent thing with music (moving up from track to album to artist) with very good results. On Thu, Mar 20, 2014 at 5:58 PM, Martin, Nick wrote: > I can tell you my experience is that it's absolutely informative to take a > look at running the recommendation stuff on things o
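
The idea discussed in this thread, moving from sparse item-level data up to a coarser level such as brand (or from track to album to artist), can be sketched as a roll-up step before counting co-occurrences. The code below is a minimal plain-Python illustration; the SKU names, the brand mapping and the function names are hypothetical, and a real analysis would still score the pairs rather than use raw counts.

```python
from collections import Counter
from itertools import combinations


def roll_up(baskets, item_to_group):
    """Replace each item by its group and keep each group once per basket."""
    return [
        sorted({item_to_group[item] for item in basket if item in item_to_group})
        for basket in baskets
    ]


def pair_counts(baskets):
    """Count how often each unordered pair appears together in a basket."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(set(basket)), 2))
    return counts


# Hypothetical low-volume items: every item pair occurs only once.
baskets = [["sku1", "sku2"], ["sku3", "sku4"], ["sku1", "sku4"]]
brand_of = {"sku1": "A", "sku3": "A", "sku2": "B", "sku4": "B"}

item_pairs = pair_counts(baskets)
brand_pairs = pair_counts(roll_up(baskets, brand_of))
```

In this toy data no item pair is seen more than once, while the brand pair ("A", "B") is seen three times, which is the kind of signal that aggregation is meant to recover.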