date:20120518

Re: How to approach this? Classification vs Recommendation

2012-05-18 Thread Ted Dunning

Not so trivially, these classifiers can help each other. What you have is a form of transduction or example-based learning. On Fri, May 18, 2012 at 5:24 PM, Sean Owen wrote: > Trivially it's four classifiers. You have just one input here, and > it's binary. That seems like too little info to dis

Re: How to approach this? Classification vs Recommendation

2012-05-18 Thread Sean Owen

Trivially it's four classifiers. You have just one input here, and it's binary. That seems like too little info to discriminate on. All you can learn -- and it doesn't really need a classifier algorithm -- is there's an x% chance of encountering problem a if funded, and (100-x)% of a if not. On Fr
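
The point that no classifier algorithm is needed can be illustrated with a minimal sketch: with a single binary input (funded or not), the most one can estimate per problem is the observed rate in each group. The record layout, the function name `problem_rate`, and the sample data below are assumptions made for illustration only; they are not taken from the thread or from Mahout.

```python
from collections import Counter

def problem_rate(records, problem):
    """Estimate P(problem | funded flag) by simple counting.

    `records` is an iterable of (funded, problems) pairs, where `funded`
    is a bool and `problems` is a set of problem labels (hypothetical layout).
    """
    totals = Counter()  # number of users seen per funded flag
    hits = Counter()    # number of those users who had `problem`
    for funded, problems in records:
        totals[funded] += 1
        if problem in problems:
            hits[funded] += 1
    return {flag: hits[flag] / totals[flag] for flag in totals}

# Made-up sample data: two funded users, four unfunded users.
records = [
    (True, {"a"}),
    (True, set()),
    (False, {"a", "b"}),
    (False, {"a"}),
    (False, {"a", "d"}),
    (False, {"c"}),
]

rates = problem_rate(records, "a")
print(rates)
```

Repeating the count for problems b, c and d gives the four "classifiers" mentioned above, each reduced to a pair of frequencies.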

Re: Help with running taste-demo on mahout-examples-0.7-SNAPSHOT.jar

2012-05-18 Thread Sean Owen

Yes it's just like making any other servlet-based app. You can find the servlet (and Axis JWS file if you want) and web.xml in the project. Just put it together with compiled code in a .war file. On Fri, May 18, 2012 at 10:09 PM, Dhananjay Sampath wrote: > Thanks Sean, for the super quick respons

How to approach this? Classification vs Recommendation

2012-05-18 Thread fht

Hi, I suppose this is a combination of a generic machine learning question and a Mahout question. I have a data set. A user may or may not be part of a funded scheme. If they are not part of the funded scheme they might be susceptible to certain problems a, b, c and d. If they are part of the fu

Re: Help with running taste-demo on mahout-examples-0.7-SNAPSHOT.jar

2012-05-18 Thread Dhananjay Sampath

Thanks Sean, for the super quick response! I wish I had asked this question 3 days ago! Ok, so I have to package my own recommender. Got it. The only place where I found some instructions on packaging a (custom/example) recommender is Manuel's response to Ben on the User archives ( http://mail-archiv

NoClassDefFoundError calling custom analyzer in seq2sparse

2012-05-18 Thread DAN HELM

Hello, I'm sure variations of this question have been posted before but I'm having trouble using my own custom analyzer with seq2sparse. I'm using the -a parameter to pass my class name. To build the class I basically cloned the concept in Mahout's org.apache.mahout.vectorizer.DefaultAnalyz

Re: Help with running taste-demo on mahout-examples-0.7-SNAPSHOT.jar

2012-05-18 Thread Sean Owen

(You can use -DskipTests in Maven. You don't need to run the very lengthy tests.) The bad news is that this example only worked in 0.5, and was removed in 0.6. The underlying pieces are still there, you just would have to assemble the WAR yourself. I'll try to figure out how to remove this; I did

Help with running taste-demo on mahout-examples-0.7-SNAPSHOT.jar

2012-05-18 Thread Dhananjay Sampath

Hi mahout-devs/mahout-users, I am stuck trying to get the Mahout taste app demo working and could really use some help. I know that several people have come before me asking the same question and I have gone through almost all of them and have met with little success. I followed all of th

Re: tokenizer for text

2012-05-18 Thread Jiaan Zeng

Very helpful info! Thanks a lot. On Fri, May 18, 2012 at 11:37 AM, John Conwell wrote: > Noise in OCR often manifests itself as a whole bunch of singletons in the > corpus of meaningless terms like "lsdjfdslkfj". So the minFrequency flag > can help in filtering out these terms. > > Stopwords sho

Re: Judging the quality of clustering

2012-05-18 Thread Pat Ferrel

Thanks Jeff. When I did my experiment it used kmeans for three runs k = 10, 20, 10. Number of documents around 3000 (guessing here). The k=10 run did not prune, k=30 pruned 4 clusters. I'll run this again to see if it is repeatable and you are welcome to the dataset. I read that comment but w

Re: tokenizer for text

2012-05-18 Thread John Conwell

Noise in OCR often manifests itself as a whole bunch of singletons in the corpus of meaningless terms like "lsdjfdslkfj". So the minFrequency flag can help in filtering out these terms. Stopwords should be handled by tfidf. For example the word "the" probably has a high frequency in every docume
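
The idea of dropping singleton noise terms with a minimum-frequency threshold can be sketched roughly as follows. This is not Mahout's implementation; the function name `prune_rare_terms` and the parameter `min_frequency` are hypothetical and only mirror the flag mentioned above.

```python
from collections import Counter

def prune_rare_terms(docs, min_frequency=2):
    """Drop terms whose total count across the corpus is below min_frequency.

    `docs` is a list of token lists. OCR noise such as "lsdjfdslkfj"
    typically occurs once in the whole corpus, so it is filtered out.
    """
    counts = Counter(term for doc in docs for term in doc)
    return [[t for t in doc if counts[t] >= min_frequency] for doc in docs]

# Made-up sample corpus, already tokenized.
docs = [
    ["the", "cat", "lsdjfdslkfj"],
    ["the", "cat", "sat"],
    ["the", "dog"],
]

pruned = prune_rare_terms(docs, min_frequency=2)
print(pruned)
```

Very frequent words such as "the" survive this filter; as noted above, those are left to tf-idf weighting rather than to the frequency cutoff.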

LDA, printing Topics

2012-05-18 Thread Simon Handley

I'm trying to understand how LDA prints out the words per topic. If I run the reuters example, the topics are printed out like this: Topic 0 > === > dlrs [p(dlrs|topic_0) = 0.09982075792238235 > mln [p(mln|topic_0) = 0.05160370562850524 > its [p(its|topic_0) = 0.026424106119119467 > earni

Re: tokenizer for text

2012-05-18 Thread Jiaan Zeng

Thanks for the quick reply. Stop word filtering or stemming may not help much, I think. Also, the point of using tf-idf vectors is to deal with high-frequency words. Stop word filtering or stemming seems to run counter to the intent of tf-idf. The problem is that the text has a lot of noise (

Re: tokenizer for text

2012-05-18 Thread Baoqiang Cao

In addition, you could try increasing the word occurrence thresholds in the -s and -md options. On Fri, May 18, 2012 at 9:41 AM, John Conwell wrote: > What do you have in mind as far as a different tokenizer? Are you doing > stopword filtering? Maybe look at the stopword list and see if there are >

Re: tokenizer for text

2012-05-18 Thread John Conwell

What do you have in mind as far as a different tokenizer? Are you doing stopword filtering? Maybe look at the stopword list and see if there are other noise words you wish to add. If you are using Lucene to filter stopwords, its stopword list is pretty small (20 or so words). Stemming is another

tokenizer for text

2012-05-18 Thread Jiaan Zeng

Hi List, I am trying to use Mahout to cluster text. The problem is that after running the procedure SparseVectorsFromSequenceFiles, the dimension of the tf-idf vectors is too high (about 50K) and it increases as the number of documents increases. I think there are two ways to handle that. One is to use
