Editing Dictionary Vector Generated

2013-10-04 Thread Puneet Arora
Hello All, I am currently working on sentiment analysis of social media, where I am using Mahout for vector generation with bigrams, but while classifying them under categories, some unigrams that I don't want are also coming through. For example, I classified "anti English" as negative; now in

Using Mahout 0.8 hadoop-based recommenders with EMR

2013-10-04 Thread Adam Warski
Hello, I'm trying to run the hadoop-based recommender job (org.apache.mahout.cf.taste.hadoop.item.RecommenderJob) from Mahout 0.8 on EMR. I'm using the Amazon Distribution Hadoop, which is version 1.0.3. Locally running the job with that version works just fine - I get the expected output. On

Re: Editing Dictionary Vector Generated

2013-10-04 Thread Ted Dunning
Why do you say that this is unacceptable? If the phrase is the most common way that the word English is used, this isn't such a bad thing. In general, with machine learning, the idea is to let the data speak. If the data say something you don't like, you have to be careful about

Re: Editing Dictionary Vector Generated

2013-10-04 Thread Puneet Arora
Thank you, Sir, for your reply. Yes, you guessed correctly that I am using naive Bayes, but how can I handle this type of problem rather than switching to another algorithm? With regards. On Fri, Oct 4, 2013 at 4:21 PM, Ted Dunning ted.dunn...@gmail.com wrote: Why do you say that this is

Re: Using Mahout 0.8 hadoop-based recommenders with EMR

2013-10-04 Thread Ken Krugler
Hi Adam, On Oct 4, 2013, at 4:38am, Adam Warski wrote: Hello, I'm trying to run the hadoop-based recommender job (org.apache.mahout.cf.taste.hadoop.item.RecommenderJob) from Mahout 0.8 on EMR. I'm using the Amazon Distribution Hadoop, which is version 1.0.3. Locally running the job

Re: What are the best settings for my clustering task

2013-10-04 Thread Ted Dunning
What you are seeing here are the cluster centroids themselves, not the cluster assignments. Streaming k-means is a single-pass algorithm to derive these centroids. Typically, the next step is to cluster these centroids using ball k-means. *Those* results can then be applied back to the

Re: Editing Dictionary Vector Generated

2013-10-04 Thread Ted Dunning
On Fri, Oct 4, 2013 at 6:13 AM, Puneet Arora arorapuneet2...@gmail.com wrote: yes you guessed correct that I am using naive bayes, but how can I handle this type of problem. I didn't hear about a problem. You said you didn't like weights on words like English to reflect the fact that they