Re: Problem converting tokenized documents into TFIDF vectors

2014-01-26 Thread Drew Farris
Scott, Based on the dictionary output, it looks like the process of generating vectors from your tokenized text is not working properly. The only term that's making it into your dictionary is 'java' - everything else is being filtered out. Furthermore, your tf vectors have a single dimension

Re: Problem converting tokenized documents into TFIDF vectors

2014-01-26 Thread Scott C. Cote
Drew, I'm sorry - I'm derelict (as opposed to dirichlet) in responding that I got past my problem. It was the min freq that was killing me. Forgot about that parameter. Thank you for your assist. Hope to be able to return the favor. Am on the hook to update documentation for Mahout already
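
For readers following along, here is a minimal sketch of the effect described above: on a very small corpus, a minimum-frequency cutoff can remove almost every term from the dictionary. This is plain Python, not Mahout code. The function name build_dictionary, the parameter name min_support, and the sample documents are all hypothetical, and the thread does not say which Mahout option or value was actually involved.

```python
from collections import Counter


def build_dictionary(tokenized_docs, min_support=2):
    """Map each term to an index, keeping only terms that occur
    at least min_support times across the whole corpus.

    Hypothetical helper for illustration; not Mahout's implementation.
    """
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    kept = sorted(term for term, n in counts.items() if n >= min_support)
    return {term: idx for idx, term in enumerate(kept)}


# Made-up tokenized "skills" documents; only 'java' appears twice.
docs = [
    ["java", "developer"],
    ["java", "web", "services"],
    ["project", "management"],
]

strict = build_dictionary(docs, min_support=2)
relaxed = build_dictionary(docs, min_support=1)

print(strict)        # only 'java' survives the cutoff
print(len(relaxed))  # every distinct term is kept
```

With the cutoff at 2, the dictionary holds a single term, so every term-frequency vector built from it has one dimension, which matches the symptom Drew describes. Lowering the cutoff to 1 keeps all six distinct terms.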

Re: Problem converting tokenized documents into TFIDF vectors

2014-01-26 Thread Suneel Marthi
Scott, FYI... 0.9 Release is not official yet. The project trunk's still at 0.9-SNAPSHOT. Please feel free to update the documentation. On Sunday, January 26, 2014 1:34 PM, Scott C. Cote scottcc...@gmail.com wrote: Drew, I'm sorry - I'm derelict (as opposed to dirichlet) in responding

Re: Problem converting tokenized documents into TFIDF vectors

2014-01-26 Thread Scott C. Cote
I understand that it is not official. Am just trying to provide another test opportunity for the .9 release. Scott On 1/26/14 1:05 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Scott, FYI... 0.9 Release is not official yet. The project trunk's still at 0.9-SNAPSHOT. Please feel free to

Problem converting tokenized documents into TFIDF vectors

2014-01-21 Thread Scott C. Cote
All, Not a Mahout .9 problem - once I have this working with .8 Mahout, will immediately pull in the .9 stuff... I am trying to make a small data set work (perhaps it is too small?) where I am clustering skills (phrases). For the sake of brevity (my steps are long), I have not documented the steps