Re: Visualizing cluster through command line

2013-12-13 Thread Taner Diler
Hi David, I tried to find a way to visualize this, too. In the DisplayClustering example, clustering runs on x,y vectors, which are easy to visualize. But in the real world we have n-dimensional vectors, and I don't think they can be plotted as points on an xy chart. On Thu, Dec 12, 2013 at 6:42 PM, David G wrote:
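On the point about n-dimensional vectors: one common workaround, which is not proposed anywhere in this thread, is to project the vectors onto two derived axes before plotting. The following is only a minimal sketch of that idea using principal component analysis with power iteration on plain double arrays. The class Pca2D and its methods are hypothetical names, not Mahout API, and the dense covariance matrix makes it practical only for a modest number of dimensions, not for full TF-IDF vocabularies.

```java
import java.util.Random;

/** Hypothetical helper (not part of Mahout): PCA projection of dense vectors to 2-D. */
final class Pca2D {

    /** Returns one {x, y} pair per input row, using the top two principal components. */
    static double[][] project(double[][] data) {
        int rows = data.length;
        int dims = data[0].length;

        // Center every dimension on its mean.
        double[] mean = new double[dims];
        for (double[] row : data) {
            for (int j = 0; j < dims; j++) mean[j] += row[j] / rows;
        }
        double[][] centered = new double[rows][dims];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < dims; j++) centered[i][j] = data[i][j] - mean[j];
        }

        // Dense covariance matrix (dims x dims); only practical for modest dims.
        double[][] cov = new double[dims][dims];
        for (double[] row : centered) {
            for (int a = 0; a < dims; a++) {
                for (int b = 0; b < dims; b++) cov[a][b] += row[a] * row[b] / rows;
            }
        }

        double[] pc1 = dominantDirection(cov, null);
        double[] pc2 = dominantDirection(cov, pc1);

        double[][] points = new double[rows][2];
        for (int i = 0; i < rows; i++) {
            points[i][0] = dot(centered[i], pc1);
            points[i][1] = dot(centered[i], pc2);
        }
        return points;
    }

    /** Power iteration; when 'avoid' is non-null the result is kept orthogonal to it. */
    static double[] dominantDirection(double[][] m, double[] avoid) {
        int n = m.length;
        Random rnd = new Random(42); // fixed seed so runs are repeatable
        double[] v = new double[n];
        for (int i = 0; i < n; i++) v[i] = rnd.nextDouble() - 0.5;
        for (int iter = 0; iter < 500; iter++) {
            double[] next = new double[n];
            for (int i = 0; i < n; i++) next[i] = dot(m[i], v);
            if (avoid != null) {
                double overlap = dot(next, avoid);
                for (int i = 0; i < n; i++) next[i] -= overlap * avoid[i];
            }
            double norm = Math.sqrt(dot(next, next));
            if (norm < 1e-12) break; // no variance left in this direction
            for (int i = 0; i < n; i++) v[i] = next[i] / norm;
        }
        return v;
    }

    static double dot(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
        return sum;
    }

    public static void main(String[] args) {
        // Toy 3-D data; real input would be the clustered document vectors.
        double[][] data = { {0, 0, 0}, {10, 0, 0}, {0, 1, 0}, {10, 1, 0} };
        for (double[] p : project(data)) System.out.println(p[0] + ", " + p[1]);
    }
}
```

The resulting pairs can be fed to any xy chart. Distances in the plot only approximate distances in the original space, so clusters that overlap in 2-D are not necessarily overlapping in n dimensions.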

dirichlet clustering with named vectors

2013-10-07 Thread Taner Diler
Hi, is it possible to get Dirichlet clustering results with named vectors, as in k-means? With named vectors in k-means, I can see which files are located in each cluster.

Re: TFIDFConverter generates empty tfidf-vectors

2013-09-22 Thread Taner Diler
, and normPower option set to -1.0f. This applies to HighDFWordsPruner.pruneVectors, too.
>
> I believe that solves your problem.
>
> Best
>
> Gokhan
>
> On Wed, Sep 4, 2013 at 4:54 PM, Taner Diler wrote:
>
> > Actually, my real motivation was to visualize r

Re: using KmeansDriver with HDFS

2013-09-05 Thread Taner Diler
With 0.8 you can set conf files as
    Configuration conf = new Configuration();
    String HADOOP_HOME = System.getenv("HADOOP_PREFIX");
    conf.addResource(new Path(HADOOP_HOME, "conf/core-site.xml"));
    conf.addResource(new Path(HADOOP_HOME, "conf/hdfs-site.xml"));
    conf.addR
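Because the snippet above depends on HADOOP_PREFIX being set and on the conf files existing under it, it can help to check both before building the Configuration. The following is a minimal sketch of such a check using only the JDK. HadoopConfCheck, confFiles and missing are hypothetical names introduced here for illustration and are not part of Hadoop or Mahout.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check; uses only the JDK, no Hadoop classes. */
final class HadoopConfCheck {

    /** Builds hadoopHome/conf/name for each given file name. */
    static List<Path> confFiles(String hadoopHome, String... names) {
        List<Path> files = new ArrayList<>();
        for (String name : names) {
            files.add(Paths.get(hadoopHome, "conf", name));
        }
        return files;
    }

    /** Returns the subset of paths that are not readable regular files. */
    static List<Path> missing(List<Path> files) {
        List<Path> result = new ArrayList<>();
        for (Path file : files) {
            if (!Files.isRegularFile(file) || !Files.isReadable(file)) result.add(file);
        }
        return result;
    }

    public static void main(String[] args) {
        String hadoopHome = System.getenv("HADOOP_PREFIX"); // same variable as in the message
        if (hadoopHome == null) {
            System.err.println("HADOOP_PREFIX is not set");
            return;
        }
        List<Path> absent = missing(confFiles(hadoopHome, "core-site.xml", "hdfs-site.xml"));
        System.out.println(absent.isEmpty() ? "all conf files found" : "missing: " + absent);
    }
}
```

Running the check first makes a wrong HADOOP_PREFIX or a different conf directory layout visible before the clustering job starts.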

Re: TFIDFConverter generates empty tfidf-vectors

2013-09-04 Thread Taner Diler
un seq2sparse. I'm gonna debug it anyway.
>
> And I would like to know how you run the java code. Does your main class extend AbstractJob to make it "runnable" using bin/mahout? And does it have a main method that submits your job to your hadoop cluster? Are you us

Re: TFIDFConverter generates empty tfidf-vectors

2013-09-04 Thread Taner Diler
Value: 2
Key: 3: Value: 2
Key: 4: Value: 9
Key: 5: Value: 4
dictionary.file-0
Key class: class org.apache.hadoop.io.Text Value Class: class org.apache.hadoop.io.IntWritable
Key: 0: Value: 0
Key: 0.003: Value: 1
Key: 0.006913: Value: 2
Key: 0.007050: Value: 3
Key: 0.01: Value: 4
Key: 0.02: Value: 5
Key: 0.025

Re: TFIDFConverter generates empty tfidf-vectors

2013-09-04 Thread Taner Diler
mahout seq2sparse -i reuters-seqfiles/ -o reuters-kmeans-try -chunk 200 -wt tfidf -s 2 -md 5 -x 95 -ng 2 -ml 50 -n 2 -seq
This command works well. Gokhan, I changed the minLLR value to 1.0 in Java, but the result is the same: empty tfidf-vectors. On Tue, Sep 3, 2013 at 10:47 AM, Taner Diler wrote

Re: TFIDFConverter generates empty tfidf-vectors

2013-09-03 Thread Taner Diler
if that works well?
>
> On Sun, Sep 1, 2013 at 7:23 PM, Suneel Marthi wrote:
>
> > I would first check to see if the input 'seqfiles' for TFIDFGenerator have any meat in them.
> > This could also happen if the input seqfiles are empty.

Visualizing Reuters KMeans Clustering

2013-08-31 Thread Taner Diler
Hi all, How can I visualize Reuters KMeans Clustering as in DisplayKMeans.java? Thanks.

TFIDFConverter generates empty tfidf-vectors

2013-08-31 Thread Taner Diler
Hi all, I'm trying to run the Reuters KMeans example in Java, but TFIDFConverter generates empty tfidf-vectors. How can I fix that?
    private static int minSupport = 2;
    private static int maxNGramSize = 2;
    private static float minLLRValue = 50;
    private static float normPower = 2;
    priv

Interpreting result of dirichlet clustering

2013-08-25 Thread Taner Diler
Hi all, I'm trying to cluster texts with Dirichlet. I have a few questions about the result: 1. How can I display the data and clusters in a chart, as in the DisplayDirichlet example? In DisplayDirichlet, the sample data has x,y values, so it can be displayed. But in the TF-IDF result, one file has many word frequency vec

Re: mahout kmeans not generating clusteredPoint dir?

2013-07-29 Thread Taner Diler
After converting the Reuters sgm files to txt format (in reuters-extracted), you should give the input path to the first mahout command, seqdirectory, as file:///your_dir/reuters-extracted. I got the same problem with k-means clustering when I gave the input parameter as /your_dir/reuters-extracted. On Mon, Jul 2
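The difference between the two input forms above is the explicit file: scheme. A likely explanation, stated here as an assumption rather than something the message confirms, is that a path without a scheme is resolved against the configured default filesystem, which may be HDFS rather than the local disk. The following is a minimal JDK-only sketch for turning a local path into that URI form. LocalInputUri and toFileUri are hypothetical names, and /your_dir/reuters-extracted is the placeholder path from the message.

```java
import java.nio.file.Paths;

/** Hypothetical helper: turns a local directory path into an explicit file: URI string. */
final class LocalInputUri {

    static String toFileUri(String localPath) {
        // toAbsolutePath() resolves relative paths against the working directory;
        // toUri() adds the file scheme.
        return Paths.get(localPath).toAbsolutePath().normalize().toUri().toString();
    }

    public static void main(String[] args) {
        // "/your_dir/reuters-extracted" is the placeholder path from the message.
        System.out.println(toFileUri("/your_dir/reuters-extracted"));
    }
}
```

The printed string can then be passed as the input argument in place of the bare path.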

About to find category of a new article

2013-07-26 Thread Taner Diler
Hi all, I want to make sure I understand something. I have lots of articles about sports, mobile technologies, beverage & food, automotive... When I get a new article, the system should tell me that it is about, for example, beverage & food. This is what classification does, am I right? Is there a sample or tutorial about l

No input clusters found exception on execution k-means clustering

2013-07-23 Thread Taner Diler
I'm getting "*No input clusters found in reuters-kmeans-clusters/part-randomSeed. Check your -c argument*" while running the k-means example from the "Mahout in Action" samples. I searched on Google, but I didn't find a solution. I'm using Mahout version 0.7. How can I run k-means clustering? command: taner