RE: clustering with kmeans, java app

2012-08-07 Thread Videnova, Svetlana
On Thu, Aug 2, 2012 at 11:57 AM, Videnova, Svetlana svetlana.viden...@logica.com wrote: Hello, I’m writing a Java app to cluster my data with kmeans. These are the steps: 1) LuceneDemo: create the index and vectors using the lib Lucene.vector, input path of my .txt, output index (segments_1

RE: ClusterDumper eclipse human readable output kmeans

2012-08-07 Thread Videnova, Svetlana
AM, Videnova, Svetlana svetlana.viden...@logica.com wrote: Hi, My goal is to transform the vectors created by lucene.vector (thanks to kmeans clustering) into a human-readable format. For that I am using the ClusterDumper function in Eclipse. But that code does not generate any files. What

RE: ClusterDumper eclipse human readable output kmeans

2012-08-07 Thread Videnova, Svetlana
-08-2012 12:50, Videnova, Svetlana wrote: I already generated the points directory when I ran the clustering (kmeans in my case). But for the moment I can't generate the clusterdump because of an error on this line: ClusterDumper.readPoints(new Path("output/kmeans/clusters-0"), 2, conf); Second parameter is double

clustering with kmeans, java app

2012-08-02 Thread Videnova, Svetlana
Hello, I’m writing a Java app to cluster my data with kmeans. These are the steps: 1) LuceneDemo: create the index and vectors using the lib Lucene.vector, input path of my .txt, output index (segments_1, segments.gen, .fdt, .fdx, .fnm, .frq, .nrm, .prx, .tii, .tis, .tvd, .tvx and the most

kmeans cluster V:0.7

2012-08-02 Thread Videnova, Svetlana
Hello, I'm using mahout 0.7 and trying to cluster, but apparently the KMeansClusterer class is no longer available in 0.7. Can somebody please tell me by which class KMeansClusterer is replaced? Thank you

RE: kmeans cluster V:0.7

2012-08-02 Thread Videnova, Svetlana
and generic way. KMeansDriver's run method is all you need to use KMeans Clustering. On 02-08-2012 15:25, Videnova, Svetlana wrote: Hello, I'm using mahout 0.7 and trying to cluster, but apparently the KMeansClusterer class is no longer available in 0.7. Can somebody please tell me by which

mahout lib : permissions

2012-07-31 Thread Videnova, Svetlana
Hi mahouters, I am trying to use the mahout lib with my Java app. But when I try to cluster by calling this: DocumentProcessor.tokenizeDocuments(new Path(inputDir), analyzer.getClass().asSubclass(Analyzer.class), tokenizedPath, conf); And this: InputDriver.runJob(new Path(inputDir),

RE: Re: mahout lib : permissions

2012-07-31 Thread Videnova, Svetlana
cluster? Please copy your code and try to run it on a hadoop cluster. -- Original message -- From: Videnova, Svetlana; Sent: Tuesday, July 31, 2012, 3:27 PM To: user@mahout.apache.org; Subject: mahout lib : permissions Hi mahouters, I am trying to use the mahout lib with my app

RE: Re: mahout lib : permissions

2012-07-31 Thread Videnova, Svetlana
I'm using cygwin. The permissions problems were apparently because I wasn’t using cygwin. Thanks all. But I still have this error. What about the job problems? Exception in thread "main" java.lang.IllegalStateException: Job failed! at

RE: Re: mahout lib : permissions

2012-07-31 Thread Videnova, Svetlana
. This should work. I am still not sure why it doesn't work with the direct download version. Thanks, Kiran On Tue, Jul 31, 2012 at 8:30 AM, Videnova, Svetlana svetlana.viden...@logica.com wrote: I'm using cygwin. The permissions problems were apparently because I wasn’t using cygwin. Thanks all

RE: .txt to vector

2012-07-26 Thread Videnova, Svetlana
artichokes 14 0 cheese 17 1 deron 14 2 french 14 3 fries 14 4 hamburger 14 5 nicole 17 6 salad 17 7 steak 14 8 -Original message- From: Videnova, Svetlana [mailto:svetlana.viden...@logica.com] Sent: Wednesday, July 25

mahout streaming

2012-07-26 Thread Videnova, Svetlana
Hi everybody, Is it possible, instead of creating a vector from txt or a lucene index, to create a vector from a stream (looking like xml)? Stream example: <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">16</int> <lst name="params"> <str name="indent">on</str> <str name="start">0</str> <str

RE: .txt to vector

2012-07-25 Thread Videnova, Svetlana
. This percentage is expressed as a value between 0 and 1. The default is 0. You want .3, not 30 ! On Tue, Jul 24, 2012 at 1:27 AM, Videnova, Svetlana

RE: .txt to vector

2012-07-25 Thread Videnova, Svetlana
term vectors. http://code.google.com/p/luke/ It uses Swing, so you need the index on your local PC. On Wed, Jul 25, 2012 at 12:15 AM, Videnova, Svetlana svetlana.viden...@logica.com wrote: Yes, I saw the help; that's why I was trying with something between 0 and 1, but I have all the time

RE: .txt to vector

2012-07-25 Thread Videnova, Svetlana
to vector It is a jar file, so just java -jar luke.jar But, there's a problem. Luke releases are keyed to different Lucene releases. You need the right Luke download for your version of Lucene. http://code.google.com/p/luke/downloads/list On Wed, Jul 25, 2012 at 12:52 AM, Videnova, Svetlana

RE: .txt to vector

2012-07-24 Thread Videnova, Svetlana
in the index with the suffix .tvf. This has the data which the Mahout lucene job looks for. On Mon, Jul 23, 2012 at 8:03 AM, Videnova, Svetlana svetlana.viden...@logica.com wrote: Hello again, I got my indexed files from solr on windows and copied them into a directory on ubuntu

RE: .txt to vector

2012-07-24 Thread Videnova, Svetlana
java.lang.IllegalArgumentException -Original message- From: Videnova, Svetlana [mailto:svetlana.viden...@logica.com] Sent: Tuesday, July 24, 2012 09:16 To: user@mahout.apache.org Subject: RE: .txt to vector Hi Lance, My dir now contains: _0.tvf and the others. With the command: apache-mahout-d6d6ee8

RE: .txt to vector

2012-07-23 Thread Videnova, Svetlana
/TermVectorComponent I don't know if lucene.vector is in the Mahout 0.5 release. For cluster outputs, the current cluster dumper supports 'graphml' format. Gephi is an interactive graph browser. You can look at small cluster jobs. On Thu, Jul 19, 2012 at 11:34 PM, Videnova, Svetlana svetlana.viden

RE: .txt to vector

2012-07-23 Thread Videnova, Svetlana
 : Videnova, Svetlana [mailto:svetlana.viden...@logica.com] Sent: Monday, July 23, 2012 10:18 To: user@mahout.apache.org Subject: RE: .txt to vector I'm using mahout on ubuntu and solr on windows. I guess with a web service I can get the indexed files from solr and then, thanks to a java program in the web

RE: .txt to vector

2012-07-20 Thread Videnova, Svetlana
programs in Mahout, and otherwise covers other text processing problems. Mahout in Action is very good, and can help you use most of the Mahout features. http://www.manning.com/owen http://www.manning.com/ingersoll On Thu, Jul 19, 2012 at 8:08 AM, Videnova, Svetlana svetlana.viden...@logica.com

RE: k-means output missing some cluster centers coordinates

2012-07-20 Thread Videnova, Svetlana
That's a very good question, I was expecting an answer too... This was the answer given to me by mahout users: the type of input and output depends on the job you want to run. I am clustering .txt files for the moment. -Original message- From: shriram [mailto:ghai12...@gmail.com]

RE: .txt to vector

2012-07-19 Thread Videnova, Svetlana
at 6:04 PM, Videnova, Svetlana svetlana.viden...@logica.com wrote: I'm working with mahout. I'm trying to write a web service in java myself that will take the output of solr and give this file to mahout. So far I have successfully done the recommendation part. Now I'm trying to cluster

RE: .txt to vector

2012-07-19 Thread Videnova, Svetlana
Subject: Re: .txt to vector Yes, the Mahout analyzer would have to be updated for Lucene 4.0. I suggest using an earlier one. Mahout uses Lucene in a very simple way, and it is OK to use any earlier Lucene from 3.1 to 3.6. On Wed, Jul 18, 2012 at 11:50 PM, Videnova, Svetlana svetlana.viden

RE: .txt to vector

2012-07-19 Thread Videnova, Svetlana
file:/usr/local/apache-mahout-d6d6ee8/examples/output/clusters-8/data does not exist. Best Regards Alexander Aristov On 19 July 2012 12:30, Videnova, Svetlana svetlana.viden...@logica.com wrote: Hi Lance, Thank you for your fast answer. I was changing my : CLASSPATH=/opt/lucene-3.6.0/lucene

RE: .txt to vector

2012-07-19 Thread Videnova, Svetlana
-vectors tokenized-documents What should the vector files look like? And can somebody please explain to me what each directory of the output above represents? Thank you -Original message- From: Videnova, Svetlana [mailto:svetlana.viden...@logica.com] Sent: Thursday, July 19, 2012 14

RE: .txt to vector

2012-07-19 Thread Videnova, Svetlana
is the chunk-0 file exactly? What does the clusters-dump created at the end by the clusterdump command represent? Thank you all! -Original message- From: Videnova, Svetlana [mailto:svetlana.viden...@logica.com] Sent: Thursday, July 19, 2012 15:07 To: user@mahout.apache.org Subject: RE: .txt

.txt to vector

2012-07-18 Thread Videnova, Svetlana
I'm working with mahout. I'm trying to write a web service in java myself that will take the output of solr and give this file to mahout. So far I have successfully done the recommendation part. Now I'm trying to cluster. For this I have to vectorise the output of solr. Do you have any idea how

general mahout working / some solr questions / last version tests

2012-07-06 Thread Videnova, Svetlana
Memory: 67M/170M :):):):):):):):):):) So thanks to Sean Owen and his updates on http://zoekja.nl/proxy/index.php?q=aHR0cHM6Ly9naXRodWIuY29tL2FwYWNoZS9tYWhvdXQ%3D -Original message- From: Videnova, Svetlana [mailto:svetlana.viden...@logica.com] Sent: Friday

Mahout org.apache.mahout.cf.taste.* / Maven pom.xml

2012-06-22 Thread Videnova, Svetlana
Hello, I'm trying to run the first example of ch02 of Mahout in Action. I got the following errors. Do I have to create the pom.xml? If yes: What do I have to put in it? Where do I have to put it? If no: Where can I find it? Because apparently maven did not find it. Where can I find taste files of