I spent a week trying to get Hadoop to work on Windows 7 and then gave up. Did you manage to run Hadoop on Windows at all? Do the Hadoop examples (e.g. wordcount) work? http://en.wikisource.org/wiki/User:Fkorning/Code/Hadoop-on-Cygwin has lots of details about this. Some possible problems: Cygwin paths (which are not the same as Linux or plain Windows paths), HDFS/local filesystem confusion, your hadoop user (which may not have the same permissions as your own user), or other things listed at the link above.

Good luck,
Yuval
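To illustrate the Cygwin-path pitfall: the JVM that Hadoop runs in knows nothing about Cygwin's mount table, so a drive-less "absolute" path gets resolved against the current drive, not against Cygwin's root. A minimal sketch (the paths below are just examples, not taken from your setup):

```java
import java.io.File;

public class CygwinPathCheck {
    public static void main(String[] args) {
        // Under Cygwin, /usr/local typically means C:\cygwin\usr\local, but a
        // plain JVM resolves a drive-less path against the current drive
        // (e.g. F:\usr\local when launched from F:). Hadoop sees the JVM's
        // interpretation, not Cygwin's.
        File cygwinStyle = new File("/usr/local");
        System.out.println(cygwinStyle.getAbsolutePath());

        // Safer on Windows: hand Hadoop explicit drive-qualified paths.
        File windowsStyle = new File("F:/MAHOUT/TesMahout/clusters");
        // isAbsolute() is true for this path only on Windows.
        System.out.println(windowsStyle.isAbsolute());
    }
}
```

Comparing what the two prints resolve to on your machine is a quick way to spot whether a job is looking where you think it is.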
On Thu, Aug 2, 2012 at 11:57 AM, Videnova, Svetlana <svetlana.viden...@logica.com> wrote:
>
> Hello,
>
> I'm writing a Java app for clustering my data with k-means. These are
> the steps:
>
> 1) LuceneDemo: create the index and vectors using the lucene.vector lib.
> Input: the path of my .txt file. Output: the index (segments_1,
> segments.gen, .fdt, .fdx, .fnm, .frq, .nrm, .prx, .tii, .tis, .tvd,
> .tvx and, most importantly for Mahout, the .tvf) and vectors that look
> like this:
>
> SEQ__org.apache.hadoop.io.Text_org.apache.hadoop.io.Text______t€ðàó^æVG²RŸ˜Õ_________Ž__P(0):{15:1.4650986194610596,14:0.9997141361236572,11:0.9997141361236572,10:0.9997141361236572,9:0.9997141361236572,8:1.4650986194610596,7:1.4650986194610596,6:1.4650986194610596,5:0.9997141361236572,4:1.4650986194610596,2:3.1613736152648926,1:1.4650986194610596,0:0.9997141361236572}_________Ž__P(1):{15:1.4650986194610596,14:0.9997141361236572,11:0.9997141361236572,10:0.9997141361236572,9:0.9997141361236572,8:1.4650986194610596,7:1.4650986194610596,6:1.4650986194610596,5:0.9997141361236572,4:1.4650986194610596,2:3.1613736152648926,1:1.4650986194610596,0:0.9997141361236572}_________Ž__P(2):{
> [... and others]
>
> Can anyone please confirm that this output format looks right? If not,
> what should the vectors generated by lucene.vector look like?
>
> This is part of the code:
>
> /* Creating vectors */
> Map vectorMap = new TreeMap();
> IndexReader reader = IndexReader.open(index);
> int numDoc = reader.maxDoc();
> for (int i = 0; i < numDoc; i++) {
>     TermFreqVector termFreqVector = reader.getTermFreqVector(i, "content");
>     addTermFreqToMap(vectorMap, termFreqVector);
> }
>
> 2) MainClass: create the clusters with Mahout. Input: the path of the
> vectors generated in step 1 (see above). Output: the clusters. For the
> moment it does not create any clusters because of this error:
>
> Exception in thread "main" java.io.FileNotFoundException: File
> file:/F:/MAHOUT/TesMahout/clusters/tf-vectors/wordcount/data does not exist.
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
>     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>     at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:63)
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
>     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
>     at org.apache.mahout.vectorizer.tfidf.TFIDFConverter.startDFCounting(TFIDFConverter.java:368)
>     at org.apache.mahout.vectorizer.tfidf.TFIDFConverter.calculateDF(TFIDFConverter.java:198)
>     at main.MainClass.main(MainClass.java:144)
>
> Can anyone please help me solve this exception? I can't understand why
> the data could not be created, since I'm using the Hadoop and Mahout
> libs on Windows (and I'm an admin, so it should not be a permissions
> problem).
>
> This is part of the code:
>
> Pair<Long[], List<Path>> calculate = TFIDFConverter.calculateDF(
>         new Path(outputDir, DictionaryVectorizer.DOCUMENT_VECTOR_OUTPUT_FOLDER),
>         new Path(outputDir, DictionaryVectorizer.DOCUMENT_VECTOR_OUTPUT_FOLDER),
>         conf, chuckSize);
>
> TFIDFConverter.processTfIdf(
>         new Path(outputDir, DictionaryVectorizer.DOCUMENT_VECTOR_OUTPUT_FOLDER),
>         new Path(outputDir), conf, calculate, minDf, maxDFPercent, norm,
>         true, sequentialAccessOutput, false, reduceTasks);
>
> Path vectorFolder = new Path("output");
> Path canopyCentroids = new Path(outputDir, "canopy-centroids");
> Path clusterOutput = new Path(outputDir, "clusters");
>
> CanopyDriver.run(vectorFolder, canopyCentroids,
>         new EuclideanDistanceMeasure(), 250, 120, false, 3, false);
>
> KMeansDriver.run(conf, vectorFolder, new Path(canopyCentroids, "clusters-0"),
>         clusterOutput, new TanimotoDistanceMeasure(), 0.01, 20, true, 3, false);
>
> Thank you for your time.
>
> Regards
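One more thing worth checking, about the vector format. As far as I know, Mahout's k-means expects SequenceFiles whose values are org.apache.mahout.math.VectorWritable, while the header dump in the quoted message shows org.apache.hadoop.io.Text for both the key and the value class. You can verify what a SequenceFile actually contains without any Hadoop dependency by reading its header with plain java.io; this is a sketch that relies on the header layout (magic "SEQ", a version byte, then length-prefixed key and value class names) and assumes class names shorter than 128 bytes, so the length prefix fits in a single byte:

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SeqHeaderSniff {
    /**
     * Reads the header of a Hadoop SequenceFile: 3 magic bytes "SEQ",
     * one format-version byte, then the key and value class names.
     * Returns { keyClassName, valueClassName }.
     */
    public static String[] readKeyValueClasses(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("not a SequenceFile");
        }
        in.readByte(); // format version, e.g. 6
        return new String[] { readShortString(in), readShortString(in) };
    }

    // Class names are stored as a variable-length int length followed by
    // UTF-8 bytes; for lengths <= 127 the vint is a single byte.
    private static String readShortString(DataInputStream in) throws IOException {
        int len = in.readByte();
        byte[] buf = new byte[len];
        in.readFully(buf);
        return new String(buf, "UTF-8");
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            String[] kv = readKeyValueClasses(in);
            System.out.println("key class:   " + kv[0]);
            System.out.println("value class: " + kv[1]);
        }
    }
}
```

Run it against the .../tf-vectors file once it exists; if the value class is not VectorWritable, the clustering step will not be able to use it regardless of where the file lives.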