Your input is still text, though, and I assume you're trying to use TextInputFormat. You can't do that: the transpose job expects IntWritable keys, which means it expects its input as a sequence file, read via SequenceFileInputFormat.
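A minimal sketch of the conversion this implies (class name, file name, and paths are made up for illustration): parse each whitespace-separated text row into doubles, keyed by its row index. The actual Hadoop writer calls are shown as comments, since they need the hadoop/mahout jars on the classpath; only the parsing step runs standalone.

```java
import java.util.Arrays;

// Hypothetical converter sketch. seqdirectory emits Text keys, but
// DistributedRowMatrix/TransposeJob want SequenceFile<IntWritable,
// VectorWritable>, so the text matrix must be converted row by row.
public class TextMatrixToSeqFile {

    // Parse one text row ("2323.03 994.45 ...") into its values.
    static double[] parseRow(String line) {
        String[] tokens = line.trim().split("\\s+");
        double[] row = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            row[i] = Double.parseDouble(tokens[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        String[] lines = { "2323.03 994.45 87.0", "56.45 76.21 275.1" };
        // With the hadoop/mahout jars available, each parsed row would be
        // written roughly like this (old mapred API, matching the thread):
        //
        //   SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf,
        //       new Path("/user/hduser/diffuse/matrix.seq"),
        //       IntWritable.class, VectorWritable.class);
        //   writer.append(new IntWritable(rowIndex),
        //       new VectorWritable(new DenseVector(row)));
        //
        // Here we only demonstrate the parsing step.
        for (int i = 0; i < lines.length; i++) {
            System.out.println(i + " -> " + Arrays.toString(parseRow(lines[i])));
        }
    }
}
```

The key point is the key/value classes: IntWritable row indices and VectorWritable rows are what DistributedRowMatrix reads, which is why a Text-keyed file from seqdirectory triggers the ClassCastException below.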
On Tue, Mar 6, 2012 at 7:21 PM, PEDRO MANUEL JIMENEZ RODRIGUEZ
<pmjimenez1...@hotmail.com> wrote:
>
> Thanks for the reply.
>
> I was doing something wrong: I have to convert my input file to a
> sequence file, so now I'm trying to convert it.
>
> The file looks like:
>
> 2323.03 994.45 87.....
> 56.45 76.21 275.1 12.456......
> ......
>
> Each line represents a matrix row, and the columns are separated by
> spaces.
>
> So I executed the following command to get the sequence file:
>
> bin/mahout seqdirectory -i /home/pedro/input -o /home/pedro/diffuse/output -c UTF-8
>
> When I try to run my program with the generated file, I get the
> following error:
>
> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
>     at org.apache.mahout.math.hadoop.TransposeJob$TransposeMapper.map(TransposeJob.java:100)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>     at org.apache.hadoop.mapred.Child.main(Child.java:253)
>
> Do I have to change the input file to another format?
>
> Thanks.
>
>
>> Date: Sun, 4 Mar 2012 17:48:56 -0800
>> Subject: Re: DistributedRowMatrix - FileNotFoundException
>> From: goks...@gmail.com
>> To: user@mahout.apache.org
>>
>> This could be a problem with the DRM code or with HDFS management. Try
>> running it without HDFS or a Hadoop cluster, with local files, and in
>> pseudo-distributed mode; that way you can narrow the problem down to
>> one of the above.
>>
>> On Sat, Mar 3, 2012 at 10:13 AM, PEDRO MANUEL JIMENEZ RODRIGUEZ
>> <pmjimenez1...@hotmail.com> wrote:
>> >
>> > Hi everyone!
>> >
>> > I'm trying to use DistributedRowMatrix in my class code, but I keep
>> > getting the same error: "FileNotFoundException".
>> >
>> > I have put a file into my HDFS directory under /user/hduser/diffuse,
>> > and I run the program with "diffuse" as the input and output
>> > directory. The code looks like:
>> >
>> > Configuration originalConfig = getConf();
>> > DistributedRowMatrix matrix = new DistributedRowMatrix(inputPath,
>> >         outputPath,
>> >         numRows,
>> >         numCols);
>> >
>> > JobConf conf = new JobConf(originalConfig);
>> > matrix.configure(conf);
>> >
>> > DistributedRowMatrix t1 = matrix.transpose();
>> >
>> > 12/03/03 18:55:13 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> > 12/03/03 18:55:14 INFO mapred.FileInputFormat: Total input paths to process : 7
>> > 12/03/03 18:55:14 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:54310/app/hadoop/tmp/mapred/staging/hduser/.staging/job_201203031751_0007
>> > Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/user/hduser/diffuse/7476429391099/data
>> >     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:525)
>> >     at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:51)
>> >     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:211)
>> >     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:929)
>> >     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:921)
>> >     at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
>> >     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:838)
>> >     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:791)
>> >     at java.security.AccessController.doPrivileged(Native Method)
>> >     at javax.security.auth.Subject.doAs(Subject.java:396)
>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>> >     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:791)
>> >     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:765)
>> >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1200)
>> >     at org.apache.mahout.math.hadoop.DistributedRowMatrix.transpose(DistributedRowMatrix.java:159)
>> >     at Distributed.MatrixTransposeJob.run(MatrixTransposeJob.java:51)
>> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> >     at Distributed.MatrixTransposeJob.main(MatrixTransposeJob.java:58)
>> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >     at java.lang.reflect.Method.invoke(Method.java:597)
>> >     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>> >
>> > What am I doing wrong?
>> >
>> > Every time I try to run the code I get a path like this one:
>> >
>> > FileNotFoundException: File does not exist: hdfs://localhost:54310/user/hduser/diffuse/7476429391099/data
>> >
>> > Thanks a lot.
>> >
>> > Pedro.
>> >
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>