Your input is still text, though, and I assume you're trying to use TextInputFormat. You can't do that: the transpose job expects IntWritable keys, which means it expects its input as a sequence file, read via SequenceFileInputFormat.
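A minimal sketch of the conversion this implies (class name, file name, and paths are made up for illustration): parse each whitespace-separated text row into doubles, keyed by its row index. The actual Hadoop writer calls are shown as comments, since they need the hadoop/mahout jars on the classpath; only the parsing step runs standalone.

```java
import java.util.Arrays;

// Hypothetical converter sketch. seqdirectory emits Text keys, but
// DistributedRowMatrix/TransposeJob want SequenceFile<IntWritable,
// VectorWritable>, so the text matrix must be converted row by row.
public class TextMatrixToSeqFile {

    // Parse one text row ("2323.03 994.45 ...") into its values.
    static double[] parseRow(String line) {
        String[] tokens = line.trim().split("\\s+");
        double[] row = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            row[i] = Double.parseDouble(tokens[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        String[] lines = { "2323.03 994.45 87.0", "56.45 76.21 275.1" };
        // With the hadoop/mahout jars available, each parsed row would be
        // written roughly like this (old mapred API, matching the thread):
        //
        //   SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf,
        //       new Path("/user/hduser/diffuse/matrix.seq"),
        //       IntWritable.class, VectorWritable.class);
        //   writer.append(new IntWritable(rowIndex),
        //       new VectorWritable(new DenseVector(row)));
        //
        // Here we only demonstrate the parsing step.
        for (int i = 0; i < lines.length; i++) {
            System.out.println(i + " -> " + Arrays.toString(parseRow(lines[i])));
        }
    }
}
```

The key point is the key/value classes: IntWritable row indices and VectorWritable rows are what DistributedRowMatrix reads, which is why a Text-keyed file from seqdirectory triggers the ClassCastException below.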
On Tue, Mar 6, 2012 at 7:21 PM, PEDRO MANUEL JIMENEZ RODRIGUEZ
<pmjimenez1...@hotmail.com> wrote:
>
> Thanks for the reply.
>
> I was doing something wrong: I have to convert my input file to a
> sequence file, so now I'm trying to convert it.
>
> The file looks like:
>
> 2323.03 994.45 87.....
> 56.45 76.21 275.1 12.456......
> ......
>
> Each line represents a matrix row, and the columns are separated by
> spaces.
>
> So I executed the following command to get the sequence file:
>
> bin/mahout seqdirectory -i /home/pedro/input -o /home/pedro/diffuse/output -c UTF-8
>
> When I try to run my program with the generated file, I get the
> following error:
>
> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
>     at org.apache.mahout.math.hadoop.TransposeJob$TransposeMapper.map(TransposeJob.java:100)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>     at org.apache.hadoop.mapred.Child.main(Child.java:253)
>
> Do I have to change the input file to another format?
>
> Thanks.
>
>
>> Date: Sun, 4 Mar 2012 17:48:56 -0800
>> Subject: Re: DistributedRowMatrix - FileNotFoundException
>> From: goks...@gmail.com
>> To: user@mahout.apache.org
>>
>> This could be a problem with the DRM code or with HDFS management. Try
>> running it without HDFS or a Hadoop cluster, with local files, and in
>> pseudo-distributed mode; that way you can narrow the problem down to
>> one of the above.
>>
>> On Sat, Mar 3, 2012 at 10:13 AM, PEDRO MANUEL JIMENEZ RODRIGUEZ
>> <pmjimenez1...@hotmail.com> wrote:
>> >
>> > Hi everyone!
>> >
>> > I'm trying to use DistributedRowMatrix in my class code, but I keep
>> > getting the same error: "FileNotFoundException".
>> >
>> > I have put a file into my HDFS directory under /user/hduser/diffuse,
>> > and I run the program with "diffuse" as the input and output
>> > directory. The code looks like:
>> >
>> > Configuration originalConfig = getConf();
>> > DistributedRowMatrix matrix = new DistributedRowMatrix(inputPath,
>> >         outputPath,
>> >         numRows,
>> >         numCols);
>> >
>> > JobConf conf = new JobConf(originalConfig);
>> > matrix.configure(conf);
>> >
>> > DistributedRowMatrix t1 = matrix.transpose();
>> >
>> > 12/03/03 18:55:13 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> > 12/03/03 18:55:14 INFO mapred.FileInputFormat: Total input paths to process : 7
>> > 12/03/03 18:55:14 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:54310/app/hadoop/tmp/mapred/staging/hduser/.staging/job_201203031751_0007
>> > Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/user/hduser/diffuse/7476429391099/data
>> >     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:525)
>> >     at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:51)
>> >     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:211)
>> >     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:929)
>> >     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:921)
>> >     at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
>> >     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:838)
>> >     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:791)
>> >     at java.security.AccessController.doPrivileged(Native Method)
>> >     at javax.security.auth.Subject.doAs(Subject.java:396)
>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>> >     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:791)
>> >     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:765)
>> >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1200)
>> >     at org.apache.mahout.math.hadoop.DistributedRowMatrix.transpose(DistributedRowMatrix.java:159)
>> >     at Distributed.MatrixTransposeJob.run(MatrixTransposeJob.java:51)
>> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> >     at Distributed.MatrixTransposeJob.main(MatrixTransposeJob.java:58)
>> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >     at java.lang.reflect.Method.invoke(Method.java:597)
>> >     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>> >
>> > What am I doing wrong?
>> >
>> > Every time I try to run the code I get a path like this one:
>> >
>> > FileNotFoundException: File does not exist: hdfs://localhost:54310/user/hduser/diffuse/7476429391099/data
>> >
>> > Thanks a lot.
>> >
>> > Pedro.
>> >
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>