To reiterate the situation: in local mode, using the local file system, SSVD dies with a file-not-found error. In pseudo-cluster mode, using HDFS, SSVD runs correctly on the same data. The rest of the analysis pipeline works fine in either mode. I am using local mode to debug my surrounding code.
From the error output it looks like the code is using Hadoop's DistributedCache. This is said not to work with local Hadoop, though that comment was about a version before 0.20.205 (my version). The implication is that when MAHOUT_LOCAL is set, the DistributedCache shouldn't be used. Could this be the problem? stochasticSVD uses DistributedCache in several spots.

==========================

The /tmp file the exception names does not exist locally:

java.io.FileNotFoundException: File /tmp/hadoop-pat/mapred/local/archive/5543644668644532045_1587570556_2120541978/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 does not exist.

But the Q-job output it refers to does exist:

Maclaurin:big-data pat$ ls -al b/ssvd/Q-job/
total 72
drwxr-xr-x  10 pat  staff   340 Aug 31 13:35 .
drwxr-xr-x   4 pat  staff   136 Aug 31 13:35 ..
-rw-r--r--   1 pat  staff    80 Aug 31 13:35 .QHat-m-00000.crc
-rw-r--r--   1 pat  staff    28 Aug 31 13:35 .R-m-00000.crc
-rw-r--r--   1 pat  staff     8 Aug 31 13:35 ._SUCCESS.crc
-rw-r--r--   1 pat  staff    12 Aug 31 13:35 .part-m-00000.deflate.crc
-rwxrwxrwx   1 pat  staff  9154 Aug 31 13:35 QHat-m-00000
-rwxrwxrwx   1 pat  staff  2061 Aug 31 13:35 R-m-00000
-rwxrwxrwx   1 pat  staff     0 Aug 31 13:35 _SUCCESS
-rwxrwxrwx   1 pat  staff     8 Aug 31 13:35 part-m-00000.deflate
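If the diagnosis above is right, one workaround on the application side would be to branch on the job-tracker setting and skip the DistributedCache-localized path in local mode. This is only a sketch of that idea, not Mahout's actual code: the class name `SideFilePicker` and the method `pick` are hypothetical, and the only assumption taken from Hadoop is that "mapred.job.tracker" is "local" when running in local mode.

```java
// Hypothetical helper illustrating the fallback logic: in local mode the
// DistributedCache localization under /tmp/hadoop-<user>/... may never
// happen, so read the side file (e.g. Q-job/R-m-00000) from its original
// location instead of the localized cache path.
public class SideFilePicker {

    static String pick(String jobTracker, String localizedPath, String originalPath) {
        // Hadoop local mode: "mapred.job.tracker" is "local" (or unset).
        boolean localMode = jobTracker == null || "local".equals(jobTracker);
        return localMode ? originalPath : localizedPath;
    }

    public static void main(String[] args) {
        // In local mode, fall back to the path the Q-job actually wrote.
        System.out.println(pick("local",
                "/tmp/hadoop-pat/mapred/local/archive/file/R-m-00000",
                "b/ssvd/Q-job/R-m-00000"));
    }
}
```

In pseudo-cluster mode `pick` would return the localized cache path unchanged, so the normal DistributedCache behavior is preserved there.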