Regardless of the confusion between k and p (I was confused as well), you
still can't set the sum k + p to more than the minimum dimension of your
data.  Here you have set it larger, and it breaks.
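As a quick illustration (a hypothetical helper, not Mahout code): the target rank k plus the oversampling parameter p must fit within the smaller dimension of the input matrix, so with 57 docs, k = 20 fits but k = 100 cannot:

```java
// Illustrative sanity check (hypothetical helper, not part of Mahout):
// for SSVD, k (target rank) + p (oversampling) must not exceed
// min(rows, cols) of the input matrix.
public class SsvdParamCheck {
    static boolean fits(int rows, int cols, int k, int p) {
        return k + p <= Math.min(rows, cols);
    }

    public static void main(String[] args) {
        // 57 docs x ~2200 terms, as in this thread:
        System.out.println(fits(57, 2200, 20, 1));   // true:  21 <= 57
        System.out.println(fits(57, 2200, 100, 1));  // false: 101 > 57
    }
}
```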

On Sat, Sep 1, 2012 at 11:09 AM, Pat Ferrel <pat.fer...@gmail.com> wrote:

> Oh, sorry, below I meant to say k (the number to reduce to) not p.
>
> In both cases p = 1, the first case k = 20, the second case k =100
>
> Also, the first error does seem to come from running with local hadoop. The
> error comes from looking for a temp file that does exist in the file
> system, but not under the hadoop tmp-based paths.
>
>
> On Sep 1, 2012, at 7:53 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
> With 57 crawled docs, you can't reasonably set p > 57.  That is your second
> error.
>
> On Sat, Sep 1, 2012 at 10:32 AM, Pat Ferrel <pat.fer...@gmail.com> wrote:
>
> > I have a small data set that I am using in local mode for debugging
> > purposes. The data is 57 crawled docs with something like 2200 terms. I
> > run this through seq2sparse, then my own cloned version of rowid to get
> > a distributed row matrix, then into SSVD. I realize this is not a
> > production environment, but you need to debug somewhere, and
> > single-threaded execution is ideal for debugging. As I said, this works
> > in hadoop clustered mode.
> >
> > The error looks like some code is expecting HDFS to be running, no?
> > Here is the exception stack from the IDE with p = 20:
> >
> > 12/09/01 07:22:55 WARN mapred.LocalJobRunner: job_local_0002
> > java.io.FileNotFoundException: File /tmp/hadoop-pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 does not exist.
> >         at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
> >         at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
> >         at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:92)
> >         at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:219)
> >         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> >         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> > Exception in thread "main" java.io.IOException: Bt job unsuccessful.
> >         at org.apache.mahout.math.hadoop.stochasticsvd.BtJob.run(BtJob.java:609)
> >         at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:397)
> >         at com.finderbots.analysis.AnalysisPipeline.SSVDTransformAndBack(AnalysisPipeline.java:257)
> >         at com.finderbots.analysis.AnalysisJob.run(AnalysisJob.java:20)
> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> >         at com.finderbots.analysis.AnalysisJob.main(AnalysisJob.java:34)
> > Disconnected from the target VM, address: '127.0.0.1:54588', transport: 'socket'
> >
> > Process finished with exit code 1
> >
> > With p=100-200 I get the following:
> >
> > 12/09/01 07:30:33 ERROR common.IOUtils: new m can't be less than n
> > java.lang.IllegalArgumentException: new m can't be less than n
> >         at org.apache.mahout.math.hadoop.stochasticsvd.qr.GivensThinSolver.adjust(GivensThinSolver.java:109)
> >         at org.apache.mahout.math.hadoop.stochasticsvd.qr.QRFirstStep.cleanup(QRFirstStep.java:233)
> >         at org.apache.mahout.math.hadoop.stochasticsvd.qr.QRFirstStep.close(QRFirstStep.java:89)
> >         at org.apache.mahout.common.IOUtils.close(IOUtils.java:128)
> >         at org.apache.mahout.math.hadoop.stochasticsvd.QJob$QMapper.cleanup(QJob.java:158)
> >         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
> >         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> > 12/09/01 07:30:33 WARN mapred.LocalJobRunner: job_local_0001
> > java.lang.IllegalArgumentException: new m can't be less than n
> >         at org.apache.mahout.math.hadoop.stochasticsvd.qr.GivensThinSolver.adjust(GivensThinSolver.java:109)
> >         at org.apache.mahout.math.hadoop.stochasticsvd.qr.QRFirstStep.cleanup(QRFirstStep.java:233)
> >         at org.apache.mahout.math.hadoop.stochasticsvd.qr.QRFirstStep.close(QRFirstStep.java:89)
> >         at org.apache.mahout.common.IOUtils.close(IOUtils.java:128)
> >         at org.apache.mahout.math.hadoop.stochasticsvd.QJob$QMapper.cleanup(QJob.java:158)
> >         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
> >         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> > Exception in thread "main" java.io.IOException: Q job unsuccessful.
> >         at org.apache.mahout.math.hadoop.stochasticsvd.QJob.run(QJob.java:230)
> >         at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:376)
> >         at com.finderbots.analysis.AnalysisPipeline.SSVDTransformAndBack(AnalysisPipeline.java:257)
> >         at com.finderbots.analysis.AnalysisJob.run(AnalysisJob.java:20)
> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> >         at com.finderbots.analysis.AnalysisJob.main(AnalysisJob.java:34)
> > Disconnected from the target VM, address: '127.0.0.1:54614', transport: 'socket'
> >
> > Process finished with exit code 1
> >
> >
> >
> >
> > On Aug 31, 2012, at 4:21 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> >
> > Perhaps if you give more info about the stack etc. I might get a
> > better idea, though.
> >
> > On Fri, Aug 31, 2012 at 4:19 PM, Dmitriy Lyubimov <dlie...@gmail.com>
> > wrote:
> >> I am not sure, i haven't used it that way.
> >>
> >> I know it works fully distributed AND when embedded with a local job
> >> tracker (e.g. its tests are basically MR jobs run against a "local" job
> >> tracker), which is probably not the same as Mahout local mode. A
> >> "local" job tracker is not good for much, though: it doesn't even use
> >> multicore parallelism, since it doesn't support multiple reducers, so
> >> pragmatically this code is really meant for a real cluster. There's
> >> also Ted's implementation of non-distributed SSVD in Mahout, which does
> >> not require Hadoop dependencies, but it is a different API with no PCA
> >> option (not sure about power iterations).
> >>
> >> I am not sure why this very particular error appears in your setup.
> >>
> > On Fri, Aug 31, 2012 at 3:02 PM, Pat Ferrel <pat.fer...@gmail.com> wrote:
> >>> Running on the local file system inside IDEA with MAHOUT_LOCAL set and
> >>> performing an SSVD, I get the error below. Notice that R-m-00000 exists
> >>> in the local file system, and running it outside the debugger in
> >>> pseudo-cluster mode with HDFS works. Does SSVD work in local mode?
> >>>
> >>> java.io.FileNotFoundException: File /tmp/hadoop-pat/mapred/local/archive/5543644668644532045_1587570556_2120541978/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 does not exist.
> >>>
> >>> Maclaurin:big-data pat$ ls -al b/ssvd/Q-job/
> >>> total 72
> >>> drwxr-xr-x  10 pat  staff   340 Aug 31 13:35 .
> >>> drwxr-xr-x   4 pat  staff   136 Aug 31 13:35 ..
> >>> -rw-r--r--   1 pat  staff    80 Aug 31 13:35 .QHat-m-00000.crc
> >>> -rw-r--r--   1 pat  staff    28 Aug 31 13:35 .R-m-00000.crc
> >>> -rw-r--r--   1 pat  staff     8 Aug 31 13:35 ._SUCCESS.crc
> >>> -rw-r--r--   1 pat  staff    12 Aug 31 13:35 .part-m-00000.deflate.crc
> >>> -rwxrwxrwx   1 pat  staff  9154 Aug 31 13:35 QHat-m-00000
> >>> -rwxrwxrwx   1 pat  staff  2061 Aug 31 13:35 R-m-00000
> >>> -rwxrwxrwx   1 pat  staff     0 Aug 31 13:35 _SUCCESS
> >>> -rwxrwxrwx   1 pat  staff     8 Aug 31 13:35 part-m-00000.deflate
> >>>
> >
> >
>
>
