Dmitriy, I downgraded my Hadoop and got the same error. However, your last suggestion worked: I moved the output path to a completely separate directory, no longer nested under the input, and this particular problem went away.
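
For anyone who hits the same FileNotFoundException, here is a minimal sketch of a guard that checks whether two locations are nested before a job is submitted. It is only an illustration under stated assumptions: the class PathNesting and its methods are hypothetical names, not part of Hadoop or Mahout; it compares only the path component of each location and ignores scheme and host; and the "separate" output location in main is made up for the example.

```java
import java.net.URI;

/** Hypothetical helper for illustration; not part of Hadoop or Mahout. */
public final class PathNesting {

    private PathNesting() {}

    /** Returns the path component of a location without trailing slashes. */
    public static String normalize(String location) {
        String path = URI.create(location).normalize().getPath();
        if (path == null || path.isEmpty()) {
            return "/"; // a bare "scheme://host:port" is treated as the root
        }
        // Strip trailing slashes so "/a/b/" and "/a/b" compare equal.
        while (path.length() > 1 && path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        return path;
    }

    /** True if child is the same directory as parent or lies below it. */
    public static boolean isUnder(String parent, String child) {
        String p = normalize(parent);
        String c = normalize(child);
        if (p.equals("/")) {
            return true; // everything lives under the root
        }
        // Append "/" so "/data/in" does not match "/data/in-2".
        return c.equals(p) || c.startsWith(p + "/");
    }

    /** True if either location contains the other. */
    public static boolean nested(String input, String output) {
        return isUnder(input, output) || isUnder(output, input);
    }

    public static void main(String[] args) {
        String input =
            "hdfs://localhost:9000/lsa4solr/matrix/15835804941333/transpose-120";
        // Layout from the thread: output created underneath the input.
        String nestedOutput = input + "/SSVD-out";
        // Assumed alternative location, made up for this example.
        String separateOutput =
            "hdfs://localhost:9000/lsa4solr/ssvd-out/15835804941333";

        System.out.println("nested layout:   " + nested(input, nestedOutput));
        System.out.println("separate layout: " + nested(input, separateOutput));
    }
}
```

In the Clojure code quoted below, the equivalent change is to stop building outputPath from rawPath and to point it at a directory outside the matrix folder.
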

Thanks Much,
Peyman

On Thu, Apr 5, 2012 at 12:38 PM, Dmitriy Lyubimov <[email protected]> wrote:

> also i notice that you are using output as a subfolder of your input?
> if so, it is probably going to create some mess. If so, please don't
> use folders for input and output spec which are nested w.r.t. each
> other. This is not expected.
>
> -d
>
> On Thu, Apr 5, 2012 at 12:00 PM, Peyman Mohajerian <[email protected]> wrote:
> > Ok, great, I'll give these ideas a try later today, the input is the
> > following line(s) that in my code sample was commented out using ';' in
> > Clojure.
> > The first stage, Q-job is done fine, it is the second job that gets messed
> > up, the output of Q-job is at:
> > /lsa4solr/matrix/14099700861483/transpose-213/SSVD-out/Q-job and
> > /lsa4solr/matrix/14099700861483/transpose-213/SSVD-out/Q-job but BtJob is
> > looking for the input in the wrong place, it must be hadoop version as you
> > said.
> >
> > input path #<Path hdfs://localhost:9000/lsa4solr/matrix/15835804941333/transpose-120>
> > dd #<Path[] [Lorg.apache.hadoop.fs.Path;@5563d208>
> > numCol 1000
> > numrow 15982
> >
> > On Thu, Apr 5, 2012 at 11:54 AM, Dmitriy Lyubimov <[email protected]> wrote:
> >
> >> Another idea i have is to try to run it from just Mahout command line,
> >> see if it works with .205. If it does, it is definitely something
> >> about passing parameters in/client hadoop classpath/ etc.
> >>
> >> On Thu, Apr 5, 2012 at 11:51 AM, Dmitriy Lyubimov <[email protected]> wrote:
> >> > also you are printing your input path -- how does it look like in
> >> > reality? because this path that it complains about, SSVDOutput/data,
> >> > in fact should be the input path. That's what's perplexing.
> >> >
> >> > We are talking hadoop job setup process here, nothing specific to the
> >> > solution itself. And job setup/directory management fails for some
> >> > reason.
> >> >
> >> > On Thu, Apr 5, 2012 at 11:45 AM, Dmitriy Lyubimov <[email protected]> wrote:
> >> >> Any chance you could test it with its current dependency, 0.20.204? or
> >> >> that would be hard to stage?
> >> >>
> >> >> Newer hadoop version is frankly all i can think of here for the reason
> >> >> of this.
> >> >>
> >> >> On Thu, Apr 5, 2012 at 11:35 AM, Peyman Mohajerian <[email protected]> wrote:
> >> >>> Hi Dmitriy,
> >> >>>
> >> >>> It is a Clojure code from: https://github.com/algoriffic/lsa4solr
> >> >>> Of course I modified it to use Mahout .6 distribution, also running on
> >> >>> hadoop-0.20.205.0, here is the Clojure code that I changed,
> >> >>> the lines after ' decomposer (doto (.run ssvdSolver)) ' still need
> >> >>> modification b/c I'm not reading the eigenValue/Vector from the solver
> >> >>> correctly. Originally this code was based on Mahout .4. I'm creating the
> >> >>> Matrix from Solr 3.1.0, very similar to what was done on:
> >> >>> 'https://github.com/algoriffic/lsa4solr'
> >> >>>
> >> >>> Thanks,
> >> >>>
> >> >>> (defn decompose-svd
> >> >>>   [mat k]
> >> >>>   ;(println "input path " (.getRowPath mat))
> >> >>>   ;(println "dd " (into-array [(.getRowPath mat)]))
> >> >>>   ;(println "numCol " (.numCols mat))
> >> >>>   ;(println "numrow " (.numRows mat))
> >> >>>   (let [eigenvalues (new java.util.ArrayList)
> >> >>>         eigenvectors (DenseMatrix. (+ k 2) (.numCols mat))
> >> >>>         numCol (.numCols mat)
> >> >>>         config (.getConf mat)
> >> >>>         rawPath (.getRowPath mat)
> >> >>>         outputPath (Path. (str (.toString rawPath) "/SSVD-out"))
> >> >>>         inputPath (into-array [rawPath])
> >> >>>         ssvdSolver (SSVDSolver. config inputPath outputPath 1000 k 60 3)
> >> >>>         decomposer (doto (.run ssvdSolver))
> >> >>>         V (normalize-matrix-columns (.viewPart (.transpose eigenvectors)
> >> >>>                                                (int-array [0 0])
> >> >>>                                                (int-array [(.numCols mat) k])))
> >> >>>         U (mmult mat V)
> >> >>>         S (diag (take k (reverse eigenvalues)))]
> >> >>>     {:U U
> >> >>>      :S S
> >> >>>      :V V}))
> >> >>>
> >> >>> On Thu, Apr 5, 2012 at 11:10 AM, Dmitriy Lyubimov <[email protected]> wrote:
> >> >>>
> >> >>>> Yeah. i don't see how it may have arrived at that error.
> >> >>>>
> >> >>>> Peyman,
> >> >>>>
> >> >>>> I need to know more -- it looks like you are using embedded api, not a
> >> >>>> command line, so i need to see how you initialize the solver and
> >> >>>> also which version of Mahout libraries you are using (your stack trace
> >> >>>> numbers do not correspond to anything reasonable on current trunk).
> >> >>>>
> >> >>>> thanks.
> >> >>>>
> >> >>>> -d
> >> >>>>
> >> >>>> On Thu, Apr 5, 2012 at 10:55 AM, Dmitriy Lyubimov <[email protected]> wrote:
> >> >>>> > Hm. i never saw that and not sure where this folder comes from. Which
> >> >>>> > hadoop version are you using? This may be a result of incompatible
> >> >>>> > support for multiple outputs in the newer hadoop versions. I tested
> >> >>>> > it with CDH3u0/u3 and it was fine. This folder should normally appear
> >> >>>> > in the conversation, i suspect it is an internal hadoop thing.
> >> >>>> >
> >> >>>> > This is without me actually looking at the code per stack trace.
> >> >>>> >
> >> >>>> > On Thu, Apr 5, 2012 at 5:22 AM, Peyman Mohajerian <[email protected]> wrote:
> >> >>>> >> Hi Guys,
> >> >>>> >> I'm now using ssvd for my LSA code and get the following error, at the
> >> >>>> >> time of error all I have under 'SSVD-out' folder:
> >> >>>> >> Q-job/QHat-m-00000
> >> >>>> >> <http://localhost:50075/browseDirectory.jsp?dir=%2Flsa4solr%2Fmatrix%2F14099700861483%2Ftranspose-213%2FSSVD-out%2FQ-job%2FQHat-m-00000&namenodeInfoPort=50070>
> >> >>>> >> & R-m-00000
> >> >>>> >> <http://localhost:50075/browseDirectory.jsp?dir=%2Flsa4solr%2Fmatrix%2F14099700861483%2Ftranspose-213%2FSSVD-out%2FQ-job%2FR-m-00000&namenodeInfoPort=50070>
> >> >>>> >> & _SUCCESS
> >> >>>> >> <http://localhost:50075/browseDirectory.jsp?dir=%2Flsa4solr%2Fmatrix%2F14099700861483%2Ftranspose-213%2FSSVD-out%2FQ-job%2F_SUCCESS&namenodeInfoPort=50070>
> >> >>>> >> & part-m-00000.deflate
> >> >>>> >> <http://localhost:50075/browseDirectory.jsp?dir=%2Flsa4solr%2Fmatrix%2F14099700861483%2Ftranspose-213%2FSSVD-out%2FQ-job%2Fpart-m-00000.deflate&namenodeInfoPort=50070>
> >> >>>> >>
> >> >>>> >> I'm not clear where '/data' folder is supposed to be set, is it part of
> >> >>>> >> the output of the QJob, I don't see any error in the QJob?
> >> >>>> >>
> >> >>>> >> Thanks,
> >> >>>> >>
> >> >>>> >> SEVERE: java.io.FileNotFoundException: File does not exist:
> >> >>>> >> hdfs://localhost:9000/lsa4solr/matrix/15835804941333/transpose-120/SSVD-out/data
> >> >>>> >>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:534)
> >> >>>> >>     at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:63)
> >> >>>> >>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
> >> >>>> >>     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:954)
> >> >>>> >>     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:971)
> >> >>>> >>     at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
> >> >>>> >>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:889)
> >> >>>> >>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:842)
> >> >>>> >>     at java.security.AccessController.doPrivileged(Native Method)
> >> >>>> >>     at javax.security.auth.Subject.doAs(Subject.java:396)
> >> >>>> >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> >> >>>> >>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:842)
> >> >>>> >>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:465)
> >> >>>> >>     at org.apache.mahout.math.hadoop.stochasticsvd.BtJob.run(BtJob.java:505)
> >> >>>> >>     at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:347)
> >> >>>> >>     at lsa4solr.mahout_matrix$decompose_svd.invoke(mahout_matrix.clj:188)
> >> >>>> >>     at lsa4solr.clustering_protocol$decompose_term_doc_matrix.invoke(clustering_protocol.clj:125)
> >> >>>> >>     at lsa4solr.clustering_protocol$cluster_kmeans_docs.invoke(clustering_protocol.clj:142)
> >> >>>> >>     at lsa4solr.cluster$cluster_dispatch.invoke(cluster.clj:72)
> >> >>>> >>     at lsa4solr.cluster$_cluster.invoke(cluster.clj:103)
> >> >>>> >>     at lsa4solr.cluster.LSAClusteringEngine.cluster(Unknown Source)
> >> >>>> >>     at org.apache.solr.handler.clustering.ClusteringComponent.process(ClusteringComponent.java:91)
> >> >>>> >>     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
> >> >>>> >>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> >> >>>> >>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
> >> >>>> >>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
> >> >>>> >>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
> >> >>>> >>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> >> >>>> >>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> >> >>>> >>
> >> >>>> >> On Sun, Feb 26, 2012 at 4:56 PM, Dmitriy Lyubimov <[email protected]> wrote:
> >> >>>> >>
> >> >>>> >>> for the third time, in context of lsa, faster and hence perhaps better
> >> >>>> >>> alternative to lanczos is ssvd. Is there any specific reason you want
> >> >>>> >>> to use lanczos solver in context of LSA?
> >> >>>> >>>
> >> >>>> >>> -d
> >> >>>> >>>
> >> >>>> >>> On Sun, Feb 26, 2012 at 6:40 AM, Peyman Mohajerian <[email protected]> wrote:
> >> >>>> >>> > Hi Guys,
> >> >>>> >>> >
> >> >>>> >>> > Per your advice I did upgrade to Mahout .6 and did a bunch of API
> >> >>>> >>> > changes and in the meantime realized I had a bug with my input matrix,
> >> >>>> >>> > zero rows read from Solr b/c multiple fields in Solr were indexed and
> >> >>>> >>> > not just the one I was interested in, that issue is fixed and I have
> >> >>>> >>> > a matrix with these dimensions: (.numCols mat) 1000 (.numRows mat)
> >> >>>> >>> > 15932 (or the transpose)
> >> >>>> >>> > Unfortunately I'm getting the below error now, in the context of some
> >> >>>> >>> > other Mahout algorithm there was a mention of '/tmp' vs '/_tmp'
> >> >>>> >>> > causing this issue but in this particular case the matrix is in
> >> >>>> >>> > memory!! I'm using this google package: guava-r09.jar
> >> >>>> >>> >
> >> >>>> >>> > SEVERE: java.util.NoSuchElementException
> >> >>>> >>> >     at com.google.common.collect.AbstractIterator.next(AbstractIterator.java:152)
> >> >>>> >>> >     at org.apache.mahout.math.hadoop.TimesSquaredJob.retrieveTimesSquaredOutputVector(TimesSquaredJob.java:190)
> >> >>>> >>> >     at org.apache.mahout.math.hadoop.DistributedRowMatrix.timesSquared(DistributedRowMatrix.java:238)
> >> >>>> >>> >     at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:104)
> >> >>>> >>> >     at lsa4solr.mahout_matrix$decompose_svd.invoke(mahout_matrix.clj:165)
> >> >>>> >>> >
> >> >>>> >>> > Any suggestion?
> >> >>>> >>> > Thanks,
> >> >>>> >>> > Peyman
> >> >>>> >>> >
> >> >>>> >>> > On Mon, Feb 20, 2012 at 10:38 AM, Dmitriy Lyubimov <[email protected]> wrote:
> >> >>>> >>> >> Peyman,
> >> >>>> >>> >>
> >> >>>> >>> >> Yes, what Ted said. Please take 0.6 release. Also try ssvd, it may
> >> >>>> >>> >> benefit you in some regards compared to Lanczos.
> >> >>>> >>> >>
> >> >>>> >>> >> -d
> >> >>>> >>> >>
> >> >>>> >>> >> On Sun, Feb 19, 2012 at 10:34 AM, Peyman Mohajerian <[email protected]> wrote:
> >> >>>> >>> >>> Hi Dmitriy & Others,
> >> >>>> >>> >>>
> >> >>>> >>> >>> Dmitriy thanks for your previous response.
> >> >>>> >>> >>> I have a follow up question to my LSA project. I have managed to
> >> >>>> >>> >>> upload 1,500 documents from two different news groups (one about
> >> >>>> >>> >>> graphics and one about Atheism
> >> >>>> >>> >>> http://people.csail.mit.edu/jrennie/20Newsgroups/) to Solr. However my
> >> >>>> >>> >>> LanczosSolver in Mahout.4 does not find any eigenvalues (there are
> >> >>>> >>> >>> eigenvectors as you see in the follow up logs).
> >> >>>> >>> >>> The only thing I'm doing different from
> >> >>>> >>> >>> (https://github.com/algoriffic/lsa4solr) is that I'm not using the
> >> >>>> >>> >>> 'Summary' field but rather the actual 'text' field in Solr. I'm
> >> >>>> >>> >>> assuming the issue is that Summary field already removes the noise and
> >> >>>> >>> >>> makes the clustering work and the raw index data does not do that, am I
> >> >>>> >>> >>> correct or are there other potential explanations? For the desired
> >> >>>> >>> >>> rank I'm using values between 10-100 and looking for #clusters between
> >> >>>> >>> >>> 2-10 (different values for different trials), but always the same
> >> >>>> >>> >>> result comes out, no clusters found.
> >> >>>> >>> >>> If my issue is related to not having summarization done, how can that
> >> >>>> >>> >>> be done in Solr? I wasn't able to find a Summary field in Solr.
> >> >>>> >>> >>>
> >> >>>> >>> >>> Thanks
> >> >>>> >>> >>> Peyman
> >> >>>> >>> >>>
> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >>>> >>> >>> INFO: Lanczos iteration complete - now to diagonalize the tri-diagonal auxiliary matrix.
> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >>>> >>> >>> INFO: Eigenvector 0 found with eigenvalue 0.0
> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >>>> >>> >>> INFO: Eigenvector 1 found with eigenvalue 0.0
> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >>>> >>> >>> INFO: Eigenvector 2 found with eigenvalue 0.0
> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >>>> >>> >>> INFO: Eigenvector 3 found with eigenvalue 0.0
> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >>>> >>> >>> INFO: Eigenvector 4 found with eigenvalue 0.0
> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >>>> >>> >>> INFO: Eigenvector 5 found with eigenvalue 0.0
> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >>>> >>> >>> INFO: Eigenvector 6 found with eigenvalue 0.0
> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >>>> >>> >>> INFO: Eigenvector 7 found with eigenvalue 0.0
> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >>>> >>> >>> INFO: Eigenvector 8 found with eigenvalue 0.0
> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >>>> >>> >>> INFO: Eigenvector 9 found with eigenvalue 0.0
> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >>>> >>> >>> INFO: Eigenvector 10 found with eigenvalue 0.0
> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >>>> >>> >>> INFO: LanczosSolver finished.
> >> >>>> >>> >>>
> >> >>>> >>> >>> On Sun, Jan 1, 2012 at 10:06 PM, Dmitriy Lyubimov <[email protected]> wrote:
> >> >>>> >>> >>>> In Mahout lsa pipeline is possible with seqdirectory, seq2sparse and
> >> >>>> >>> >>>> ssvd commands. Nuances are understanding dictionary format and llr
> >> >>>> >>> >>>> analysis of n-grams and perhaps use a slightly better lemmatizer than
> >> >>>> >>> >>>> the default one.
> >> >>>> >>> >>>>
> >> >>>> >>> >>>> With indexing part you are on your own at this point.
> >> >>>> >>> >>>> On Jan 1, 2012 2:28 PM, "Peyman Mohajerian" <[email protected]> wrote:
> >> >>>> >>> >>>>
> >> >>>> >>> >>>>> Hi Guys,
> >> >>>> >>> >>>>>
> >> >>>> >>> >>>>> I'm interested in this work:
> >> >>>> >>> >>>>>
> >> >>>> >>> >>>>> http://www.ccri.com/blog/2010/4/2/latent-semantic-analysis-in-solr-using-clojure.html
> >> >>>> >>> >>>>>
> >> >>>> >>> >>>>> I looked at some of the comments and noticed that there was interest
> >> >>>> >>> >>>>> in incorporating it into Mahout, back in 2010. I'm also having issues
> >> >>>> >>> >>>>> running this code due to dependencies on an older version of Mahout.
> >> >>>> >>> >>>>>
> >> >>>> >>> >>>>> I was wondering if LSA is now directly available in Mahout? Also if I
> >> >>>> >>> >>>>> upgrade to the latest Mahout would this Clojure code work?
> >> >>>> >>> >>>>>
> >> >>>> >>> >>>>> Thanks
> >> >>>> >>> >>>>> Peyman
