Sorry. The following must read > the topic. There's an eigenspokes *_paper_* which pretty much is devoted
On Mon, Jun 4, 2012 at 10:44 AM, Dmitriy Lyubimov <[email protected]> wrote: > RE: #2: I'd suggest to read LSA papers (Deerwester's, Dumais, they had > more than one of them) to see how they address efficacy analysis of > LSA there. > SSVD is nothing but an SVD method, Mahout SVD's accuracy analysis is > part of Nathan Halko's dissertation (linked to under "Papers" here: > https://cwiki.apache.org/confluence/display/MAHOUT/Stochastic+Singular+Value+Decomposition). > > RE:#1: I am not sure i read any work actually trying to figure > clusters on LSA outputs. Which may just mean i didn't read enough on > the topic. There's an eigenspokes value which pretty much is devoted > to sphere-projected clusters produced by SVD on the social data, but i > don't think they included LSA output in any of their claims. However, > you may want to check that paper out. LSA is more about > recall/precision/semantic distance hints (such as context-based > polisemy) rather than topic clustering. However, *i think,* if > there're any eigenspoke "clusters" in the LSA output, they better be > projected on the sphere first in order to detect them more clearly. > (see hyperspherical coordinates). I never did the latter so that's > just my guess. check out the papers for more info. > > -d > > > > On Mon, Jun 4, 2012 at 12:11 AM, Peyman Mohajerian <[email protected]> wrote: >> So now that LSA works but clustering of two newsgroups is not accurate >> based on my subjective observation. I had two questions: >> 1) Does it make sense to use Canopy before k-mean step to get a better idea >> of the number of clusters or the output from SSVD can help in that regard? >> Currently I pass the number of clusters as input parameter. >> 2) What is a good way to assess the accuracy of the result, is there some >> data set that is already clustered with certain tuning parameter that I can >> use to gain some confidence? Using Newsgroups of different topics may not >> be the best input since we aren't doing a regular clustering based on word >> count. >> >> Thanks >> Peyman >> >> On Fri, Apr 6, 2012 at 1:05 PM, Dmitriy Lyubimov <[email protected]> wrote: >> >>> Ok, cool. >>> >>> I think writing MR output into your input folder is not a good >>> practice in general in Hadoop world regardless of a job. Glad you had >>> it resolved. >>> >>> On Fri, Apr 6, 2012 at 9:55 AM, Peyman Mohajerian <[email protected]> >>> wrote: >>> > Dmitriy, >>> > >>> > I did downgrade my hadoop and got the same error; however your last >>> > suggestion worked, I moved the output path to a whole different directory >>> > and this particular problem went away. >>> > >>> > Thanks Much, >>> > Peyman >>> > >>> > On Thu, Apr 5, 2012 at 12:38 PM, Dmitriy Lyubimov <[email protected]> >>> wrote: >>> > >>> >> also i notice that you are using output as a subfolder of your input? >>> >> if so, it is probably going to create some mess. If so, please don't >>> >> use folders for input and output spec which are nested w.r.t. each >>> >> other. This is not expected. >>> >> >>> >> -d >>> >> >>> >> On Thu, Apr 5, 2012 at 12:00 PM, Peyman Mohajerian <[email protected]> >>> >> wrote: >>> >> > Ok, great, I'll give these ideas a try later today, the input is the >>> >> > following line(s) that in my code sample was commented out using ';' >>> in >>> >> > Clojure. >>> >> > The first stage, Q-job is done fine, it is the second job that gets >>> >> messed >>> >> > up, the output of Q-job is at: >>> >> > /lsa4solr/matrix/14099700861483/transpose-213/SSVD-out/Q-job and >>> >> > /lsa4solr/matrix/14099700861483/transpose-213/SSVD-out/Q-job but >>> BtJob is >>> >> > looking for the input in the wrong place, it must be hadoop version as >>> >> you >>> >> > said. >>> >> > >>> >> > input path #<Path >>> >> > hdfs://localhost:9000/lsa4solr/matrix/15835804941333/transpose-120> >>> >> > dd #<Path[] [Lorg.apache.hadoop.fs.Path;@5563d208> >>> >> > numCol 1000 >>> >> > numrow 15982 >>> >> > >>> >> > >>> >> > On Thu, Apr 5, 2012 at 11:54 AM, Dmitriy Lyubimov <[email protected]> >>> >> wrote: >>> >> > >>> >> >> Another idea i have is to try to run it from just Mahout command >>> line, >>> >> >> see if it works with .205. If it does, it is definitely something >>> >> >> about passing parameters in/client hadoop classpath/ etc. >>> >> >> >>> >> >> On Thu, Apr 5, 2012 at 11:51 AM, Dmitriy Lyubimov <[email protected] >>> > >>> >> >> wrote: >>> >> >> > also you are printing your input path -- how does it look like in >>> >> >> > reality? because this path that it complains about, >>> SSVDOutput/data, >>> >> >> > in fact should be the input path. That's what's perplexing. >>> >> >> > >>> >> >> > We are talking hadoop job setup process here, nothing specific to >>> the >>> >> >> > solution itself. And job setup/directory management fails for some >>> >> >> > reason. >>> >> >> > >>> >> >> > On Thu, Apr 5, 2012 at 11:45 AM, Dmitriy Lyubimov < >>> [email protected]> >>> >> >> wrote: >>> >> >> >> Any chance you could test it with its current dependency, >>> 0.20.204? >>> >> or >>> >> >> >> that would be hard to stage? >>> >> >> >> >>> >> >> >> Newer hadoop version is frankly all i can think of here for the >>> >> reason >>> >> >> of this. >>> >> >> >> >>> >> >> >> On Thu, Apr 5, 2012 at 11:35 AM, Peyman Mohajerian < >>> >> [email protected]> >>> >> >> wrote: >>> >> >> >>> Hi Dmitriy, >>> >> >> >>> >>> >> >> >>> It is a Clojure code from: >>> https://github.com/algoriffic/lsa4solr >>> >> >> >>> Of course I modified it to use Mahout .6 distribution, also >>> running >>> >> on >>> >> >> >>> hadoop-0.20.205.0, here is the Closure code that I changed, >>> >> >> >>> the lines after ' decomposer (doto (.run ssvdSolver)) ' still >>> need >>> >> >> >>> modification b/c I'm not reading the eigenValue/Vector from the >>> >> solver >>> >> >> >>> correctly. Originally this code was based on Mahout .4. I'm >>> >> creating >>> >> >> the >>> >> >> >>> Matrix from Solr 3.1.0, very similar to what was done on: ' >>> >> >> >>> https://github.com/algoriffic/lsa4solr' >>> >> >> >>> >>> >> >> >>> Thanks, >>> >> >> >>> >>> >> >> >>> (defn decompose-svd >>> >> >> >>> [mat k] >>> >> >> >>> ;(println "input path " (.getRowPath mat)) >>> >> >> >>> ;(println "dd " (into-array [(.getRowPath mat)])) >>> >> >> >>> ;(println "numCol " (.numCols mat)) >>> >> >> >>> ;(println "numrow " (.numRows mat)) >>> >> >> >>> (let [eigenvalues (new java.util.ArrayList) >>> >> >> >>> eigenvectors (DenseMatrix. (+ k 2) (.numCols mat)) >>> >> >> >>> numCol (.numCols mat) >>> >> >> >>> config (.getConf mat) >>> >> >> >>> rawPath (.getRowPath mat) >>> >> >> >>> outputPath (Path. (str (.toString rawPath) "/SSVD-out")) >>> >> >> >>> inputPath (into-array [rawPath]) >>> >> >> >>> ssvdSolver (SSVDSolver. config inputPath outputPath 1000 k 60 >>> 3) >>> >> >> >>> decomposer (doto (.run ssvdSolver)) >>> >> >> >>> V (normalize-matrix-columns (.viewPart (.transpose >>> eigenvectors) >>> >> >> >>> (int-array [0 0]) >>> >> >> >>> (int-array [(.numCols mat) k]))) >>> >> >> >>> U (mmult mat V) >>> >> >> >>> S (diag (take k (reverse eigenvalues)))] >>> >> >> >>> {:U U >>> >> >> >>> :S S >>> >> >> >>> :V V})) >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> On Thu, Apr 5, 2012 at 11:10 AM, Dmitriy Lyubimov < >>> >> [email protected]> >>> >> >> wrote: >>> >> >> >>> >>> >> >> >>>> Yeah. i don't see how it may have arrived at that error. >>> >> >> >>>> >>> >> >> >>>> >>> >> >> >>>> Peyman, >>> >> >> >>>> >>> >> >> >>>> I need to know more -- it looks like you are using embedded api, >>> >> not a >>> >> >> >>>> command line, so i need to see how you you initialize the solver >>> >> and >>> >> >> >>>> also which version of Mahout libraries you are using (your stack >>> >> trace >>> >> >> >>>> numbers do not correspond to anything reasonable on current >>> trunk). >>> >> >> >>>> >>> >> >> >>>> thanks. >>> >> >> >>>> >>> >> >> >>>> -d >>> >> >> >>>> >>> >> >> >>>> On Thu, Apr 5, 2012 at 10:55 AM, Dmitriy Lyubimov < >>> >> [email protected]> >>> >> >> >>>> wrote: >>> >> >> >>>> > Hm. i never saw that and not sure where this folder comes >>> from. >>> >> >> Which >>> >> >> >>>> > hadoop version are you using? This may be a result of >>> >> incompatible >>> >> >> >>>> > support for multiple outputs in the newer hadoop versions . I >>> >> tested >>> >> >> >>>> > it with CDH3u0/u3 and it was fine. This folder should normally >>> >> >> appear >>> >> >> >>>> > in the conversation, i suspect it is an internal hadoop thing. >>> >> >> >>>> > >>> >> >> >>>> > This is without me actually looking at the code per stack >>> trace. >>> >> >> >>>> > >>> >> >> >>>> > >>> >> >> >>>> > On Thu, Apr 5, 2012 at 5:22 AM, Peyman Mohajerian < >>> >> >> [email protected]> >>> >> >> >>>> wrote: >>> >> >> >>>> >> Hi Guys, >>> >> >> >>>> >> I'm now using ssvd for my LSA code and get the following >>> error, >>> >> at >>> >> >> the >>> >> >> >>>> time >>> >> >> >>>> >> of error all I have under 'SSVD-out' folder: >>> >> >> >>>> >> Q-job/QHat-m-00000< >>> >> >> >>>> >>> >> >> >>> >> >>> http://localhost:50075/browseDirectory.jsp?dir=%2Flsa4solr%2Fmatrix%2F14099700861483%2Ftranspose-213%2FSSVD-out%2FQ-job%2FQHat-m-00000&namenodeInfoPort=50070 >>> >> >> >>>> >& >>> >> >> >>>> >> R-m-00000< >>> >> >> >>>> >>> >> >> >>> >> >>> http://localhost:50075/browseDirectory.jsp?dir=%2Flsa4solr%2Fmatrix%2F14099700861483%2Ftranspose-213%2FSSVD-out%2FQ-job%2FR-m-00000&namenodeInfoPort=50070 >>> >> >> >>>> >& >>> >> >> >>>> >> _SUCCESS< >>> >> >> >>>> >>> >> >> >>> >> >>> http://localhost:50075/browseDirectory.jsp?dir=%2Flsa4solr%2Fmatrix%2F14099700861483%2Ftranspose-213%2FSSVD-out%2FQ-job%2F_SUCCESS&namenodeInfoPort=50070 >>> >> >> >>>> >& >>> >> >> >>>> >> part-m-00000.deflate< >>> >> >> >>>> >>> >> >> >>> >> >>> http://localhost:50075/browseDirectory.jsp?dir=%2Flsa4solr%2Fmatrix%2F14099700861483%2Ftranspose-213%2FSSVD-out%2FQ-job%2Fpart-m-00000.deflate&namenodeInfoPort=50070 >>> >> >> >>>> > >>> >> >> >>>> >> >>> >> >> >>>> >> I'm not clear where '/data' folder is supposed to be set, is >>> it >>> >> >> part of >>> >> >> >>>> the >>> >> >> >>>> >> output of the QJob, I don't see any error in the QJob*? >>> >> >> >>>> >> >>> >> >> >>>> >> *Thanks,* >>> >> >> >>>> >> * >>> >> >> >>>> >> SEVERE: java.io.FileNotFoundException: File does not exist: >>> >> >> >>>> >> >>> >> >> >>>> >>> >> >> >>> >> >>> hdfs://localhost:9000/lsa4solr/matrix/15835804941333/transpose-120/SSVD-out/data >>> >> >> >>>> >> at >>> >> >> >>>> >> >>> >> >> >>>> >>> >> >> >>> >> >>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:534) >>> >> >> >>>> >> at >>> >> >> >>>> >> >>> >> >> >>>> >>> >> >> >>> >> >>> org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:63) >>> >> >> >>>> >> at >>> >> >> >>>> >> >>> >> >> >>>> >>> >> >> >>> >> >>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252) >>> >> >> >>>> >> at >>> >> >> >>>> >>> >> org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:954) >>> >> >> >>>> >> at >>> >> >> org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:971) >>> >> >> >>>> >> at >>> >> >> org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172) >>> >> >> >>>> >> at >>> >> org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:889) >>> >> >> >>>> >> at >>> >> org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:842) >>> >> >> >>>> >> at java.security.AccessController.doPrivileged(Native >>> Method) >>> >> >> >>>> >> at javax.security.auth.Subject.doAs(Subject.java:396) >>> >> >> >>>> >> at >>> >> >> >>>> >> >>> >> >> >>>> >>> >> >> >>> >> >>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059) >>> >> >> >>>> >> at >>> >> >> >>>> >> >>> >> >> >>> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:842) >>> >> >> >>>> >> at org.apache.hadoop.mapreduce.Job.submit(Job.java:465) >>> >> >> >>>> >> at >>> >> >> >>>> >>> >> org.apache.mahout.math.hadoop.stochasticsvd.BtJob.run(BtJob.java:505) >>> >> >> >>>> >> at >>> >> >> >>>> >> >>> >> >> >>>> >>> >> >> >>> >> >>> org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:347) >>> >> >> >>>> >> at >>> >> >> lsa4solr.mahout_matrix$decompose_svd.invoke(mahout_matrix.clj:188) >>> >> >> >>>> >> at >>> >> >> >>>> >> >>> >> >> >>>> >>> >> >> >>> >> >>> lsa4solr.clustering_protocol$decompose_term_doc_matrix.invoke(clustering_protocol.clj:125) >>> >> >> >>>> >> at >>> >> >> >>>> >> >>> >> >> >>>> >>> >> >> >>> >> >>> lsa4solr.clustering_protocol$cluster_kmeans_docs.invoke(clustering_protocol.clj:142) >>> >> >> >>>> >> at >>> lsa4solr.cluster$cluster_dispatch.invoke(cluster.clj:72) >>> >> >> >>>> >> at lsa4solr.cluster$_cluster.invoke(cluster.clj:103) >>> >> >> >>>> >> at lsa4solr.cluster.LSAClusteringEngine.cluster(Unknown >>> >> Source) >>> >> >> >>>> >> at >>> >> >> >>>> >> >>> >> >> >>>> >>> >> >> >>> >> >>> org.apache.solr.handler.clustering.ClusteringComponent.process(ClusteringComponent.java:91) >>> >> >> >>>> >> at >>> >> >> >>>> >> >>> >> >> >>>> >>> >> >> >>> >> >>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194) >>> >> >> >>>> >> at >>> >> >> >>>> >> >>> >> >> >>>> >>> >> >> >>> >> >>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) >>> >> >> >>>> >> at >>> org.apache.solr.core.SolrCore.execute(SolrCore.java:1360) >>> >> >> >>>> >> at >>> >> >> >>>> >> >>> >> >> >>>> >>> >> >> >>> >> >>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) >>> >> >> >>>> >> at >>> >> >> >>>> >> >>> >> >> >>>> >>> >> >> >>> >> >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) >>> >> >> >>>> >> at >>> >> >> >>>> >> >>> >> >> >>>> >>> >> >> >>> >> >>> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) >>> >> >> >>>> >> at >>> >> >> >>>> >> >>> >> >> >>> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) >>> >> >> >>>> >> >>> >> >> >>>> >> On Sun, Feb 26, 2012 at 4:56 PM, Dmitriy Lyubimov < >>> >> >> [email protected]> >>> >> >> >>>> wrote: >>> >> >> >>>> >> >>> >> >> >>>> >>> for the third time, in context of lsa, faster and hence >>> perhaps >>> >> >> better >>> >> >> >>>> >>> alternative to lanczos is ssvd. Is there any specific reason >>> >> you >>> >> >> want >>> >> >> >>>> >>> to use lanczos solver in context of LSA? >>> >> >> >>>> >>> >>> >> >> >>>> >>> -d >>> >> >> >>>> >>> >>> >> >> >>>> >>> On Sun, Feb 26, 2012 at 6:40 AM, Peyman Mohajerian < >>> >> >> [email protected] >>> >> >> >>>> > >>> >> >> >>>> >>> wrote: >>> >> >> >>>> >>> > Hi Guys, >>> >> >> >>>> >>> > >>> >> >> >>>> >>> > Per you advice I did upgrade to Mahout .6 and did a bunch >>> of >>> >> API >>> >> >> >>>> >>> > changes and in the meantime realized I had a bug with my >>> >> input >>> >> >> >>>> matrix, >>> >> >> >>>> >>> > zero rows read from Solr b/c multiple fields in Solr were >>> >> index >>> >> >> and >>> >> >> >>>> >>> > not just the one I was interested in, that issues is fixed >>> >> and >>> >> >> I have >>> >> >> >>>> >>> > a matrix with these dimensions: (.numCols mat) 1000 >>> (.numRows >>> >> >> mat) >>> >> >> >>>> >>> > 15932 (or the transpose) >>> >> >> >>>> >>> > Unfortunately I'm getting the below error now, in the >>> context >>> >> >> of some >>> >> >> >>>> >>> > other Mahout algorithm there was a mention of '/tmp' vs >>> >> '/_tmp' >>> >> >> >>>> >>> > causing this issue but in this particular case the matrix >>> is >>> >> in >>> >> >> >>>> >>> > memory!! I'm using this google package: guava-r09.jar >>> >> >> >>>> >>> > >>> >> >> >>>> >>> > SEVERE: java.util.NoSuchElementException >>> >> >> >>>> >>> > at >>> >> >> >>>> >>> >>> >> >> >>>> >>> >> >> >>> >> >>> com.google.common.collect.AbstractIterator.next(AbstractIterator.java:152) >>> >> >> >>>> >>> > at >>> >> >> >>>> >>> >>> >> >> >>>> >>> >> >> >>> >> >>> org.apache.mahout.math.hadoop.TimesSquaredJob.retrieveTimesSquaredOutputVector(TimesSquaredJob.java:190) >>> >> >> >>>> >>> > at >>> >> >> >>>> >>> >>> >> >> >>>> >>> >> >> >>> >> >>> org.apache.mahout.math.hadoop.DistributedRowMatrix.timesSquared(DistributedRowMatrix.java:238) >>> >> >> >>>> >>> > at >>> >> >> >>>> >>> >>> >> >> >>>> >>> >> >> >>> >> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:104) >>> >> >> >>>> >>> > at >>> >> >> >>>> >>> >>> >> lsa4solr.mahout_matrix$decompose_svd.invoke(mahout_matrix.clj:165) >>> >> >> >>>> >>> > >>> >> >> >>>> >>> > >>> >> >> >>>> >>> > Any suggestion? >>> >> >> >>>> >>> > Thanks, >>> >> >> >>>> >>> > Peyman >>> >> >> >>>> >>> > >>> >> >> >>>> >>> > >>> >> >> >>>> >>> > >>> >> >> >>>> >>> > On Mon, Feb 20, 2012 at 10:38 AM, Dmitriy Lyubimov < >>> >> >> >>>> [email protected]> >>> >> >> >>>> >>> wrote: >>> >> >> >>>> >>> >> Peyman, >>> >> >> >>>> >>> >> >>> >> >> >>>> >>> >> >>> >> >> >>>> >>> >> Yes, what Ted said. Please take 0.6 release. Also try >>> ssvd, >>> >> it >>> >> >> may >>> >> >> >>>> >>> >> benefit you in some regards compared to Lanczos. >>> >> >> >>>> >>> >> >>> >> >> >>>> >>> >> -d >>> >> >> >>>> >>> >> >>> >> >> >>>> >>> >> On Sun, Feb 19, 2012 at 10:34 AM, Peyman Mohajerian < >>> >> >> >>>> [email protected]> >>> >> >> >>>> >>> wrote: >>> >> >> >>>> >>> >>> Hi Dmitriy & Others, >>> >> >> >>>> >>> >>> >>> >> >> >>>> >>> >>> Dmitriy thanks for your previous response. >>> >> >> >>>> >>> >>> I have a follow up question to my LSA project. I have >>> >> managed >>> >> >> to >>> >> >> >>>> >>> >>> upload 1,500 documents from two different news groups >>> (one >>> >> >> about >>> >> >> >>>> >>> >>> graphics and one about Atheism >>> >> >> >>>> >>> >>> http://people.csail.mit.edu/jrennie/20Newsgroups/) to >>> >> Solr. >>> >> >> >>>> However my >>> >> >> >>>> >>> >>> LanczosSolver in Mahout.4 does not find any eigenvalues >>> >> >> (there are >>> >> >> >>>> >>> >>> eigenvectors as you see in the follow up logs). >>> >> >> >>>> >>> >>> The only things I'm doing different from >>> >> >> >>>> >>> >>> (https://github.com/algoriffic/lsa4solr) is that I'm >>> not >>> >> >> using the >>> >> >> >>>> >>> >>> 'Summary' field but rather the actual 'text' field in >>> Solr. >>> >> >> I'm >>> >> >> >>>> >>> >>> assuming the issue is that Summary field already removes >>> >> the >>> >> >> noise >>> >> >> >>>> and >>> >> >> >>>> >>> >>> make the clustering work and the raw index data does >>> not do >>> >> >> that, >>> >> >> >>>> am I >>> >> >> >>>> >>> >>> correct or there are other potential explanations? For >>> the >>> >> >> desired >>> >> >> >>>> >>> >>> rank I'm using values between 10-100 and looking for >>> >> #clusters >>> >> >> >>>> between >>> >> >> >>>> >>> >>> 2-10 (different values for different trials), but always >>> >> the >>> >> >> same >>> >> >> >>>> >>> >>> result comes out, no clusters found. >>> >> >> >>>> >>> >>> If my issue is related to not having summarization done, >>> >> how >>> >> >> can >>> >> >> >>>> that >>> >> >> >>>> >>> >>> be done in Solr? I wasn't able to fine a Summary field >>> in >>> >> >> Solr. >>> >> >> >>>> >>> >>> >>> >> >> >>>> >>> >>> Thanks >>> >> >> >>>> >>> >>> Peyman >>> >> >> >>>> >>> >>> >>> >> >> >>>> >>> >>> >>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM >>> >> >> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver >>> >> solve >>> >> >> >>>> >>> >>> INFO: Lanczos iteration complete - now to diagonalize >>> the >>> >> >> >>>> tri-diagonal >>> >> >> >>>> >>> >>> auxiliary matrix. >>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM >>> >> >> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver >>> >> solve >>> >> >> >>>> >>> >>> INFO: Eigenvector 0 found with eigenvalue 0.0 >>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM >>> >> >> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver >>> >> solve >>> >> >> >>>> >>> >>> INFO: Eigenvector 1 found with eigenvalue 0.0 >>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM >>> >> >> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver >>> >> solve >>> >> >> >>>> >>> >>> INFO: Eigenvector 2 found with eigenvalue 0.0 >>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM >>> >> >> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver >>> >> solve >>> >> >> >>>> >>> >>> INFO: Eigenvector 3 found with eigenvalue 0.0 >>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM >>> >> >> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver >>> >> solve >>> >> >> >>>> >>> >>> INFO: Eigenvector 4 found with eigenvalue 0.0 >>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM >>> >> >> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver >>> >> solve >>> >> >> >>>> >>> >>> INFO: Eigenvector 5 found with eigenvalue 0.0 >>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM >>> >> >> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver >>> >> solve >>> >> >> >>>> >>> >>> INFO: Eigenvector 6 found with eigenvalue 0.0 >>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM >>> >> >> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver >>> >> solve >>> >> >> >>>> >>> >>> INFO: Eigenvector 7 found with eigenvalue 0.0 >>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM >>> >> >> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver >>> >> solve >>> >> >> >>>> >>> >>> INFO: Eigenvector 8 found with eigenvalue 0.0 >>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM >>> >> >> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver >>> >> solve >>> >> >> >>>> >>> >>> INFO: Eigenvector 9 found with eigenvalue 0.0 >>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM >>> >> >> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver >>> >> solve >>> >> >> >>>> >>> >>> INFO: Eigenvector 10 found with eigenvalue 0.0 >>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM >>> >> >> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver >>> >> solve >>> >> >> >>>> >>> >>> INFO: LanczosSolver finished. >>> >> >> >>>> >>> >>> >>> >> >> >>>> >>> >>> >>> >> >> >>>> >>> >>> On Sun, Jan 1, 2012 at 10:06 PM, Dmitriy Lyubimov < >>> >> >> >>>> [email protected]> >>> >> >> >>>> >>> wrote: >>> >> >> >>>> >>> >>>> In Mahout lsa pipeline is possible with seqdirectory, >>> >> >> seq2sparse >>> >> >> >>>> and >>> >> >> >>>> >>> ssvd >>> >> >> >>>> >>> >>>> commands. Nuances are understanding dictionary format >>> and >>> >> llr >>> >> >> >>>> >>> anaylysis of >>> >> >> >>>> >>> >>>> n-grams and perhaps use a slightly better lemmatizer >>> than >>> >> the >>> >> >> >>>> default >>> >> >> >>>> >>> one. >>> >> >> >>>> >>> >>>> >>> >> >> >>>> >>> >>>> With indexing part you are on your own at this point. >>> >> >> >>>> >>> >>>> On Jan 1, 2012 2:28 PM, "Peyman Mohajerian" < >>> >> >> [email protected]> >>> >> >> >>>> >>> wrote: >>> >> >> >>>> >>> >>>> >>> >> >> >>>> >>> >>>>> Hi Guys, >>> >> >> >>>> >>> >>>>> >>> >> >> >>>> >>> >>>>> I'm interested in this work: >>> >> >> >>>> >>> >>>>> >>> >> >> >>>> >>> >>>>> >>> >> >> >>>> >>> >>> >> >> >>>> >>> >> >> >>> >> >>> http://www.ccri.com/blog/2010/4/2/latent-semantic-analysis-in-solr-using-clojure.html >>> >> >> >>>> >>> >>>>> >>> >> >> >>>> >>> >>>>> I looked at some of the comments and notices that >>> there >>> >> was >>> >> >> >>>> interest >>> >> >> >>>> >>> >>>>> in incorporating it into Mahout, back in 2010. I'm >>> also >>> >> >> having >>> >> >> >>>> issues >>> >> >> >>>> >>> >>>>> running this code due to dependencies on older >>> version of >>> >> >> Mahout. >>> >> >> >>>> >>> >>>>> >>> >> >> >>>> >>> >>>>> I was wondering if LSA is now directly available in >>> >> Mahout? >>> >> >> Also >>> >> >> >>>> if I >>> >> >> >>>> >>> >>>>> upgrade to the latest Mahout would this Clojure code >>> >> work? >>> >> >> >>>> >>> >>>>> >>> >> >> >>>> >>> >>>>> Thanks >>> >> >> >>>> >>> >>>>> Peyman >>> >> >> >>>> >>> >>>>> >>> >> >> >>>> >>> >>> >> >> >>>> >>> >> >> >>> >> >>>
