In fact, Q-job and Bt-job have identical input (the A matrix) and an
identical setup of that input, but for some reason Bt-job fails to see
it. And it fails to see it in a very strange way. That's what is perplexing.

Bt-job uses the output of Q-job as side info, not as its main input. But the
error (a split error) comes from the main input, which should be A.
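
One thing that may be worth ruling out (a guess from the stack trace, not a confirmed diagnosis): the posted code builds outputPath as rawPath plus "/SSVD-out", so the output directory sits inside the input directory. SequenceFileInputFormat.listStatus treats a directory found among its inputs as a MapFile and looks for a file named "data" inside it, which would match the SSVD-out/data path in the error once Q-job has created SSVD-out. Below is a minimal Clojure sketch of a sanity check. The names nested-under? and sibling-output are hypothetical helpers, not part of Mahout or lsa4solr, and they work on plain path strings only.

```clojure
(require '[clojure.string :as str])

(defn nested-under?
  "True when `child` is the same path as `parent` or lies below it.
   Both arguments are plain path strings; trailing slashes are ignored."
  [parent child]
  (let [norm   (fn [p] (str/replace p #"/+$" ""))
        parent (norm parent)
        child  (norm child)]
    (or (= parent child)
        (str/starts-with? child (str parent "/")))))

(defn sibling-output
  "Hypothetical helper: builds an output path next to `input`
   (input + \"-\" + suffix) rather than inside it."
  [input suffix]
  (str (str/replace input #"/+$" "") "-" suffix))

;; Paths taken from the thread: the printed input path and the
;; output path the posted code derives from it.
(def input-path  "/lsa4solr/matrix/15835804941333/transpose-120")
(def output-path (str input-path "/SSVD-out"))

(when (nested-under? input-path output-path)
  (println "Output directory is inside the input directory:" output-path))
```

If the check fires, building the output path next to the input (for example with sibling-output) instead of under it would be a cheap experiment before digging into Hadoop versions.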


On Thu, Apr 5, 2012 at 12:00 PM, Peyman Mohajerian <> wrote:
> Ok, great, I'll give these ideas a try later today. The input is shown in
> the line(s) below, which were commented out with ';' in my Clojure code
> sample.
> The first stage, Q-job, finishes fine; it is the second job that gets messed
> up. The output of Q-job is at
> /lsa4solr/matrix/14099700861483/transpose-213/SSVD-out/Q-job and
> /lsa4solr/matrix/14099700861483/transpose-213/SSVD-out/Q-job, but BtJob is
> looking for the input in the wrong place. It must be the Hadoop version, as
> you said.
> input path  #<Path
> hdfs://localhost:9000/lsa4solr/matrix/15835804941333/transpose-120>
> dd  #<Path[] [Lorg.apache.hadoop.fs.Path;@5563d208>
> numCol  1000
> numrow  15982
> On Thu, Apr 5, 2012 at 11:54 AM, Dmitriy Lyubimov <> wrote:
>> Another idea i have is to try to run it from just Mahout command line,
>> see if it works with .205. If it does, it is definitely something
>> about passing parameters in/client hadoop classpath/ etc.
>> On Thu, Apr 5, 2012 at 11:51 AM, Dmitriy Lyubimov <>
>> wrote:
>> > Also, you are printing your input path -- what does it look like in
>> > reality? Because the path that it complains about, SSVDOutput/data,
>> > in fact should be the input path. That's what's perplexing.
>> >
>> > We are talking hadoop job setup process here, nothing specific to the
>> > solution itself. And job setup/directory management fails for some
>> > reason.
>> >
>> > On Thu, Apr 5, 2012 at 11:45 AM, Dmitriy Lyubimov <>
>> wrote:
>> >> Any chance you could test it with its current dependency, 0.20.204? Or
>> >> would that be hard to stage?
>> >>
>> >> A newer Hadoop version is frankly all I can think of as the reason
>> >> for this.
>> >>
>> >> On Thu, Apr 5, 2012 at 11:35 AM, Peyman Mohajerian <>
>> wrote:
>> >>> Hi Dmitriy,
>> >>>
>> >>> It is Clojure code from:
>> >>> Of course I modified it to use the Mahout .6 distribution, also running on
>> >>> hadoop-. Here is the Clojure code that I changed;
>> >>> the lines after ' decomposer (doto (.run ssvdSolver)) ' still need
>> >>> modification b/c I'm not reading the eigenValue/Vector from the solver
>> >>> correctly. Originally this code was based on Mahout .4. I'm creating the
>> >>> Matrix from Solr 3.1.0, very similar to what was done on: '
>> >>>'
>> >>>
>> >>> Thanks,
>> >>>
>> >>> (defn decompose-svd
>> >>>  [mat k]
>> >>>  ;(println "input path " (.getRowPath mat))
>> >>>  ;(println "dd " (into-array [(.getRowPath mat)]))
>> >>>  ;(println "numCol " (.numCols mat))
>> >>>  ;(println "numrow " (.numRows mat))
>> >>>  (let [eigenvalues (new java.util.ArrayList)
>> >>>        eigenvectors (DenseMatrix. (+ k 2) (.numCols mat))
>> >>>        numCol (.numCols mat)
>> >>>        config (.getConf mat)
>> >>>        rawPath (.getRowPath mat)
>> >>>        outputPath (Path. (str (.toString rawPath) "/SSVD-out"))
>> >>>        inputPath (into-array [rawPath])
>> >>>        ssvdSolver (SSVDSolver. config inputPath outputPath 1000 k 60 3)
>> >>>        decomposer (doto (.run ssvdSolver))
>> >>>        V (normalize-matrix-columns (.viewPart (.transpose eigenvectors)
>> >>>                                               (int-array [0 0])
>> >>>                                               (int-array [(.numCols mat) k])))
>> >>>        U (mmult mat V)
>> >>>        S (diag (take k (reverse eigenvalues)))]
>> >>>    {:U U
>> >>>     :S S
>> >>>     :V V}))
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On Thu, Apr 5, 2012 at 11:10 AM, Dmitriy Lyubimov <>
>> wrote:
>> >>>
>> >>>> Yeah. i don't see how it may have arrived at that error.
>> >>>>
>> >>>>
>> >>>> Peyman,
>> >>>>
>> >>>> I need to know more -- it looks like you are using the embedded API, not the
>> >>>> command line, so I need to see how you initialize the solver and
>> >>>> also which version of the Mahout libraries you are using (your stack trace
>> >>>> numbers do not correspond to anything reasonable on current trunk).
>> >>>>
>> >>>> thanks.
>> >>>>
>> >>>> -d
>> >>>>
>> >>>> On Thu, Apr 5, 2012 at 10:55 AM, Dmitriy Lyubimov <>
>> >>>> wrote:
>> >>>> > Hm. I never saw that and am not sure where this folder comes from. Which
>> >>>> > Hadoop version are you using? This may be a result of incompatible
>> >>>> > support for multiple outputs in the newer Hadoop versions. I tested
>> >>>> > it with CDH3u0/u3 and it was fine. This folder should normally appear
>> >>>> > in the conversation, I suspect it is an internal Hadoop thing.
>> >>>> >
>> >>>> > This is without me actually looking at the code per stack trace.
>> >>>> >
>> >>>> >
>> >>>> > On Thu, Apr 5, 2012 at 5:22 AM, Peyman Mohajerian <
>> >>>> wrote:
>> >>>> >> Hi Guys,
>> >>>> >> I'm now using ssvd for my LSA code and get the following error. At the
>> >>>> >> time of the error, all I have under the 'SSVD-out' folder is:
>> >>>> >> Q-job/QHat-m-00000
>> >>>> >> Q-job/R-m-00000
>> >>>> >> Q-job/_SUCCESS
>> >>>> >> Q-job/part-m-00000.deflate
>> >>>> >>
>> >>>> >> I'm not clear where the '/data' folder is supposed to be set. Is it part of
>> >>>> >> the output of the QJob? I don't see any error in the QJob.
>> >>>> >>
>> >>>> >> Thanks,
>> >>>> >>
>> >>>> >> SEVERE: File does not exist:
>> >>>> >> hdfs://localhost:9000/lsa4solr/matrix/15835804941333/transpose-120/SSVD-out/data
>> >>>> >>    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(
>> >>>> >>    at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(
>> >>>> >>    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(
>> >>>> >>    at org.apache.hadoop.mapred.JobClient.writeNewSplits(
>> >>>> >>    at org.apache.hadoop.mapred.JobClient.writeSplits(
>> >>>> >>    at org.apache.hadoop.mapred.JobClient.access$600(
>> >>>> >>    at org.apache.hadoop.mapred.JobClient$
>> >>>> >>    at org.apache.hadoop.mapred.JobClient$
>> >>>> >>    at Method)
>> >>>> >>    at
>> >>>> >>    at
>> >>>> >>    at org.apache.hadoop.mapred.JobClient.submitJobInternal(
>> >>>> >>    at org.apache.hadoop.mapreduce.Job.submit(
>> >>>> >>    at
>> >>>> >>    at
>> >>>> >>    at lsa4solr.mahout_matrix$decompose_svd.invoke(mahout_matrix.clj:188)
>> >>>> >>    at lsa4solr.clustering_protocol$decompose_term_doc_matrix.invoke(clustering_protocol.clj:125)
>> >>>> >>    at lsa4solr.clustering_protocol$cluster_kmeans_docs.invoke(clustering_protocol.clj:142)
>> >>>> >>    at lsa4solr.cluster$cluster_dispatch.invoke(cluster.clj:72)
>> >>>> >>    at lsa4solr.cluster$_cluster.invoke(cluster.clj:103)
>> >>>> >>    at lsa4solr.cluster.LSAClusteringEngine.cluster(Unknown Source)
>> >>>> >>    at org.apache.solr.handler.clustering.ClusteringComponent.process(
>> >>>> >>    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(
>> >>>> >>    at org.apache.solr.handler.RequestHandlerBase.handleRequest(
>> >>>> >>    at org.apache.solr.core.SolrCore.execute(
>> >>>> >>    at org.apache.solr.servlet.SolrDispatchFilter.execute(
>> >>>> >>    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
>> >>>> >>    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(
>> >>>> >>    at org.mortbay.jetty.servlet.ServletHandler.handle(
>> >>>> >>
>> >>>> >> On Sun, Feb 26, 2012 at 4:56 PM, Dmitriy Lyubimov <
>> >>>> wrote:
>> >>>> >>
>> >>>> >>> For the third time: in the context of LSA, a faster and hence perhaps better
>> >>>> >>> alternative to Lanczos is ssvd. Is there any specific reason you want
>> >>>> >>> to use the Lanczos solver in the context of LSA?
>> >>>> >>>
>> >>>> >>> -d
>> >>>> >>>
>> >>>> >>> On Sun, Feb 26, 2012 at 6:40 AM, Peyman Mohajerian <
>> >>>> >
>> >>>> >>> wrote:
>> >>>> >>> > Hi Guys,
>> >>>> >>> >
>> >>>> >>> > Per your advice I did upgrade to Mahout .6 and made a bunch of API
>> >>>> >>> > changes, and in the meantime realized I had a bug with my input matrix:
>> >>>> >>> > zero rows were read from Solr b/c multiple fields in Solr were indexed and
>> >>>> >>> > not just the one I was interested in. That issue is fixed and I have
>> >>>> >>> > a matrix with these dimensions: (.numCols mat) 1000 (.numRows mat)
>> >>>> >>> > 15932 (or the transpose).
>> >>>> >>> > Unfortunately I'm getting the error below now. In the context of some
>> >>>> >>> > other Mahout algorithm there was a mention of '/tmp' vs '/_tmp'
>> >>>> >>> > causing this issue, but in this particular case the matrix is in
>> >>>> >>> > memory!! I'm using this Google package: guava-r09.jar
>> >>>> >>> >
>> >>>> >>> > SEVERE: java.util.NoSuchElementException
>> >>>> >>> >        at
>> >>>> >>> >        at org.apache.mahout.math.hadoop.TimesSquaredJob.retrieveTimesSquaredOutputVector(
>> >>>> >>> >        at org.apache.mahout.math.hadoop.DistributedRowMatrix.timesSquared(
>> >>>> >>> >        at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(
>> >>>> >>> >        at lsa4solr.mahout_matrix$decompose_svd.invoke(mahout_matrix.clj:165)
>> >>>> >>> >
>> >>>> >>> >
>> >>>> >>> > Any suggestion?
>> >>>> >>> > Thanks,
>> >>>> >>> > Peyman
>> >>>> >>> >
>> >>>> >>> >
>> >>>> >>> >
>> >>>> >>> > On Mon, Feb 20, 2012 at 10:38 AM, Dmitriy Lyubimov <
>> >>>>>
>> >>>> >>> wrote:
>> >>>> >>> >> Peyman,
>> >>>> >>> >>
>> >>>> >>> >>
>> >>>> >>> >> Yes, what Ted said. Please take the 0.6 release. Also try ssvd; it may
>> >>>> >>> >> benefit you in some regards compared to Lanczos.
>> >>>> >>> >>
>> >>>> >>> >> -d
>> >>>> >>> >>
>> >>>> >>> >> On Sun, Feb 19, 2012 at 10:34 AM, Peyman Mohajerian <
>> >>>>>
>> >>>> >>> wrote:
>> >>>> >>> >>> Hi Dmitriy & Others,
>> >>>> >>> >>>
>> >>>> >>> >>> Dmitriy, thanks for your previous response.
>> >>>> >>> >>> I have a follow-up question on my LSA project. I have managed to
>> >>>> >>> >>> upload 1,500 documents from two different newsgroups (one about
>> >>>> >>> >>> graphics and one about Atheism)
>> >>>> >>> >>> to Solr. However, my
>> >>>> >>> >>> LanczosSolver in Mahout .4 does not find any eigenvalues (there are
>> >>>> >>> >>> eigenvectors, as you see in the logs that follow).
>> >>>> >>> >>> The only thing I'm doing differently from
>> >>>> >>> >>> ( is that I'm not using the
>> >>>> >>> >>> 'Summary' field but rather the actual 'text' field in Solr. I'm
>> >>>> >>> >>> assuming the issue is that the Summary field already removes the noise and
>> >>>> >>> >>> makes the clustering work, and the raw index data does not do that. Am I
>> >>>> >>> >>> correct, or are there other potential explanations? For the desired
>> >>>> >>> >>> rank I'm using values between 10-100 and looking for #clusters between
>> >>>> >>> >>> 2-10 (different values for different trials), but always the same
>> >>>> >>> >>> result comes out: no clusters found.
>> >>>> >>> >>> If my issue is related to not having summarization done, how can that
>> >>>> >>> >>> be done in Solr? I wasn't able to find a Summary field in Solr.
>> >>>> >>> >>>
>> >>>> >>> >>> Thanks
>> >>>> >>> >>> Peyman
>> >>>> >>> >>>
>> >>>> >>> >>>
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: Lanczos iteration complete - now to diagonalize the tri-diagonal
>> >>>> >>> >>> auxiliary matrix.
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: Eigenvector 0 found with eigenvalue 0.0
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: Eigenvector 1 found with eigenvalue 0.0
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: Eigenvector 2 found with eigenvalue 0.0
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: Eigenvector 3 found with eigenvalue 0.0
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: Eigenvector 4 found with eigenvalue 0.0
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: Eigenvector 5 found with eigenvalue 0.0
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: Eigenvector 6 found with eigenvalue 0.0
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: Eigenvector 7 found with eigenvalue 0.0
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: Eigenvector 8 found with eigenvalue 0.0
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: Eigenvector 9 found with eigenvalue 0.0
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: Eigenvector 10 found with eigenvalue 0.0
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: LanczosSolver finished.
>> >>>> >>> >>>
>> >>>> >>> >>>
>> >>>> >>> >>> On Sun, Jan 1, 2012 at 10:06 PM, Dmitriy Lyubimov <
>> >>>>>
>> >>>> >>> wrote:
>> >>>> >>> >>>> In Mahout, an LSA pipeline is possible with the seqdirectory, seq2sparse and ssvd
>> >>>> >>> >>>> commands. The nuances are understanding the dictionary format and LLR analysis of
>> >>>> >>> >>>> n-grams, and perhaps using a slightly better lemmatizer than the default one.
>> >>>> >>> >>>>
>> >>>> >>> >>>> With indexing part you are on your own at this point.
>> >>>> >>> >>>> On Jan 1, 2012 2:28 PM, "Peyman Mohajerian" <
>> >>>> >>> wrote:
>> >>>> >>> >>>>
>> >>>> >>> >>>>> Hi Guys,
>> >>>> >>> >>>>>
>> >>>> >>> >>>>> I'm interested in this work:
>> >>>> >>> >>>>>
>> >>>> >>> >>>>>
>> >>>> >>>
>> >>>>
>> >>>> >>> >>>>>
>> >>>> >>> >>>>> I looked at some of the comments and noticed that there was interest
>> >>>> >>> >>>>> in incorporating it into Mahout, back in 2010. I'm also having issues
>> >>>> >>> >>>>> running this code due to dependencies on an older version of Mahout.
>> >>>> >>> >>>>>
>> >>>> >>> >>>>> I was wondering if LSA is now directly available in Mahout? Also, if I
>> >>>> >>> >>>>> upgrade to the latest Mahout, would this Clojure code work?
>> >>>> >>> >>>>>
>> >>>> >>> >>>>> Thanks
>> >>>> >>> >>>>> Peyman
>> >>>> >>> >>>>>
>> >>>> >>>
>> >>>>
