Also, I notice that you are using the output as a subfolder of your
input. If so, it is probably going to create a mess. Please don't use
input and output folders that are nested within each other; the jobs
do not expect that layout.
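
A quick way to catch this before the solver is constructed is to compare the two locations segment by segment. The following is only a minimal sketch in plain Clojure, assuming both locations are available as slash-separated path strings; `path-segments` and `nested?` are hypothetical helper names, not part of Mahout or Hadoop, and the sibling output location in the usage lines is made up for illustration.

```clojure
(require '[clojure.string :as str])

(defn path-segments
  "Splits a slash-separated path string into its non-empty segments."
  [path]
  (vec (remove str/blank? (str/split path #"/"))))

(defn nested?
  "True when the two paths are equal or one of them lies inside the other.
  Hypothetical helper: compares the leading segments of both paths."
  [a b]
  (let [sa (path-segments a)
        sb (path-segments b)
        n  (min (count sa) (count sb))]
    (= (subvec sa 0 n) (subvec sb 0 n))))

;; Usage: the first output location mirrors the layout in the code sample,
;; the second is an invented sibling folder outside the input.
(def input "/lsa4solr/matrix/15835804941333/transpose-120")
(println (nested? input (str input "/SSVD-out")))
(println (nested? input "/lsa4solr/ssvd-out/15835804941333"))
```

With the layout from the code sample the first check reports nesting, while a sibling output folder does not.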


On Thu, Apr 5, 2012 at 12:00 PM, Peyman Mohajerian <> wrote:
> Ok, great, I'll give these ideas a try later today, the input is the
> following line(s) that in my code sample was commented out using ';' in
> Clojure.
>  The first stage, Q-job is done fine, it is the second job that gets messed
> up, the output of Q-job is at:
> /lsa4solr/matrix/14099700861483/transpose-213/SSVD-out/Q-job and
> /lsa4solr/matrix/14099700861483/transpose-213/SSVD-out/Q-job but BtJob is
> looking for the input in the wrong place, it must be hadoop version as you
> said.
> input path  #<Path
> hdfs://localhost:9000/lsa4solr/matrix/15835804941333/transpose-120>
> dd  #<Path[] [Lorg.apache.hadoop.fs.Path;@5563d208>
> numCol  1000
> numrow  15982
> On Thu, Apr 5, 2012 at 11:54 AM, Dmitriy Lyubimov <> wrote:
>> Another idea i have is to try to run it from just Mahout command line,
>> see if it works with .205. If it does, it is definitely something
>> about passing parameters in/client hadoop classpath/ etc.
>> On Thu, Apr 5, 2012 at 11:51 AM, Dmitriy Lyubimov <>
>> wrote:
>> > also you are printing your input path -- what does it look like in
>> > reality? because this path that it complains about, SSVDOutput/data,
>> > in fact should be the input path. That's what's perplexing.
>> >
>> > We are talking hadoop job setup process here, nothing specific to the
>> > solution itself. And job setup/directory management fails for some
>> > reason.
>> >
>> > On Thu, Apr 5, 2012 at 11:45 AM, Dmitriy Lyubimov <>
>> wrote:
>> >> Any chance you could test it with its current dependency, 0.20.204? or
>> >> that would be hard to stage?
>> >>
>> >> Newer hadoop version is frankly all i can think of here for the reason
>> of this.
>> >>
>> >> On Thu, Apr 5, 2012 at 11:35 AM, Peyman Mohajerian <>
>> wrote:
>> >>> Hi Dmitriy,
>> >>>
>> >>> It is a Clojure code from:
>> >>> Of course I modified it to use Mahout .6 distribution, also running on
>> >>> hadoop-, here is the Clojure code that I changed,
>> >>> the lines after ' decomposer (doto (.run ssvdSolver)) ' still need
>> >>> modification b/c I'm not reading the eigenValue/Vector from the solver
>> >>> correctly.  Originally this code was based on Mahout .4. I'm creating
>> the
>> >>> Matrix from Solr 3.1.0, very similar to what was done on: '
>> >>>'
>> >>>
>> >>> Thanks,
>> >>>
>> >>> (defn decompose-svd
>> >>>  [mat k]
>> >>>  ;(println "input path " (.getRowPath mat))
>> >>>  ;(println "dd " (into-array [(.getRowPath mat)]))
>> >>>  ;(println "numCol " (.numCols mat))
>> >>>  ;(println "numrow " (.numRows mat))
>> >>>  (let [eigenvalues (new java.util.ArrayList)
>> >>>        eigenvectors (DenseMatrix. (+ k 2) (.numCols mat))
>> >>>        numCol (.numCols mat)
>> >>>        config (.getConf mat)
>> >>>        rawPath (.getRowPath mat)
>> >>>        outputPath (Path. (str (.toString rawPath) "/SSVD-out"))
>> >>>        inputPath (into-array [rawPath])
>> >>>        ssvdSolver (SSVDSolver. config inputPath outputPath 1000 k 60 3)
>> >>>        decomposer (doto (.run ssvdSolver))
>> >>>        V (normalize-matrix-columns (.viewPart (.transpose eigenvectors)
>> >>>                                               (int-array [0 0])
>> >>>                                               (int-array [(.numCols mat) k])))
>> >>>        U (mmult mat V)
>> >>>        S (diag (take k (reverse eigenvalues)))]
>> >>>    {:U U
>> >>>     :S S
>> >>>     :V V}))
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On Thu, Apr 5, 2012 at 11:10 AM, Dmitriy Lyubimov <>
>> wrote:
>> >>>
>> >>>> Yeah. i don't see how it may have arrived at that error.
>> >>>>
>> >>>>
>> >>>> Peyman,
>> >>>>
>> >>>> I need to know more -- it looks like you are using embedded api, not a
>> >>>> command line, so i need to see how you initialize the solver and
>> >>>> also which version of Mahout libraries you are using (your stack trace
>> >>>> numbers do not correspond to anything reasonable on current trunk).
>> >>>>
>> >>>> thanks.
>> >>>>
>> >>>> -d
>> >>>>
>> >>>> On Thu, Apr 5, 2012 at 10:55 AM, Dmitriy Lyubimov <>
>> >>>> wrote:
>> >>>> > Hm. i never saw that and not sure where this folder comes from.
>> Which
>> >>>> > hadoop version are you using? This may be a result of incompatible
>> >>>> > support for multiple outputs in the newer hadoop versions . I tested
>> >>>> > it with CDH3u0/u3 and it was fine. This folder should normally
>> appear
>> >>>> > in the conversation, i suspect it is an internal hadoop thing.
>> >>>> >
>> >>>> > This is without me actually looking at the code per stack trace.
>> >>>> >
>> >>>> >
>> >>>> > On Thu, Apr 5, 2012 at 5:22 AM, Peyman Mohajerian <
>> >>>> wrote:
>> >>>> >> Hi Guys,
>> >>>> >> I'm now using ssvd for my LSA code and get the following error, at
>> the
>> >>>> time
>> >>>> >> of error all I have under 'SSVD-out' folder:
>> >>>> >> Q-job/QHat-m-00000
>> >>>> >> R-m-00000
>> >>>> >> _SUCCESS
>> >>>> >> part-m-00000.deflate
>> >>>> >>
>> >>>> >> I'm not clear where '/data' folder is supposed to be set, is it
>> part of
>> >>>> the
>> >>>> >> output of the QJob, I don't see any error in the QJob*?
>> >>>> >>
>> >>>> >> *Thanks,*
>> >>>> >> *
>> >>>> >> SEVERE: File does not exist:
>> >>>> >>
>> >>>>
>> hdfs://localhost:9000/lsa4solr/matrix/15835804941333/transpose-120/SSVD-out/data
>> >>>> >>    at
>> >>>> >>
>> >>>>
>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(
>> >>>> >>    at
>> >>>> >>
>> >>>>
>> org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(
>> >>>> >>    at
>> >>>> >>
>> >>>>
>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(
>> >>>> >>    at
>> >>>> org.apache.hadoop.mapred.JobClient.writeNewSplits(
>> >>>> >>    at
>> org.apache.hadoop.mapred.JobClient.writeSplits(
>> >>>> >>    at
>> org.apache.hadoop.mapred.JobClient.access$600(
>> >>>> >>    at org.apache.hadoop.mapred.JobClient$
>> >>>> >>    at org.apache.hadoop.mapred.JobClient$
>> >>>> >>    at Method)
>> >>>> >>    at
>> >>>> >>    at
>> >>>> >>
>> >>>>
>> >>>> >>    at
>> >>>> >>
>> org.apache.hadoop.mapred.JobClient.submitJobInternal(
>> >>>> >>    at org.apache.hadoop.mapreduce.Job.submit(
>> >>>> >>    at
>> >>>>
>> >>>> >>    at
>> >>>> >>
>> >>>>
>> >>>> >>    at
>> lsa4solr.mahout_matrix$decompose_svd.invoke(mahout_matrix.clj:188)
>> >>>> >>    at
>> >>>> >>
>> >>>>
>> lsa4solr.clustering_protocol$decompose_term_doc_matrix.invoke(clustering_protocol.clj:125)
>> >>>> >>    at
>> >>>> >>
>> >>>>
>> lsa4solr.clustering_protocol$cluster_kmeans_docs.invoke(clustering_protocol.clj:142)
>> >>>> >>    at lsa4solr.cluster$cluster_dispatch.invoke(cluster.clj:72)
>> >>>> >>    at lsa4solr.cluster$_cluster.invoke(cluster.clj:103)
>> >>>> >>    at lsa4solr.cluster.LSAClusteringEngine.cluster(Unknown Source)
>> >>>> >>    at
>> >>>> >>
>> >>>>
>> org.apache.solr.handler.clustering.ClusteringComponent.process(
>> >>>> >>    at
>> >>>> >>
>> >>>>
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(
>> >>>> >>    at
>> >>>> >>
>> >>>>
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(
>> >>>> >>    at org.apache.solr.core.SolrCore.execute(
>> >>>> >>    at
>> >>>> >>
>> >>>>
>> org.apache.solr.servlet.SolrDispatchFilter.execute(
>> >>>> >>    at
>> >>>> >>
>> >>>>
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
>> >>>> >>    at
>> >>>> >>
>> >>>>
>> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(
>> >>>> >>    at
>> >>>> >>
>> org.mortbay.jetty.servlet.ServletHandler.handle(
>> >>>> >>
>> >>>> >> On Sun, Feb 26, 2012 at 4:56 PM, Dmitriy Lyubimov <
>> >>>> wrote:
>> >>>> >>
>> >>>> >>> for the third time, in context of lsa, faster and hence perhaps
>> better
>> >>>> >>> alternative to lanczos is ssvd. Is there any specific reason you
>> want
>> >>>> >>> to use lanczos solver in context of LSA?
>> >>>> >>>
>> >>>> >>> -d
>> >>>> >>>
>> >>>> >>> On Sun, Feb 26, 2012 at 6:40 AM, Peyman Mohajerian <
>> >>>> >
>> >>>> >>> wrote:
>> >>>> >>> > Hi Guys,
>> >>>> >>> >
>> >>>> >>> > Per your advice I did upgrade to Mahout .6 and did a bunch of API
>> >>>> >>> > changes and in the meantime realized I had a bug with my input
>> >>>> matrix,
>> >>>> >>> > zero rows read from Solr b/c multiple fields in Solr were indexed
>> and
>> >>>> >>> > not just the one I was interested in, that issue is fixed and
>> I have
>> >>>> >>> > a matrix with these dimensions: (.numCols mat) 1000 (.numRows
>> mat)
>> >>>> >>> > 15932 (or the transpose)
>> >>>> >>> > Unfortunately I'm getting the below error now, in the context
>> of some
>> >>>> >>> > other Mahout algorithm there was a mention of '/tmp' vs '/_tmp'
>> >>>> >>> > causing this issue but in this particular case the matrix is in
>> >>>> >>> > memory!! I'm using this google package: guava-r09.jar
>> >>>> >>> >
>> >>>> >>> > SEVERE: java.util.NoSuchElementException
>> >>>> >>> >        at
>> >>>> >>>
>> >>>>
>> >>>> >>> >        at
>> >>>> >>>
>> >>>>
>> org.apache.mahout.math.hadoop.TimesSquaredJob.retrieveTimesSquaredOutputVector(
>> >>>> >>> >        at
>> >>>> >>>
>> >>>>
>> org.apache.mahout.math.hadoop.DistributedRowMatrix.timesSquared(
>> >>>> >>> >        at
>> >>>> >>>
>> >>>>
>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(
>> >>>> >>> >        at
>> >>>> >>> lsa4solr.mahout_matrix$decompose_svd.invoke(mahout_matrix.clj:165)
>> >>>> >>> >
>> >>>> >>> >
>> >>>> >>> > Any suggestion?
>> >>>> >>> > Thanks,
>> >>>> >>> > Peyman
>> >>>> >>> >
>> >>>> >>> >
>> >>>> >>> >
>> >>>> >>> > On Mon, Feb 20, 2012 at 10:38 AM, Dmitriy Lyubimov <
>> >>>>>
>> >>>> >>> wrote:
>> >>>> >>> >> Peyman,
>> >>>> >>> >>
>> >>>> >>> >>
>> >>>> >>> >> Yes, what Ted said. Please take 0.6 release. Also try ssvd, it
>> may
>> >>>> >>> >> benefit you in some regards compared to Lanczos.
>> >>>> >>> >>
>> >>>> >>> >> -d
>> >>>> >>> >>
>> >>>> >>> >> On Sun, Feb 19, 2012 at 10:34 AM, Peyman Mohajerian <
>> >>>>>
>> >>>> >>> wrote:
>> >>>> >>> >>> Hi Dmitriy & Others,
>> >>>> >>> >>>
>> >>>> >>> >>> Dmitriy thanks for your previous response.
>> >>>> >>> >>> I have a follow up question to my LSA project. I have managed
>> to
>> >>>> >>> >>> upload 1,500 documents from two different news groups (one
>> about
>> >>>> >>> >>> graphics and one about Atheism
>> >>>> >>> >>> to Solr.
>> >>>> However my
>> >>>> >>> >>> LanczosSolver in Mahout.4 does not find any eigenvalues
>> (there are
>> >>>> >>> >>> eigenvectors as you see in the follow up logs).
>> >>>> >>> >>> The only things I'm doing different from
>> >>>> >>> >>> ( is that I'm not
>> using the
>> >>>> >>> >>> 'Summary' field but rather the actual 'text' field in Solr.
>> I'm
>> >>>> >>> >>> assuming the issue is that Summary field already removes the
>> noise
>> >>>> and
>> >>>> >>> >>> make the clustering work and the raw index data does not do
>> that,
>> >>>> am I
>> >>>> >>> >>> correct or there are other potential explanations? For the
>> desired
>> >>>> >>> >>> rank I'm using values between 10-100 and looking for #clusters
>> >>>> between
>> >>>> >>> >>> 2-10 (different values for different trials), but always the
>> same
>> >>>> >>> >>> result comes out, no clusters found.
>> >>>> >>> >>> If my issue is related to not having summarization done, how
>> can
>> >>>> that
>> >>>> >>> >>> be done in Solr? I wasn't able to find a Summary field in
>> Solr.
>> >>>> >>> >>>
>> >>>> >>> >>> Thanks
>> >>>> >>> >>> Peyman
>> >>>> >>> >>>
>> >>>> >>> >>>
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: Lanczos iteration complete - now to diagonalize the
>> >>>> tri-diagonal
>> >>>> >>> >>> auxiliary matrix.
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: Eigenvector 0 found with eigenvalue 0.0
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: Eigenvector 1 found with eigenvalue 0.0
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: Eigenvector 2 found with eigenvalue 0.0
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: Eigenvector 3 found with eigenvalue 0.0
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: Eigenvector 4 found with eigenvalue 0.0
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: Eigenvector 5 found with eigenvalue 0.0
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: Eigenvector 6 found with eigenvalue 0.0
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: Eigenvector 7 found with eigenvalue 0.0
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: Eigenvector 8 found with eigenvalue 0.0
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: Eigenvector 9 found with eigenvalue 0.0
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: Eigenvector 10 found with eigenvalue 0.0
>> >>>> >>> >>> Feb 19, 2012 3:25:20 AM
>> >>>> >>> >>> org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >>>> >>> >>> INFO: LanczosSolver finished.
>> >>>> >>> >>>
>> >>>> >>> >>>
>> >>>> >>> >>> On Sun, Jan 1, 2012 at 10:06 PM, Dmitriy Lyubimov <
>> >>>>>
>> >>>> >>> wrote:
>> >>>> >>> >>>> In Mahout lsa pipeline is possible with seqdirectory,
>> seq2sparse
>> >>>> and
>> >>>> >>> ssvd
>> >>>> >>> >>>> commands. Nuances are understanding dictionary format and llr
>> >>>> >>> analysis of
>> >>>> >>> >>>> n-grams and perhaps use a slightly better lemmatizer than the
>> >>>> default
>> >>>> >>> one.
>> >>>> >>> >>>>
>> >>>> >>> >>>> With indexing part you are on your own at this point.
>> >>>> >>> >>>> On Jan 1, 2012 2:28 PM, "Peyman Mohajerian" <
>> >>>> >>> wrote:
>> >>>> >>> >>>>
>> >>>> >>> >>>>> Hi Guys,
>> >>>> >>> >>>>>
>> >>>> >>> >>>>> I'm interested in this work:
>> >>>> >>> >>>>>
>> >>>> >>> >>>>>
>> >>>> >>>
>> >>>>
>> >>>> >>> >>>>>
>> >>>> >>> >>>>> I looked at some of the comments and notices that there was
>> >>>> interest
>> >>>> >>> >>>>> in incorporating it into Mahout, back in 2010. I'm also
>> having
>> >>>> issues
>> >>>> >>> >>>>> running this code due to dependencies on older version of
>> Mahout.
>> >>>> >>> >>>>>
>> >>>> >>> >>>>> I was wondering if LSA is now directly available in Mahout?
>> Also
>> >>>> if I
>> >>>> >>> >>>>> upgrade to the latest Mahout would this Clojure code work?
>> >>>> >>> >>>>>
>> >>>> >>> >>>>> Thanks
>> >>>> >>> >>>>> Peyman
>> >>>> >>> >>>>>
>> >>>> >>>
>> >>>>
