I noticed that many algorithm implementations were deprecated in Mahout 0.8 and removed in 0.9, but no reasons or comments were given. Can I ask why?
By the way, the Mahout API is somewhat lacking in Javadoc comments. Every Mahout contributor should take responsibility for adding Javadoc comments to the Java files they create.

On Fri, Nov 22, 2013 at 3:09 AM, Sameer Tilak <ssti...@live.com> wrote:

> Sebastian,
> Thanks for the clarification.
>
> > Date: Thu, 21 Nov 2013 17:51:12 +0100
> > From: ssc.o...@googlemail.com
> > To: user@mahout.apache.org
> > Subject: Re: Mahout fpg
> >
> > ItemSimilarityJob does not handle alphanumeric identifiers. You have to
> > preprocess your data before running that job.
> >
> > --sebastian
> >
> > On 21.11.2013 00:28, Sameer Tilak wrote:
> > > Yes, changing A1234567 to 1234567 resolves that issue trivially. However, (input: userid, itemcode) itemcode is alphanumeric and not just numeric. I am sure ItemSimilarityJob will be able to handle that case, however I need to know how to supply the input correctly. I am currently using:
> > > (userid, itemcode)(userid, itemcode)(userid, itemcode)(userid, itemcode)….
> > >
> > >> Date: Wed, 20 Nov 2013 15:11:49 -0800
> > >> From: suneel_mar...@yahoo.com
> > >> Subject: Re: Mahout fpg
> > >> To: user@mahout.apache.org
> > >>
> > >> From the stacktrace:
> > >>
> > >> FAILED
> > >> java.lang.NumberFormatException: For input string: "A1234567"
> > >>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> > >>
> > >> Obviously, the input's incorrect.
> > >>
> > >> On Wednesday, November 20, 2013 6:02 PM, Sameer Tilak <ssti...@live.com> wrote:
> > >>
> > >> Dear Sebastian,
> > >> I tried using ItemSimilarityJob. My data has the following format. Each line contains data in the format: userid itemid (I also tried userid, itemcode). Itemcode is a string. However, I am getting the following error. Maybe my input format is incorrect.
> > >>
> > >> ./mahout org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob --input testdata/similarityinput -o testdata/similarityoutput --similarityClassname SIMILARITY_COOCCURRENCE --maxSimilaritiesPerItem 10
> > >> 13/11/20 14:46:39 WARN driver.MahoutDriver: No org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.props found on classpath, will use command-line arguments only
> > >> 13/11/20 14:46:39 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[testdata/similarityinput], --maxPrefs=[500], --maxSimilaritiesPerItem=[10], --minPrefsPerUser=[1], --output=[testdata/similarityoutput], --similarityClassname=[SIMILARITY_COOCCURRENCE], --startPhase=[0], --tempDir=[temp]}
> > >> 13/11/20 14:46:39 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[testdata/similarityinput], --minPrefsPerUser=[1], --output=[temp/prepareRatingMatrix], --ratingShift=[0.0], --startPhase=[0], --tempDir=[temp]}
> > >> 13/11/20 14:46:41 INFO input.FileInputFormat: Total input paths to process : 1
> > >> 13/11/20 14:46:41 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> > >> 13/11/20 14:46:41 WARN snappy.LoadSnappy: Snappy native library not loaded
> > >> 13/11/20 14:46:41 INFO mapred.JobClient: Running job: job_201311111627_0115
> > >> 13/11/20 14:46:42 INFO mapred.JobClient: map 0% reduce 0%
> > >> 13/11/20 14:47:00 INFO mapred.JobClient: Task Id : attempt_201311111627_0115_m_000000_0, Status : FAILED
> > >> java.lang.NumberFormatException: For input string: "A1234567"
> > >>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> > >>     at java.lang.Long.parseLong(Long.java:441)
> > >>     at java.lang.Long.parseLong(Long.java:483)
> > >>     at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50)
> > >>     at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
> > >>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > >>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> > >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> > >>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> > >>     at java.security.AccessController.doPrivileged(Native Method)
> > >>     at javax.security.auth.Subject.doAs(Subject.java:415)
> > >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> > >>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> > >> 13/11/20 14:47:11 INFO mapred.JobClient: Task Id : attempt_201311111627_0115_m_000000_1, Status : FAILED
> > >> java.lang.NumberFormatException: For input string: "A1234567"
> > >>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> > >>     at java.lang.Long.parseLong(Long.java:441)
> > >>     at java.lang.Long.parseLong(Long.java:483)
> > >>     at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50)
> > >>     at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
> > >>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > >>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> > >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> > >>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> > >>     at java.security.AccessController.doPrivileged(Native Method)
> > >>     at javax.security.auth.Subject.doAs(Subject.java:415)
> > >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> > >>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> > >>
> > >>> Date: Wed, 20 Nov 2013 08:22:07 +0100
> > >>> From: ssc.o...@googlemail.com
> > >>> To: user@mahout.apache.org
> > >>> Subject: Re: Mahout fpg
> > >>>
> > >>> You can use ItemSimilarityJob to find sets of items that cooccur
> > >>> together in your users interactions.
> > >>>
> > >>> --sebastian
> > >>>
> > >>> On 20.11.2013 08:11, Sameer Tilak wrote:
> > >>>>
> > >>>> Hi Sunil,
> > >>>> Thanks for your reply. We can benefit a lot from the parallel frequent pattern matching functionality. Will there be any alternative in future releases? I guess, we can use older versions of Mahout if we need that.
> > >>>>
> > >>>>> Date: Tue, 19 Nov 2013 19:25:54 -0800
> > >>>>> From: suneel_mar...@yahoo.com
> > >>>>> Subject: Re: Mahout fpg
> > >>>>> To: user@mahout.apache.org
> > >>>>>
> > >>>>> Fpg has been removed from the codebase as it will not be supported.
> > >>>>>
> > >>>>> On Tuesday, November 19, 2013 8:56 PM, Sameer Tilak <ssti...@live.com> wrote:
> > >>>>>
> > >>>>> Hi everyone,
> > >>>>> I downloaded the latest version of Mahout and did mvn install. When I try to run fpg, I get the following errors. Do I need to download and compile FPG separately? Looks like somehow it has not been included in the list of valid programs.
> > >>>>>
> > >>>>> 13/11/19 17:49:19 WARN driver.MahoutDriver: Unable to add class: fpg
> > >>>>> 13/11/19 17:49:19 WARN driver.MahoutDriver: No fpg.props found on classpath, will use command-line arguments only
> > >>>>> Unknown program 'fpg' chosen.
> > >>>>> Valid program names are:
> > >>>>>   arff.vector: : Generate Vectors from an ARFF file or directory
> > >>>>>   baumwelch: : Baum-Welch algorithm for unsupervised HMM training
> > >>>>>   canopy: : Canopy clustering
> > >>>>>   cat: : Print a file or resource as the logistic regression models would see it
> > >>>>>   cleansvd: : Cleanup and verification of SVD output
> > >>>>>   clusterdump: : Dump cluster output to text
> > >>>>>   clusterpp: : Groups Clustering Output In Clusters
> > >>>>>   cmdump: : Dump confusion matrix in HTML or text formats
> > >>>>>   concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
> > >>>>>   cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
> > >>>>>   cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
> > >>>>>   evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
> > >>>>>   fkmeans: : Fuzzy K-means clustering
> > >>>>>   hmmpredict: : Generate random sequence of observations by given HMM
> > >>>>>   itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
> > >>>>>   kmeans: : K-means clustering
> > >>>>>   lucene.vector: : Generate Vectors from a Lucene index
> > >>>>>   lucene2seq: : Generate Text SequenceFiles from a Lucene index
> > >>>>>   matrixdump: : Dump matrix in CSV format
> > >>>>>   matrixmult: : Take the product of two matrices
> > >>>>>   parallelALS: : ALS-WR factorization of a rating matrix
> > >>>>>   qualcluster: : Runs clustering experiments and summarizes results in a CSV
> > >>>>>   recommendfactorized: : Compute recommendations using the factorization of a rating matrix
> > >>>>>   recommenditembased: : Compute recommendations using item-based collaborative filtering
> > >>>>>   regexconverter: : Convert text files on a per line basis based on regular expressions
> > >>>>>   resplit: : Splits a set of SequenceFiles into a number of equal splits
> > >>>>>   rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
> > >>>>>   rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
> > >>>>>   runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
> > >>>>>   runlogistic: : Run a logistic regression model against CSV data
> > >>>>>   seq2encoded: : Encoded Sparse Vector generation from Text sequence files
> > >>>>>   seq2sparse: : Sparse Vector generation from Text sequence files
> > >>>>>   seqdirectory: : Generate sequence files (of Text) from a directory
> > >>>>>   seqdumper: : Generic Sequence File dumper
> > >>>>>   seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
> > >>>>>   seqwiki: : Wikipedia xml dump to sequence file
> > >>>>>   spectralkmeans: : Spectral k-means clustering
> > >>>>>   split: : Split Input data into test and train sets
> > >>>>>   splitDataset: : split a rating dataset into training and probe parts
> > >>>>>   ssvd: : Stochastic SVD
> > >>>>>   streamingkmeans: : Streaming k-means clustering
> > >>>>>   svd: : Lanczos Singular Value Decomposition
> > >>>>>   testnb: : Test the Vector-based Bayes classifier
> > >>>>>   trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
> > >>>>>   trainlogistic: : Train a logistic regression using stochastic gradient descent
> > >>>>>   trainnb: : Train the Vector-based Bayes classifier
> > >>>>>   transpose: : Take the transpose of a matrix
> > >>>>>   validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
> > >>>>>   vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
> > >>>>>   vectordump: : Dump vectors from a sequence file to text
> > >>>>>   viterbi: : Viterbi decoding of hidden states from given output states sequence
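
On the NumberFormatException quoted above: one way to do the preprocessing Sebastian mentions might be to replace each alphanumeric item code with an integer before running ItemSimilarityJob, and to keep the mapping so the results can be translated back afterwards. Below is a minimal Python sketch of that idea. It is not part of Mahout; the function name encode_ids, the comma delimiter, and the sample codes are assumptions made for illustration, and it assumes the user IDs are already numeric.

```python
def encode_ids(lines, delimiter=","):
    """Replace alphanumeric item codes with sequential integer IDs.

    `lines` is an iterable of "userid<delimiter>itemcode" strings.
    Returns (encoded_lines, mapping), where `mapping` maps each original
    item code to the integer assigned to it.
    """
    mapping = {}
    encoded = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, item_code = (f.strip() for f in line.split(delimiter, 1))
        # Assign the next unused integer the first time a code is seen;
        # reuse the same integer on later occurrences.
        item_id = mapping.setdefault(item_code, len(mapping) + 1)
        encoded.append(f"{user_id}{delimiter}{item_id}")
    return encoded, mapping


if __name__ == "__main__":
    # Hypothetical sample input in the (userid, itemcode) form from the thread.
    sample = ["1,A1234567", "2,B7654321", "3,A1234567"]
    encoded, mapping = encode_ids(sample)
    for row in encoded:
        print(row)
    # Keep the mapping around to translate numeric IDs back to item codes.
    print(mapping)
```

The encoded lines would then be written to the job's input path, and the mapping saved separately so that the item IDs in the output can be turned back into the original codes. If the user IDs were also non-numeric, the same approach could be applied to that column.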