From the stack trace:

FAILED
java.lang.NumberFormatException: For input string: "A1234567"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

The input is incorrect: ItemIDIndexMapper calls Long.parseLong on the item ID, so a string item code such as "A1234567" cannot be parsed.

On Wednesday, November 20, 2013 6:02 PM, Sameer Tilak <ssti...@live.com> wrote:

Dear Sebastian,

I tried using ItemSimilarityJob. My data has the following format. Each line contains data in the format: userid itemid (I also tried userid, itemcode). Itemcode is a string. However, I am getting the following error. Maybe my input format is incorrect.

./mahout org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob --input testdata/similarityinput -o testdata/similarityoutput --similarityClassname SIMILARITY_COOCCURRENCE --maxSimilaritiesPerItem 10

13/11/20 14:46:39 WARN driver.MahoutDriver: No org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.props found on classpath, will use command-line arguments only
13/11/20 14:46:39 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[testdata/similarityinput], --maxPrefs=[500], --maxSimilaritiesPerItem=[10], --minPrefsPerUser=[1], --output=[testdata/similarityoutput], --similarityClassname=[SIMILARITY_COOCCURRENCE], --startPhase=[0], --tempDir=[temp]}
13/11/20 14:46:39 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[testdata/similarityinput], --minPrefsPerUser=[1], --output=[temp/prepareRatingMatrix], --ratingShift=[0.0], --startPhase=[0], --tempDir=[temp]}
13/11/20 14:46:41 INFO input.FileInputFormat: Total input paths to process : 1
13/11/20 14:46:41 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/11/20 14:46:41 WARN snappy.LoadSnappy: Snappy native library not loaded
13/11/20 14:46:41 INFO mapred.JobClient: Running job: job_201311111627_0115
13/11/20 14:46:42 INFO mapred.JobClient: map 0% reduce 0%
13/11/20 14:47:00 INFO mapred.JobClient: Task Id : attempt_201311111627_0115_m_000000_0, Status : FAILED
java.lang.NumberFormatException: For input string: "A1234567"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:441)
    at java.lang.Long.parseLong(Long.java:483)
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50)
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/11/20 14:47:11 INFO mapred.JobClient: Task Id : attempt_201311111627_0115_m_000000_1, Status : FAILED
java.lang.NumberFormatException: For input string: "A1234567"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:441)
    at java.lang.Long.parseLong(Long.java:483)
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50)
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

> Date: Wed, 20 Nov 2013 08:22:07 +0100
> From: ssc.o...@googlemail.com
> To: user@mahout.apache.org
> Subject: Re: Mahout fpg
>
> You can use ItemSimilarityJob to find sets of items that cooccur
> together in your users'
> interactions.
>
> --sebastian
>
>
> On 20.11.2013 08:11, Sameer Tilak wrote:
> >
> > Hi Sunil,
> > Thanks for your reply. We can benefit a lot from the parallel frequent
> > pattern matching functionality. Will there be any alternative in future
> > releases? I guess we can use older versions of Mahout if we need that.
> >
> >> Date: Tue, 19 Nov 2013 19:25:54 -0800
> >> From: suneel_mar...@yahoo.com
> >> Subject: Re: Mahout fpg
> >> To: user@mahout.apache.org
> >>
> >> Fpg has been removed from the codebase as it will not be supported.
> >>
> >> On Tuesday, November 19, 2013 8:56 PM, Sameer Tilak <ssti...@live.com>
> >> wrote:
> >>
> >> Hi everyone,
> >> I downloaded the latest version of Mahout and did mvn install.
> >> When I try to run fpg, I get the following errors. Do I need to download
> >> and compile FPG separately? Looks like somehow it has not been included in
> >> the list of valid programs.
> >>
> >> 13/11/19 17:49:19 WARN driver.MahoutDriver: Unable to add class: fpg
> >> 13/11/19 17:49:19 WARN driver.MahoutDriver: No fpg.props found on classpath, will use command-line arguments only
> >> Unknown program 'fpg' chosen.
> >> Valid program names are:
> >>   arff.vector: : Generate Vectors from an ARFF file or directory
> >>   baumwelch: : Baum-Welch algorithm for unsupervised HMM training
> >>   canopy: : Canopy clustering
> >>   cat: : Print a file or resource as the logistic regression models would see it
> >>   cleansvd: : Cleanup and verification of SVD output
> >>   clusterdump: : Dump cluster output to text
> >>   clusterpp: : Groups Clustering Output In Clusters
> >>   cmdump: : Dump confusion matrix in HTML or text formats
> >>   concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
> >>   cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
> >>   cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
> >>   evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
> >>   fkmeans: : Fuzzy K-means clustering
> >>   hmmpredict: : Generate random sequence of observations by given HMM
> >>   itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
> >>   kmeans: : K-means clustering
> >>   lucene.vector: : Generate Vectors from a Lucene index
> >>   lucene2seq: : Generate Text SequenceFiles from a Lucene index
> >>   matrixdump: : Dump matrix in CSV format
> >>   matrixmult: : Take the product of two matrices
> >>   parallelALS: : ALS-WR factorization of a rating matrix
> >>   qualcluster: : Runs clustering experiments and summarizes results in a CSV
> >>   recommendfactorized: : Compute recommendations using the factorization of a rating matrix
> >>   recommenditembased: : Compute recommendations using item-based collaborative filtering
> >>   regexconverter: : Convert text files on a per line basis based on regular expressions
> >>   resplit: : Splits a set of SequenceFiles into a number of equal splits
> >>   rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
> >>   rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
> >>   runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
> >>   runlogistic: : Run a logistic regression model against CSV data
> >>   seq2encoded: : Encoded Sparse Vector generation from Text sequence files
> >>   seq2sparse: : Sparse Vector generation from Text sequence files
> >>   seqdirectory: : Generate sequence files (of Text) from a directory
> >>   seqdumper: : Generic Sequence File dumper
> >>   seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
> >>   seqwiki: : Wikipedia xml dump to sequence file
> >>   spectralkmeans: : Spectral k-means clustering
> >>   split: : Split Input data into test and train sets
> >>   splitDataset: : split a rating dataset into training and probe parts
> >>   ssvd: : Stochastic SVD
> >>   streamingkmeans: : Streaming k-means clustering
> >>   svd: : Lanczos Singular Value Decomposition
> >>   testnb: : Test the Vector-based Bayes classifier
> >>   trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
> >>   trainlogistic: : Train a logistic regression using stochastic gradient descent
> >>   trainnb: : Train the Vector-based Bayes classifier
> >>   transpose: : Take the transpose of a matrix
> >>   validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
> >>   vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
> >>   vectordump: : Dump vectors from a sequence file to text
> >>   viterbi: : Viterbi decoding of hidden states from given output states sequence
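
Regarding the NumberFormatException at the top of this thread: since the trace shows the item ID going through Long.parseLong, one possible workaround is to translate the string item codes into numeric IDs before running the job, and to translate them back afterwards. The following is only a rough sketch of that idea, not something taken from Mahout. The class ItemCodeIndexer, its method names, the comma delimiter, the assumption that user IDs are already numeric, and the sample code "B7654321" are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of Mahout): assigns a numeric ID to each
 * string item code so that every item ID in the job input can be parsed
 * by Long.parseLong.
 */
final class ItemCodeIndexer {
    private final Map<String, Long> codeToId = new HashMap<>();
    private final List<String> idToCode = new ArrayList<>();

    /** Returns the numeric ID for a code, assigning the next free ID the first time the code is seen. */
    long toLongId(String code) {
        Long id = codeToId.get(code);
        if (id == null) {
            id = (long) idToCode.size();
            codeToId.put(code, id);
            idToCode.add(code);
        }
        return id;
    }

    /** Maps a numeric ID (for example from the job output) back to the original item code. */
    String toCode(long id) {
        return idToCode.get((int) id);
    }

    /**
     * Rewrites one "userid,itemcode" line as "userid,itemid".
     * Assumptions: comma-delimited input and user IDs that are already numeric.
     */
    String convertLine(String line) {
        String[] tokens = line.split(",");
        if (tokens.length < 2) {
            throw new IllegalArgumentException("Expected userid,itemcode but got: " + line);
        }
        String userId = tokens[0].trim();
        String itemCode = tokens[1].trim();
        return userId + "," + toLongId(itemCode);
    }

    public static void main(String[] args) {
        ItemCodeIndexer indexer = new ItemCodeIndexer();
        // Hypothetical sample lines; only "A1234567" appears in the original report.
        String[] lines = {"1,A1234567", "1,B7654321", "2,A1234567"};
        for (String line : lines) {
            System.out.println(indexer.convertLine(line)); // prints 1,0 then 1,1 then 2,0
        }
        System.out.println(indexer.toCode(0L)); // prints A1234567
    }
}
```

With a mapping like this, the converted file would be the job input, and the same mapping would have to be kept around (for example written to a side file) to turn the numeric IDs in the similarity output back into item codes. If the user IDs are strings as well, they would presumably need the same treatment.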